Programming Neam

Chapter 11: Multi-Provider LLM Integration #

"The mark of a good architecture is that the pieces can be swapped without rewriting the whole."

Why This Matters #

Choosing an LLM provider is not just a technical decision -- it is a business decision. The provider you pick determines your operating costs, your response latency, your data privacy posture, and your vendor lock-in risk. A startup prototyping on a laptop has different needs than an enterprise deploying to thousands of users behind a VPC. A chatbot that handles medical records has different constraints than one that generates marketing copy. Neam's multi-provider architecture lets you make these decisions per agent, per task, and change them without rewriting your program. Understanding how to configure and switch between providers is one of the most practical skills you will use in production Neam development.

In Chapter 10, you learned to declare agents using Ollama and OpenAI. But production agent systems rarely rely on a single provider. You might use a local model for development, GPT-4o for complex reasoning, Claude for long-context tasks, and Gemini for cost-sensitive workloads -- all within the same program.

Neam supports seven LLM providers out of the box. In this chapter, you will learn how to configure each one, understand their trade-offs, implement streaming responses, work with multimodal (vision) inputs, connect to custom OpenAI-compatible endpoints, and develop a strategy for selecting the right provider for each task.


The Provider Landscape #

Neam's agent system is designed around a provider abstraction. Every agent declares a provider field, and the Neam VM handles the rest -- endpoint resolution, authentication, request formatting, and response parsing. From your code's perspective, switching providers is a one-line change.

text
Your Neam Program
        |
        v
     Neam VM  (Provider Abstraction)
        |
        +-- Ollama    (local):  llama3.2, qwen2.5, qwen3
        +-- OpenAI    (cloud):  gpt-4o, gpt-4o-mini
        +-- Anthropic (cloud):  claude sonnet
        +-- Gemini    (cloud):  gemini 2.0 flash
        +-- Bedrock   (AWS):    claude on bedrock

Provider Quick Reference #

| Provider | provider Value | Auth Method | Default Env Variable | Default Endpoint |
| --- | --- | --- | --- | --- |
| Ollama | "ollama" | None (local) | -- | http://localhost:11434 |
| OpenAI | "openai" | API key | OPENAI_API_KEY | https://api.openai.com/v1/chat/completions |
| Anthropic | "anthropic" | API key | ANTHROPIC_API_KEY | https://api.anthropic.com/v1/messages |
| Gemini | "gemini" | API key | GEMINI_API_KEY | https://generativelanguage.googleapis.com/... |
| Azure OpenAI | "azure_openai" | API key | AZURE_OPENAI_API_KEY | Custom Azure endpoint |
| AWS Bedrock | "bedrock" | SigV4 | AWS credentials | Regional Bedrock endpoint |
| Vertex AI | "openai" | ADC | GCP credentials | Custom Vertex endpoint |

The first six are native providers with dedicated implementations in the Neam VM. Vertex AI is accessed through the OpenAI-compatible adapter pattern, where you set the provider to "openai" and override the endpoint and api_key_env fields. Azure OpenAI can also use the adapter pattern with provider: "openai" and a custom endpoint. AWS Bedrock now has a native "bedrock" provider in addition to the adapter approach.


Ollama: Local, Private, Free #

Ollama runs models entirely on your machine. No data leaves your network. No API key is required. No costs per request.

Basic Configuration #

neam
agent LocalAssistant {
  provider: "ollama"
  model: "llama3.2:3b"
  temperature: 0.7
  system: "You are a helpful assistant."
}

Custom Endpoint #

If Ollama is running on a different machine or port:

neam
agent RemoteOllama {
  provider: "ollama"
  model: "llama3.2:3b"
  endpoint: "http://192.168.1.100:11434"
  system: "You are a helpful assistant."
}
Popular Ollama Models #

| Model | Parameters | RAM Required | Best For |
| --- | --- | --- | --- |
| qwen3:1.7b | 1.7B | ~2 GB | Lightweight, very fast |
| qwen2.5:1.5b | 1.5B | ~2 GB | Fast prototyping, low-resource machines |
| llama3.2:3b | 3B | ~4 GB | Development, testing, balanced quality |
| qwen2.5:7b | 7B | ~6 GB | Good quality, moderate resources |
| llama3:8b | 8B | ~6 GB | Strong general-purpose, popular |
| qwen2.5:14b | 14B | ~10 GB | Higher quality, near-cloud performance |
| llama3.1:70b | 70B | ~48 GB | Maximum local quality (requires a high-end GPU) |
| nomic-embed-text | -- | ~1 GB | Embeddings for RAG (Chapter 15) |

Complete Ollama Example #

neam
agent Assistant {
  provider: "ollama"
  model: "llama3.2:3b"
  temperature: 0.7
  system: "You are a helpful assistant. Be concise and friendly."
}

{
  emit "=== Ollama Full Demo ===";
  emit "";
  emit "Testing local Ollama with llama3.2:3b...";
  emit "";

  let response = Assistant.ask("Hello! Can you tell me a fun fact about programming?");
  emit "Assistant: " + response;
  emit "";

  let response2 = Assistant.ask("What is the difference between a compiler and an interpreter?");
  emit "Assistant: " + response2;
  emit "";

  emit "=== Demo Complete ===";
}
💡 Tip

To check which models you have installed, run ollama list in your terminal.


OpenAI: GPT-4o and GPT-4o-mini #

OpenAI's models are among the most capable available. GPT-4o is the flagship model with strong reasoning, coding, and instruction-following abilities. GPT-4o-mini is a smaller, faster, cheaper alternative that handles most tasks well.

Setup #

bash
export OPENAI_API_KEY="sk-your-key-here"

GPT-4o-mini (Cost-Effective) #

neam
agent EfficientBot {
  provider: "openai"
  model: "gpt-4o-mini"
  temperature: 0.5
  system: "You are a concise technical assistant. Answer in 1-2 sentences."
}

{
  let response = EfficientBot.ask("What is a hash map?");
  emit response;
}

GPT-4o (Maximum Capability) #

neam
agent PowerBot {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.3
  system: "You are an expert software architect. Provide detailed,
           well-reasoned answers with examples."
}

{
  let response = PowerBot.ask("Explain the trade-offs between microservices and monoliths.");
  emit response;
}

Available OpenAI Models #

| Model | Context Window | Strengths |
| --- | --- | --- |
| gpt-4o | 128K tokens | Best overall capability, reasoning, coding |
| gpt-4o-mini | 128K tokens | Fast, cheap, good for most tasks |
| o1-preview | 128K tokens | Advanced reasoning (chain-of-thought built in) |
| o1-mini | 128K tokens | Reasoning-focused, cost-effective |

Anthropic: Claude #

Anthropic's Claude models are known for their strong instruction-following, safety alignment, and excellent performance on long-context tasks.

Setup #

bash
export ANTHROPIC_API_KEY="sk-ant-your-key-here"

Configuration #

neam
agent ClaudeAgent {
  provider: "anthropic"
  model: "claude-sonnet-4-20250514"
  api_key_env: "ANTHROPIC_API_KEY"
  temperature: 0.5
  system: "You are a thoughtful, precise assistant. Provide well-structured answers."
}

{
  let response = ClaudeAgent.ask("Explain the CAP theorem with a concrete example.");
  emit response;
}

Available Anthropic Models #

| Model | Context Window | Strengths |
| --- | --- | --- |
| claude-sonnet-4-20250514 | 200K tokens | Best balance of capability and cost |
| claude-opus-4-20250514 | 200K tokens | Maximum capability |
| claude-haiku-3-20250514 | 200K tokens | Fast, cheapest option |
📝 Note

Anthropic uses the api_key_env field explicitly because the default environment variable name differs from OpenAI's convention.


Google Gemini #

Google's Gemini models offer competitive performance, large context windows, and cost-effective pricing.

Setup #

bash
export GEMINI_API_KEY="your-gemini-api-key"

Configuration #

neam
agent GeminiAgent {
  provider: "gemini"
  model: "gemini-2.0-flash"
  api_key_env: "GEMINI_API_KEY"
  temperature: 0.6
  system: "You are a knowledgeable assistant powered by Google Gemini."
}

{
  let response = GeminiAgent.ask("What are the key features of the Transformer architecture?");
  emit response;
}

Available Gemini Models #

| Model | Context Window | Strengths |
| --- | --- | --- |
| gemini-2.0-flash | 1M tokens | Very fast, massive context, low cost |
| gemini-2.0-pro | 1M tokens | Higher capability, still cost-effective |
| gemini-1.5-pro | 2M tokens | Largest context window available |
💡 Tip

Gemini's 1M+ token context windows make it ideal for processing very long documents without chunking.


Azure OpenAI #

If your organization uses Azure OpenAI Service, Neam provides direct support through the "azure_openai" provider, or you can use the OpenAI-compatible adapter pattern.

Setup #

bash
export AZURE_OPENAI_API_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"

Using the Direct Provider #

neam
agent AzureAgent {
  provider: "azure_openai"
  model: "gpt-4o"
  endpoint: env("AZURE_OPENAI_ENDPOINT")
  api_key_env: "AZURE_OPENAI_API_KEY"
  temperature: 0.5
  system: "You are an enterprise assistant deployed on Azure."
}

{
  let response = AzureAgent.ask("Summarize the benefits of cloud computing.");
  emit response;
}

Using the OpenAI Adapter #

Alternatively, you can use provider: "openai" with a custom endpoint:

neam
agent AzureViaAdapter {
  provider: "openai"
  model: "gpt-4o"
  endpoint: "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview"
  api_key_env: "AZURE_OPENAI_API_KEY"
  temperature: 0.5
  system: "You are an enterprise assistant deployed on Azure."
}

Key points:

- The env() function reads an environment variable at runtime, keeping endpoints configurable across environments.
- The endpoint includes your Azure resource name, deployment name, and API version.
- The api_key_env field points to your Azure-specific API key.


AWS Bedrock #

AWS Bedrock provides access to multiple foundation models through AWS infrastructure. Authentication uses AWS SigV4 signing.

Setup #

bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

Configuration #

neam
agent BedrockAgent {
  provider: "openai"
  model: "anthropic.claude-3-sonnet-20240229-v1:0"
  endpoint: "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-sonnet-20240229-v1:0/invoke"
  api_key_env: "AWS_BEDROCK_TOKEN"
  temperature: 0.5
  system: "You are an assistant running on AWS Bedrock."
}
📝 Note

AWS Bedrock authentication uses SigV4 request signing. You may need to use a signing proxy or configure the AWS_BEDROCK_TOKEN with a pre-signed session token.

Native Bedrock Provider #

AWS Bedrock has a dedicated native provider that handles SigV4 signing automatically. Instead of using the OpenAI adapter pattern with a manual endpoint, you can use provider: "bedrock" directly:

neam
agent NativeBedrock {
  provider: "bedrock"
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  system: "You are an enterprise assistant on AWS Bedrock."
}

The native Bedrock provider reads standard AWS credentials from the environment automatically:

- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION

No endpoint or api_key_env fields are needed. The Neam VM resolves the correct regional Bedrock endpoint and signs requests using SigV4 behind the scenes.

neam
// Using multiple Bedrock models in one program

agent BedrockClaude {
  provider: "bedrock"
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  temperature: 0.3
  system: "You are a precise analyst."
}

agent BedrockTitan {
  provider: "bedrock"
  model: "amazon.titan-text-express-v1"
  temperature: 0.5
  system: "You are a helpful assistant."
}

{
  let question = "What are the benefits of serverless architecture?";
  let r1 = BedrockClaude.ask(question);
  emit "Claude on Bedrock: " + r1;
  emit "";
  let r2 = BedrockTitan.ask(question);
  emit "Titan on Bedrock: " + r2;
}
💡 Tip

The native "bedrock" provider is the recommended approach for new projects. The adapter pattern (using provider: "openai" with a custom endpoint) still works and may be useful if you need to route through a custom proxy.


Google Vertex AI Adapter #

Vertex AI provides access to Gemini models through Google Cloud with enterprise features like VPC Service Controls, customer-managed encryption keys, and regional data residency.

Setup #

bash
# Authenticate with Application Default Credentials
gcloud auth application-default login
export VERTEX_API_KEY="your-vertex-token"

Configuration #

neam
agent VertexAgent {
  provider: "openai"
  model: "gemini-2.0-flash"
  endpoint: "https://us-central1-aiplatform.googleapis.com/v1/projects/your-project/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent"
  api_key_env: "VERTEX_API_KEY"
  temperature: 0.5
  system: "You are an enterprise assistant running on Vertex AI."
}

Custom and OpenAI-Compatible Endpoints #

Many LLM serving frameworks expose an OpenAI-compatible API. This means you can connect Neam to virtually any model server -- including vLLM, text-generation-inference (TGI), LiteLLM, LocalAI, and self-hosted inference endpoints -- by using provider: "openai" with a custom endpoint.

Basic Configuration #

neam
agent CustomAgent {
  provider: "openai"
  model: "my-custom-model"
  endpoint: "https://my-llm-server.example.com/v1/chat/completions"
  api_key_env: "MY_API_KEY"
  system: "You are a helpful assistant."
}

The key fields:

- provider: "openai" tells the VM to use the OpenAI-compatible request and response format.
- model is passed through to the server, which maps it to whatever model it is serving.
- endpoint points at the server's chat completions route.
- api_key_env names the environment variable holding the server's API key, if it requires one.

Common OpenAI-Compatible Servers #

| Server | Typical Endpoint | Notes |
| --- | --- | --- |
| vLLM | http://localhost:8000/v1/chat/completions | High-throughput GPU serving |
| text-generation-inference | http://localhost:8080/v1/chat/completions | HuggingFace's inference server |
| LiteLLM | http://localhost:4000/v1/chat/completions | Proxy that unifies 100+ providers |
| LocalAI | http://localhost:8080/v1/chat/completions | CPU-friendly local inference |

Example: Connecting to vLLM #

neam
agent VllmAgent {
  provider: "openai"
  model: "meta-llama/Llama-3-8B-Instruct"
  endpoint: "http://localhost:8000/v1/chat/completions"
  temperature: 0.7
  system: "You are a helpful assistant served by vLLM."
}

{
  let response = VllmAgent.ask("Explain gradient descent in simple terms.");
  emit response;
}

Example: Connecting to LiteLLM Proxy #

LiteLLM acts as a unified proxy, letting you switch between providers by changing only the model name:

neam
agent LiteLLMAgent {
  provider: "openai"
  model: "gpt-4o"
  endpoint: "http://localhost:4000/v1/chat/completions"
  api_key_env: "LITELLM_API_KEY"
  system: "You are a helpful assistant via LiteLLM proxy."
}

{
  let response = LiteLLMAgent.ask("What are the SOLID principles?");
  emit response;
}
📝 Note

When using custom endpoints, features like streaming, function calling, and vision depend on what the server supports. Not all OpenAI-compatible servers implement the full API surface.


🎯 Try It Yourself: Connect to a Custom Endpoint

If you have Docker installed, spin up a quick LiteLLM proxy or vLLM server and connect a Neam agent to it. Try changing the model field while keeping the same endpoint, and observe how the proxy routes to different backends. This is a great way to experiment with models you cannot run locally.


Multi-Provider Programs #

One of Neam's most powerful features is the ability to use multiple providers in a single program. This enables patterns like:

- routing simple tasks to cheap, fast models and complex tasks to more capable ones
- comparing answers from several providers side by side
- keeping sensitive data on a local model while sending everything else to the cloud
- falling back to a secondary provider when the primary one fails

neam
// Demonstrates using four providers in one program

agent OpenAIAgent {
  provider: "openai"
  model: "gpt-4o"
  system: "You are powered by OpenAI."
}

agent AnthropicAgent {
  provider: "anthropic"
  model: "claude-sonnet-4-20250514"
  api_key_env: "ANTHROPIC_API_KEY"
  system: "You are powered by Anthropic."
}

agent GeminiAgent {
  provider: "gemini"
  model: "gemini-2.0-flash"
  api_key_env: "GEMINI_API_KEY"
  system: "You are powered by Google Gemini."
}

agent LocalAgent {
  provider: "ollama"
  model: "llama3.2:3b"
  endpoint: "http://localhost:11434"
  system: "You run locally via Ollama."
}

{
  let question = "In one sentence, what makes a good programming language?";

  emit "=== Multi-Provider Comparison ===";
  emit "Question: " + question;
  emit "";

  let r1 = OpenAIAgent.ask(question);
  emit "OpenAI (gpt-4o): " + r1;
  emit "";

  let r2 = AnthropicAgent.ask(question);
  emit "Anthropic (Claude): " + r2;
  emit "";

  let r3 = GeminiAgent.ask(question);
  emit "Gemini (2.0 Flash): " + r3;
  emit "";

  let r4 = LocalAgent.ask(question);
  emit "Ollama (llama3.2:3b): " + r4;
}

🎯 Try It Yourself: Multi-Provider Face-Off


Copy the program above and modify it to test a question relevant to your domain -- for example, "Explain the difference between REST and GraphQL" or "Summarize the key ideas of functional programming." Compare the responses for quality, length, and tone. Try adding a clock() call before and after each .ask() to measure latency. Which provider gives the best answer for your use case? Which is fastest?


Streaming Responses #

For long responses, waiting for the entire output to generate before displaying anything creates a poor user experience. Streaming delivers tokens as they are generated, so the user sees output progressively.

text
Batch (default .ask()):
  User ---> Agent ---> LLM .............. response complete
                                          <entire response>
  Time: ================================|
        user waits the entire time

Streaming (chat_stream):
  User ---> Agent ---> LLM -> tok -> tok -> tok -> tok -> done
  Time: ===|====|====|====|====|
        user sees tokens as they arrive

In Neam, streaming is supported by providers that implement the chat_stream interface. The .ask() method handles both batch and streaming transparently -- the VM will stream if the provider supports it and accumulate the result into the returned string:

neam
agent StreamBot {
  provider: "openai"
  model: "gpt-4o-mini"
  system: "You are a helpful assistant. Tell engaging stories."
}

{
  // The VM streams internally and returns the complete response
  let response = StreamBot.ask("Tell me a short story about a robot learning to cook.");
  emit response;
}
📝 Note

At the language level, .ask() always returns a complete string. Streaming is an optimization handled by the VM layer to reduce time-to-first-token latency. A streaming callback API for real-time token processing may be exposed in the future.
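To make "accumulate" concrete, here is a small sketch in plain Python (not Neam) of what a VM layer conceptually does with a token stream. The function name, the on_token hook, and the chunk values are all invented for illustration:

```python
# Conceptual sketch only: how a runtime might fold streamed token chunks
# into the single string that a batch-style call ultimately returns.
def accumulate_stream(chunks, on_token=None):
    """Collect streamed chunks; optionally surface each one as it arrives."""
    parts = []
    for chunk in chunks:
        if on_token is not None:
            on_token(chunk)  # e.g. print to the terminal for progressive output
        parts.append(chunk)
    return "".join(parts)

# Simulated stream of tokens from a provider
story = accumulate_stream(["Once", " upon", " a", " time."])
print(story)  # → Once upon a time.
```

The on_token hook is where a future streaming callback API could surface tokens in real time, while the joined string preserves today's .ask() contract.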


Token Tracking and Cost Monitoring #

When working with cloud providers, keeping track of token usage and costs is critical. Neam automatically tracks this information for every agent call. When tracing is enabled, each call generates a trace entry with full details:

text
Agent: SmartAssistant
  Model: gpt-4o-mini
  Prompt tokens: 120
  Completion tokens: 85
  Total tokens: 205
  Estimated cost: $0.00041
  Latency: 842ms

You can enable tracing in your neam.toml:

toml
[agent]
tracing = true
trace_dir = ".neam/traces"

Or enable it per runner:

neam
runner MyPipeline {
  entry_agent: TriageAgent
  tracing: enabled
}

When tracing is enabled, the VM writes structured trace logs for every LLM call and tool invocation. These traces are useful for:

- monitoring token usage and estimated cost per agent and per call
- debugging prompts, responses, and tool invocations
- measuring latency and spotting slow calls
- auditing exactly what was sent to which provider

💡 Tip

During development, enable tracing to understand how your agents behave. In production, use it for cost monitoring and debugging.


Provider Feature Comparison #

Not all providers support the same features. Here is a quick reference:

| Feature | OpenAI | Ollama | Anthropic | Gemini | Bedrock | Azure OpenAI | Vertex AI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Native tool calling | Yes | Yes* | Yes | Yes | Yes | Yes | Yes |
| Vision (images) | Yes | Limited** | Yes | Yes | Yes | Yes | Yes |
| Token counting | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Custom endpoint | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| Local/private | No | Yes | No | No | No | No | No |
| Voice (STT/TTS) | Yes | No | No | Yes | No | No | No |

Native tool calling is supported across all providers. When you declare skill parameters with types (e.g., params: { city: string, units: string }), the Neam VM automatically generates a JSON Schema definition and sends it to the provider using its native function/tool calling protocol. This eliminates the need for manual schema authoring and ensures consistent behavior across providers.

* Ollama function calling depends on the specific model. Not all local models support it.

** Ollama vision support requires vision-capable models like llava or bakllava.
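To make the schema generation concrete, here is a hedged Python sketch of roughly the JSON Schema a declaration like params: { city: string, units: string } could be translated into. The helper function and the type mapping are assumptions for illustration; the Neam VM's actual output may differ:

```python
# Illustrative sketch (not the Neam VM's actual code): map Neam-style
# typed params to a JSON Schema object for native tool calling.
import json

def params_to_json_schema(params: dict) -> dict:
    """Translate {name: neam_type} pairs into a JSON Schema (assumed mapping)."""
    type_map = {"string": "string", "int": "integer", "float": "number", "bool": "boolean"}
    return {
        "type": "object",
        "properties": {name: {"type": type_map.get(t, "string")} for name, t in params.items()},
        "required": list(params.keys()),
    }

schema = params_to_json_schema({"city": "string", "units": "string"})
print(json.dumps(schema, indent=2))
```

The resulting object has "type": "object", one property per declared parameter, and every parameter listed as required, which is the general shape OpenAI-style tool calling protocols expect.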


Vision and Multimodal Input #

Several providers support vision -- the ability to analyze images alongside text prompts. Neam provides the ask_with_image() method for this purpose.

Image from URL #

neam
agent VisionBot {
  provider: "openai"
  model: "gpt-4o"
  system: "You can see and analyze images. Describe what you observe."
}

{
  let description = VisionBot.ask_with_image(
    "What is in this image?",
    "https://example.com/photo.jpg"
  );
  emit description;
}

Vision Provider Support #

| Provider | Vision Models | Image Input |
| --- | --- | --- |
| OpenAI | gpt-4o, gpt-4o-mini | URL, base64 |
| Anthropic | claude-sonnet-4-* | URL, base64 |
| Gemini | gemini-2.0-flash, gemini-2.0-pro | URL, base64 |
| Ollama | llava, bakllava | URL, base64 |
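Both input forms in the table are easy to produce. As an illustration, here is a small helper in plain Python (not a Neam API) that turns raw image bytes into the base64 data-URI form many vision APIs accept alongside plain URLs:

```python
# Illustrative helper (plain Python, not a Neam API): encode image bytes
# as a base64 data URI, the "base64" input form listed above.
import base64

def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Tiny stand-in for real JPEG bytes (a full file would be read with open(..., "rb"))
print(to_data_uri(b"\xff\xd8\xff"))  # → data:image/jpeg;base64,/9j/
```

Data URIs let you send local images without hosting them anywhere, at the cost of larger request payloads.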

Practical Vision Example #

neam
agent ImageAnalyzer {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.3
  system: "You are an image analysis expert. Describe images in detail,
           identifying objects, text, colors, and spatial relationships.
           Be precise and structured in your descriptions."
}

agent CaptionWriter {
  provider: "openai"
  model: "gpt-4o-mini"
  temperature: 0.7
  system: "You write engaging social media captions. Given an image
           description, create a catchy caption under 280 characters."
}

{
  // Step 1: Analyze the image
  let analysis = ImageAnalyzer.ask_with_image(
    "Describe this image in detail.",
    "https://example.com/sunset.jpg"
  );
  emit "Analysis: " + analysis;
  emit "";

  // Step 2: Generate a caption from the analysis
  let caption = CaptionWriter.ask("Write a social media caption for this image: " + analysis);
  emit "Caption: " + caption;
}

Provider Selection Strategy #

Choosing the right provider for each agent is a design decision that affects cost, latency, quality, and privacy. Here is a decision framework:

Cost Comparison #

| Provider/Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Relative Cost |
| --- | --- | --- | --- |
| Ollama (any, incl. qwen3:1.7b) | Free | Free | None |
| Gemini 2.0 Flash | ~$0.10 | ~$0.40 | Very Low |
| GPT-4o-mini | ~$0.15 | ~$0.60 | Low |
| Claude Haiku | ~$0.25 | ~$1.25 | Low |
| Claude Sonnet | ~$3.00 | ~$15.00 | Medium |
| GPT-4o | ~$2.50 | ~$10.00 | Medium |
| Bedrock (Claude Sonnet) | ~$3.00 | ~$15.00 | Medium* |
| Claude Opus | ~$15.00 | ~$75.00 | High |

* AWS Bedrock pricing is generally comparable to direct API pricing for the same models. Exact costs may vary by region and usage tier. Check the AWS Bedrock pricing page for current rates.
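The table above translates directly into arithmetic: cost = tokens / 1,000,000 × rate, summed over input and output. Here is a hedged Python sketch using the approximate rates from this chapter; the numbers are illustrative and will drift, so treat the rate table as a placeholder, not authoritative pricing:

```python
# Estimate a call's cost from approximate per-million-token rates.
# Rates copied from this chapter's table; check provider pricing pages
# for current figures before relying on these numbers.
RATES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "claude-sonnet": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 120 prompt tokens + 85 completion tokens on gpt-4o-mini
print(f"${estimate_cost('gpt-4o-mini', 120, 85):.6f}")  # → $0.000069
```

At these magnitudes a single small call costs fractions of a cent; the totals only matter at scale, which is exactly what the tracing support in this chapter helps you watch.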

Decision Tree #

neam
// This is a conceptual pattern, not runnable code.
// Use this thinking when designing your agent architecture.

// For development and testing:
//   -> Use Ollama (free, private, no API key needed)

// For simple production tasks (classification, extraction, formatting):
//   -> Use GPT-4o-mini or Gemini 2.0 Flash (low cost, fast)

// For complex reasoning and analysis:
//   -> Use GPT-4o or Claude Sonnet (higher quality)

// For long-context processing (>100K tokens):
//   -> Use Gemini (1M+ token context)

// For privacy-sensitive data:
//   -> Use Ollama (data never leaves your network)

// For enterprise compliance:
//   -> Use Azure OpenAI or Vertex AI (enterprise security)

// For AWS-native deployments:
//   -> Use Bedrock (SigV4 auth, VPC endpoints, IAM integration)

💡 Pro Tip: Development vs. Production Provider Strategy


For development and prototyping, use Ollama with qwen3:1.7b. It is completely free, requires no API key, runs on modest hardware (~2 GB RAM), and responds quickly. This lets you iterate on prompts, test agent architectures, and debug multi-agent flows without spending a cent or worrying about rate limits.

For production, choose the provider whose models best fit your requirements along three axes: latency (how fast do you need responses?), quality (how complex is the reasoning?), and cost (what is your budget per query?). There is no single best provider -- a triage classifier might use gpt-4o-mini (fast, cheap), while a legal document analyzer might need claude-sonnet-4 (high quality, long context). Start cheap and upgrade only the agents that need it.

Practical Provider Selection Pattern #

neam
// Different agents use different providers based on their needs

// Triage agent: fast, cheap -- just classifying input
agent Triage {
  provider: "openai"
  model: "gpt-4o-mini"
  temperature: 0.1
  system: "Classify the input as TECHNICAL, BILLING, or GENERAL. Reply with only the category."
}

// Technical agent: needs strong reasoning
agent TechSupport {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.3
  system: "You are a senior technical support engineer. Provide detailed solutions."
}

// Billing agent: straightforward, keep costs low
agent BillingSupport {
  provider: "gemini"
  model: "gemini-2.0-flash"
  api_key_env: "GEMINI_API_KEY"
  temperature: 0.3
  system: "You are a billing specialist. Help with payment questions."
}

// Sensitive data processing: keep it local
agent DataProcessor {
  provider: "ollama"
  model: "qwen2.5:14b"
  temperature: 0.1
  system: "You process and classify sensitive customer data. Be precise."
}

fun route_query(query) {
  let category = Triage.ask(query);

  if (category.contains("TECHNICAL")) {
    return TechSupport.ask(query);
  }
  if (category.contains("BILLING")) {
    return BillingSupport.ask(query);
  }
  return "General: I can help with that! " + query;
}

{
  let result = route_query("My API endpoint is returning 503 errors");
  emit result;
}

Provider Failover Pattern #

For production resilience, you can implement a failover pattern that tries a primary provider and falls back to alternatives:

neam
agent PrimaryBot {
  provider: "openai"
  model: "gpt-4o-mini"
  temperature: 0.5
  system: "You are a helpful assistant."
}

agent FallbackBot {
  provider: "ollama"
  model: "llama3.2:3b"
  temperature: 0.5
  system: "You are a helpful assistant."
}

fun ask_with_fallback(prompt) {
  try {
    let result = PrimaryBot.ask(prompt);
    return result;
  } catch (err) {
    emit "[Warning] Primary provider failed, using fallback: " + err;
    try {
      let result = FallbackBot.ask(prompt);
      return result;
    } catch (err2) {
      return "All providers unavailable: " + err2;
    }
  }
}

{
  let response = ask_with_fallback("What is Neam?");
  emit response;
}

Environment Variable Management #

Managing API keys across multiple providers requires discipline. Here are the environment variables each provider expects:

bash
# OpenAI (default for provider: "openai")
export OPENAI_API_KEY="sk-..."

# Anthropic (default for provider: "anthropic")
export ANTHROPIC_API_KEY="sk-ant-..."

# Google Gemini (default for provider: "gemini")
export GEMINI_API_KEY="AI..."

# Azure OpenAI (custom, via api_key_env)
export AZURE_OPENAI_API_KEY="..."

# AWS Bedrock (standard AWS credentials)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"
⚠️ Warning

Never hard-code API keys in your .neam source files. Always use environment variables. If you accidentally commit a key to version control, rotate it immediately.
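A small preflight check can catch missing keys before an agent call fails mid-pipeline. Here is a generic Python sketch (not a Neam feature); the variable names come from this chapter, and the helper function is hypothetical:

```python
# Preflight check (plain Python, not a Neam feature): verify that the
# environment variables your agents rely on are set before running.
import os

def missing_keys(required, environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]

required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"]
# Passing an explicit dict here only to demonstrate; omit it to check os.environ.
absent = missing_keys(required, environ={"OPENAI_API_KEY": "sk-test"})
print(absent)  # → ['ANTHROPIC_API_KEY', 'GEMINI_API_KEY']
```

Running a check like this at startup turns a confusing mid-run authentication error into an immediate, readable failure.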


Summary #

In this chapter, you learned:

- How Neam's provider abstraction lets you switch providers with a one-line change
- How to configure Ollama, OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, and Vertex AI
- How the OpenAI-compatible adapter pattern connects Neam to custom endpoints such as vLLM and LiteLLM
- How streaming, token tracking, and tracing work at the VM level
- How to analyze images with ask_with_image()
- How to choose providers by cost, latency, quality, and privacy, and how to build failover chains

In the next chapter, we will give your agents the ability to take actions in the world through tools and function calling.


Exercises #

Exercise 11.1: Provider Comparison Write a program that asks the same question to three different providers (e.g., Ollama, OpenAI, and Gemini). Use clock() to measure the response time for each. Emit a formatted comparison table showing the provider, response time, and response length.

Exercise 11.2: Cost Calculator Write a function estimate_cost(provider, model, input_tokens, output_tokens) that returns an estimated cost in USD based on the pricing table in this chapter. Test it with several combinations and emit the results.

Exercise 11.3: Failover Chain Extend the failover pattern to try three providers in sequence: OpenAI, then Anthropic, then Ollama. If all three fail, return a helpful error message. Test this by using an intentionally wrong API key for the first two providers.

Exercise 11.4: Smart Router Write a program with a Router agent (using gpt-4o-mini) that classifies incoming queries as SIMPLE, COMPLEX, or CREATIVE. Route simple queries to gemini-2.0-flash, complex queries to gpt-4o, and creative queries to an Ollama model with temperature: 1.2. Emit the classification and the routed response.

Exercise 11.5: Vision Pipeline Write a program that uses ask_with_image() to analyze an image, then passes the analysis to a second agent that generates a haiku based on the image description. Use different providers for the two agents.

Exercise 11.6: Provider Configuration Matrix Create a Neam program that declares six agents -- one for each configurable field combination (different providers, temperatures, endpoints, and system prompts). Document in comments why you chose each configuration. Run them all against the prompt "Explain why the sky is blue" and compare the outputs.

Start typing to search...