Chapter 11: Multi-Provider LLM Integration #
"The mark of a good architecture is that the pieces can be swapped without rewriting the whole."
Why This Matters #
Choosing an LLM provider is not just a technical decision -- it is a business decision. The provider you pick determines your operating costs, your response latency, your data privacy posture, and your vendor lock-in risk. A startup prototyping on a laptop has different needs than an enterprise deploying to thousands of users behind a VPC. A chatbot that handles medical records has different constraints than one that generates marketing copy. Neam's multi-provider architecture lets you make these decisions per agent, per task, and change them without rewriting your program. Understanding how to configure and switch between providers is one of the most practical skills you will use in production Neam development.
In Chapter 10, you learned to declare agents using Ollama and OpenAI. But production agent systems rarely rely on a single provider. You might use a local model for development, GPT-4o for complex reasoning, Claude for long-context tasks, and Gemini for cost-sensitive workloads -- all within the same program.
Neam supports seven LLM providers out of the box. In this chapter, you will learn how to configure each one, understand their trade-offs, implement streaming responses, work with multimodal (vision) inputs, connect to custom OpenAI-compatible endpoints, and develop a strategy for selecting the right provider for each task.
The Provider Landscape #
Neam's agent system is designed around a provider abstraction. Every agent declares a
provider field, and the Neam VM handles the rest -- endpoint resolution, authentication,
request formatting, and response parsing. From your code's perspective, switching providers
is a one-line change.
Provider Quick Reference #
| Provider | provider Value | Auth Method | Default Env Variable | Default Endpoint |
|---|---|---|---|---|
| Ollama | "ollama" | None (local) | -- | http://localhost:11434 |
| OpenAI | "openai" | API key | OPENAI_API_KEY | https://api.openai.com/v1/chat/completions |
| Anthropic | "anthropic" | API key | ANTHROPIC_API_KEY | https://api.anthropic.com/v1/messages |
| Gemini | "gemini" | API key | GEMINI_API_KEY | https://generativelanguage.googleapis.com/... |
| Azure OpenAI | "azure_openai" | API key | AZURE_OPENAI_API_KEY | Custom Azure endpoint |
| AWS Bedrock | "bedrock" | SigV4 | AWS credentials | Regional Bedrock endpoint |
| Vertex AI | "openai" | ADC | GCP credentials | Custom Vertex endpoint |
The first six are native providers with dedicated implementations in the Neam VM. Vertex
AI is accessed through the OpenAI-compatible adapter pattern, where you set the
provider to "openai" and override the endpoint and api_key_env fields. Azure
OpenAI can also use the adapter pattern with provider: "openai" and a custom endpoint.
AWS Bedrock now has a native "bedrock" provider in addition to the adapter approach.
Ollama: Local, Private, Free #
Ollama runs models entirely on your machine. No data leaves your network. No API key is required. No costs per request.
Basic Configuration #
agent LocalAssistant {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.7
system: "You are a helpful assistant."
}
Custom Endpoint #
If Ollama is running on a different machine or port:
agent RemoteOllama {
provider: "ollama"
model: "llama3.2:3b"
endpoint: "http://192.168.1.100:11434"
system: "You are a helpful assistant."
}
Recommended Ollama Models #
| Model | Parameters | RAM Required | Best For |
|---|---|---|---|
| qwen3:1.7b | 1.7B | ~2 GB | Lightweight, very fast |
| qwen2.5:1.5b | 1.5B | ~2 GB | Fast prototyping, low-resource machines |
| llama3.2:3b | 3B | ~4 GB | Development, testing, balanced quality |
| qwen2.5:7b | 7B | ~6 GB | Good quality, moderate resources |
| llama3:8b | 8B | ~6 GB | Strong general-purpose, popular |
| qwen2.5:14b | 14B | ~10 GB | Higher quality, near-cloud performance |
| llama3.1:70b | 70B | ~48 GB | Maximum local quality (requires high-end GPU) |
| nomic-embed-text | -- | ~1 GB | Embeddings for RAG (Chapter 15) |
Complete Ollama Example #
agent Assistant {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.7
system: "You are a helpful assistant. Be concise and friendly."
}
{
emit "=== Ollama Full Demo ===";
emit "";
emit "Testing local Ollama with llama3.2:3b...";
emit "";
let response = Assistant.ask("Hello! Can you tell me a fun fact about programming?");
emit "Assistant: " + response;
emit "";
let response2 = Assistant.ask("What is the difference between a compiler and an interpreter?");
emit "Assistant: " + response2;
emit "";
emit "=== Demo Complete ===";
}
To check which models you have installed, run ollama list in your terminal.
OpenAI: GPT-4o and GPT-4o-mini #
OpenAI's models are among the most capable available. GPT-4o is the flagship model with strong reasoning, coding, and instruction-following abilities. GPT-4o-mini is a smaller, faster, cheaper alternative that handles most tasks well.
Setup #
export OPENAI_API_KEY="sk-your-key-here"
GPT-4o-mini (Cost-Effective) #
agent EfficientBot {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.5
system: "You are a concise technical assistant. Answer in 1-2 sentences."
}
{
let response = EfficientBot.ask("What is a hash map?");
emit response;
}
GPT-4o (Maximum Capability) #
agent PowerBot {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are an expert software architect. Provide detailed,
well-reasoned answers with examples."
}
{
let response = PowerBot.ask("Explain the trade-offs between microservices and monoliths.");
emit response;
}
Available OpenAI Models #
| Model | Context Window | Strengths |
|---|---|---|
| gpt-4o | 128K tokens | Best overall capability, reasoning, coding |
| gpt-4o-mini | 128K tokens | Fast, cheap, good for most tasks |
| o1-preview | 128K tokens | Advanced reasoning (chain-of-thought built in) |
| o1-mini | 128K tokens | Reasoning-focused, cost-effective |
Anthropic: Claude #
Anthropic's Claude models are known for their strong instruction-following, safety alignment, and excellent performance on long-context tasks.
Setup #
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
Configuration #
agent ClaudeAgent {
provider: "anthropic"
model: "claude-sonnet-4-20250514"
api_key_env: "ANTHROPIC_API_KEY"
temperature: 0.5
system: "You are a thoughtful, precise assistant. Provide well-structured answers."
}
{
let response = ClaudeAgent.ask("Explain the CAP theorem with a concrete example.");
emit response;
}
Available Anthropic Models #
| Model | Context Window | Strengths |
|---|---|---|
| claude-sonnet-4-20250514 | 200K tokens | Best balance of capability and cost |
| claude-opus-4-20250514 | 200K tokens | Maximum capability |
| claude-haiku-3-20250514 | 200K tokens | Fast, cheapest option |
Anthropic uses the api_key_env field explicitly because the default
environment variable name differs from OpenAI's convention.
Google Gemini #
Google's Gemini models offer competitive performance, large context windows, and cost-effective pricing.
Setup #
export GEMINI_API_KEY="your-gemini-api-key"
Configuration #
agent GeminiAgent {
provider: "gemini"
model: "gemini-2.0-flash"
api_key_env: "GEMINI_API_KEY"
temperature: 0.6
system: "You are a knowledgeable assistant powered by Google Gemini."
}
{
let response = GeminiAgent.ask("What are the key features of the Transformer architecture?");
emit response;
}
Available Gemini Models #
| Model | Context Window | Strengths |
|---|---|---|
| gemini-2.0-flash | 1M tokens | Very fast, massive context, low cost |
| gemini-2.0-pro | 1M tokens | Higher capability, still cost-effective |
| gemini-1.5-pro | 2M tokens | Largest context window available |
Gemini's 1M+ token context windows make it ideal for processing very long documents without chunking.
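As a hedged sketch of that long-context use case: the agent below loads an entire document and asks for a summary in a single call, with no chunking. The read_file() helper is hypothetical, used here only for illustration; substitute whatever file-loading mechanism your Neam program actually uses.

```
agent LongDocReader {
  provider: "gemini"
  model: "gemini-1.5-pro"
  api_key_env: "GEMINI_API_KEY"
  temperature: 0.3
  system: "You summarize long documents faithfully and concisely."
}

{
  // read_file() is a hypothetical helper -- illustration only.
  let doc = read_file("contract.txt");
  let summary = LongDocReader.ask("Summarize the key obligations in this document: " + doc);
  emit summary;
}
```

With a 1M-2M token window, even book-length inputs fit in one prompt, so the retrieval and chunking machinery covered in Chapter 15 becomes optional for this class of task.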
Azure OpenAI #
If your organization uses Azure OpenAI Service, Neam provides direct support through the
"azure_openai" provider, or you can use the OpenAI-compatible adapter pattern.
Setup #
export AZURE_OPENAI_API_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
Using the Direct Provider #
agent AzureAgent {
provider: "azure_openai"
model: "gpt-4o"
endpoint: env("AZURE_OPENAI_ENDPOINT")
api_key_env: "AZURE_OPENAI_API_KEY"
temperature: 0.5
system: "You are an enterprise assistant deployed on Azure."
}
{
let response = AzureAgent.ask("Summarize the benefits of cloud computing.");
emit response;
}
Using the OpenAI Adapter #
Alternatively, you can use provider: "openai" with a custom endpoint:
agent AzureViaAdapter {
provider: "openai"
model: "gpt-4o"
endpoint: "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview"
api_key_env: "AZURE_OPENAI_API_KEY"
temperature: 0.5
system: "You are an enterprise assistant deployed on Azure."
}
Key points:
- The env() function reads an environment variable at runtime, keeping endpoints
configurable across environments.
- The endpoint includes your Azure resource name, deployment name, and API version.
- The api_key_env points to your Azure-specific API key.
AWS Bedrock Adapter #
AWS Bedrock provides access to multiple foundation models through AWS infrastructure. Authentication uses AWS SigV4 signing.
Setup #
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
Configuration #
agent BedrockAgent {
provider: "openai"
model: "anthropic.claude-3-sonnet-20240229-v1:0"
endpoint: "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-sonnet-20240229-v1:0/invoke"
api_key_env: "AWS_BEDROCK_TOKEN"
temperature: 0.5
system: "You are an assistant running on AWS Bedrock."
}
AWS Bedrock authentication uses SigV4 request signing. You may need to use
a signing proxy or configure the AWS_BEDROCK_TOKEN with a pre-signed session token.
Native Bedrock Provider #
AWS Bedrock has a dedicated native provider that handles SigV4 signing automatically.
Instead of using the OpenAI adapter pattern with a manual endpoint, you can use
provider: "bedrock" directly:
agent NativeBedrock {
provider: "bedrock"
model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
system: "You are an enterprise assistant on AWS Bedrock."
}
The native Bedrock provider reads AWS credentials from the environment automatically:
- AWS_ACCESS_KEY_ID -- Your AWS access key.
- AWS_SECRET_ACCESS_KEY -- Your AWS secret key.
- AWS_DEFAULT_REGION -- The AWS region where your Bedrock models are enabled (e.g., us-east-1).
No endpoint or api_key_env fields are needed. The Neam VM resolves the correct
regional Bedrock endpoint and signs requests using SigV4 behind the scenes.
// Using multiple Bedrock models in one program
agent BedrockClaude {
provider: "bedrock"
model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
temperature: 0.3
system: "You are a precise analyst."
}
agent BedrockTitan {
provider: "bedrock"
model: "amazon.titan-text-express-v1"
temperature: 0.5
system: "You are a helpful assistant."
}
{
let question = "What are the benefits of serverless architecture?";
let r1 = BedrockClaude.ask(question);
emit "Claude on Bedrock: " + r1;
emit "";
let r2 = BedrockTitan.ask(question);
emit "Titan on Bedrock: " + r2;
}
The native "bedrock" provider is the recommended approach for new projects.
The adapter pattern (using provider: "openai" with a custom endpoint) still works and
may be useful if you need to route through a custom proxy.
Google Vertex AI Adapter #
Vertex AI provides access to Gemini models through Google Cloud with enterprise features like VPC Service Controls, customer-managed encryption keys, and regional data residency.
Setup #
# Authenticate with Application Default Credentials
gcloud auth application-default login
export VERTEX_API_KEY="your-vertex-token"
Configuration #
agent VertexAgent {
provider: "openai"
model: "gemini-2.0-flash"
endpoint: "https://us-central1-aiplatform.googleapis.com/v1/projects/your-project/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent"
api_key_env: "VERTEX_API_KEY"
temperature: 0.5
system: "You are an enterprise assistant running on Vertex AI."
}
Custom and OpenAI-Compatible Endpoints #
Many LLM serving frameworks expose an OpenAI-compatible API. This means you can connect
Neam to virtually any model server -- including vLLM, text-generation-inference (TGI),
LiteLLM, LocalAI, and self-hosted inference endpoints -- by using provider: "openai"
with a custom endpoint.
Basic Configuration #
agent CustomAgent {
provider: "openai"
model: "my-custom-model"
endpoint: "https://my-llm-server.example.com/v1/chat/completions"
api_key_env: "MY_API_KEY"
system: "You are a helpful assistant."
}
The key fields:
- provider: "openai" -- Tells Neam to use the OpenAI request/response format.
- endpoint -- The full URL of your custom server's chat completions endpoint.
- model -- The model name your server expects. This varies by server.
- api_key_env -- The environment variable holding your API key (if required).
Common OpenAI-Compatible Servers #
| Server | Typical Endpoint | Notes |
|---|---|---|
| vLLM | http://localhost:8000/v1/chat/completions | High-throughput GPU serving |
| text-generation-inference | http://localhost:8080/v1/chat/completions | HuggingFace's inference server |
| LiteLLM | http://localhost:4000/v1/chat/completions | Proxy that unifies 100+ providers |
| LocalAI | http://localhost:8080/v1/chat/completions | CPU-friendly local inference |
Example: Connecting to vLLM #
agent VllmAgent {
provider: "openai"
model: "meta-llama/Llama-3-8B-Instruct"
endpoint: "http://localhost:8000/v1/chat/completions"
temperature: 0.7
system: "You are a helpful assistant served by vLLM."
}
{
let response = VllmAgent.ask("Explain gradient descent in simple terms.");
emit response;
}
Example: Connecting to LiteLLM Proxy #
LiteLLM acts as a unified proxy, letting you switch between providers by changing only the model name:
agent LiteLLMAgent {
provider: "openai"
model: "gpt-4o"
endpoint: "http://localhost:4000/v1/chat/completions"
api_key_env: "LITELLM_API_KEY"
system: "You are a helpful assistant via LiteLLM proxy."
}
{
let response = LiteLLMAgent.ask("What are the SOLID principles?");
emit response;
}
When using custom endpoints, features like streaming, function calling, and vision depend on what the server supports. Not all OpenAI-compatible servers implement the full API surface.
Try It Yourself: Connect to a Custom Endpoint
If you have Docker installed, spin up a quick LiteLLM proxy or vLLM server and connect
a Neam agent to it. Try changing the model field while keeping the same endpoint, and
observe how the proxy routes to different backends. This is a great way to experiment
with models you cannot run locally.
Multi-Provider Programs #
One of Neam's most powerful features is the ability to use multiple providers in a single program. This enables patterns like:
- Development/Production split: Use Ollama locally, OpenAI in production.
- Cost optimization: Route simple tasks to cheap models, complex tasks to powerful ones.
- Redundancy: Fall back to a different provider if one is unavailable.
- Best-of-breed: Use each provider for what it does best.
// Demonstrates using four providers in one program
agent OpenAIAgent {
provider: "openai"
model: "gpt-4o"
system: "You are powered by OpenAI."
}
agent AnthropicAgent {
provider: "anthropic"
model: "claude-sonnet-4-20250514"
api_key_env: "ANTHROPIC_API_KEY"
system: "You are powered by Anthropic."
}
agent GeminiAgent {
provider: "gemini"
model: "gemini-2.0-flash"
api_key_env: "GEMINI_API_KEY"
system: "You are powered by Google Gemini."
}
agent LocalAgent {
provider: "ollama"
model: "llama3.2:3b"
endpoint: "http://localhost:11434"
system: "You run locally via Ollama."
}
{
let question = "In one sentence, what makes a good programming language?";
emit "=== Multi-Provider Comparison ===";
emit "Question: " + question;
emit "";
let r1 = OpenAIAgent.ask(question);
emit "OpenAI (gpt-4o): " + r1;
emit "";
let r2 = AnthropicAgent.ask(question);
emit "Anthropic (Claude): " + r2;
emit "";
let r3 = GeminiAgent.ask(question);
emit "Gemini (2.0 Flash): " + r3;
emit "";
let r4 = LocalAgent.ask(question);
emit "Ollama (llama3.2:3b): " + r4;
}
Multi-Provider Face-Off
Copy the program above and modify it to test a question relevant to your domain -- for
example, "Explain the difference between REST and GraphQL" or "Summarize the key ideas
of functional programming." Compare the responses for quality, length, and tone. Try
adding a clock() call before and after each .ask() to measure latency. Which
provider gives the best answer for your use case? Which is fastest?
Streaming Responses #
For long responses, waiting for the entire output to generate before displaying anything creates a poor user experience. Streaming delivers tokens as they are generated, so the user sees output progressively.
In Neam, streaming is supported by providers that implement the chat_stream interface.
The .ask() method handles both batch and streaming transparently -- the VM will stream
if the provider supports it and accumulate the result into the returned string:
agent StreamBot {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a helpful assistant. Tell engaging stories."
}
{
// The VM streams internally and returns the complete response
let response = StreamBot.ask("Tell me a short story about a robot learning to cook.");
emit response;
}
At the language level, .ask() always returns a complete string. Streaming
is an optimization handled by the VM layer to reduce time-to-first-token latency.
A streaming callback API for real-time token processing may be exposed in the future.
Token Tracking and Cost Monitoring #
When working with cloud providers, keeping track of token usage and costs is critical. Neam automatically tracks this information for every agent call. When tracing is enabled, each call generates a trace entry with full details:
Agent: SmartAssistant
Model: gpt-4o-mini
Prompt tokens: 120
Completion tokens: 85
Total tokens: 205
Estimated cost: $0.00041
Latency: 842ms
You can enable tracing in your neam.toml:
[agent]
tracing = true
trace_dir = ".neam/traces"
Or enable it per runner:
runner MyPipeline {
entry_agent: TriageAgent
tracing: enabled
}
When tracing is enabled, the VM writes structured trace logs for every LLM call and tool invocation. These traces are useful for:
- Cost auditing -- See exactly how much each agent costs per query.
- Performance tuning -- Identify slow calls and optimize prompts to use fewer tokens.
- Debugging -- Trace the full conversation flow through multi-agent systems.
- Compliance -- Maintain an audit trail of all LLM interactions.
During development, enable tracing to understand how your agents behave. In production, use it for cost monitoring and debugging.
Provider Feature Comparison #
Not all providers support the same features. Here is a quick reference:
| Feature | OpenAI | Ollama | Anthropic | Gemini | Bedrock | Azure OpenAI | Vertex AI |
|---|---|---|---|---|---|---|---|
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Native tool calling | Yes | Yes* | Yes | Yes | Yes | Yes | Yes |
| Vision (images) | Yes | Limited** | Yes | Yes | Yes | Yes | Yes |
| Token counting | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Custom endpoint | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| Local/private | No | Yes | No | No | No | No | No |
| Voice (STT/TTS) | Yes | No | No | Yes | No | No | No |
Native tool calling is supported across all providers. When you declare skill
parameters with types (e.g., params: { city: string, units: string }), the Neam VM
automatically generates a JSON Schema definition and sends it to the provider using its
native function/tool calling protocol. This eliminates the need for manual schema
authoring and ensures consistent behavior across providers.
* Ollama function calling depends on the specific model. Not all local models support it.
** Ollama vision support requires vision-capable models like llava or bakllava.
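To make the automatic schema generation concrete, here is a hedged sketch. The skill declaration syntax is an assumption (tool declarations are covered in the next chapter); only the params form shown above is taken from this chapter. The commented JSON shows the kind of schema the VM would derive and send via the provider's native tool-calling protocol.

```
// Hypothetical skill declaration -- syntax illustrative only.
skill get_weather {
  params: { city: string, units: string }
}

// From those typed params, the VM would generate a JSON Schema roughly like:
// {
//   "type": "object",
//   "properties": {
//     "city":  { "type": "string" },
//     "units": { "type": "string" }
//   },
//   "required": ["city", "units"]
// }
```

The same declaration produces an OpenAI-style function definition, an Anthropic tool block, or a Gemini function declaration, depending on the agent's provider.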
Vision and Multimodal Input #
Several providers support vision -- the ability to analyze images alongside text prompts.
Neam provides the ask_with_image() method for this purpose.
Image from URL #
agent VisionBot {
provider: "openai"
model: "gpt-4o"
system: "You can see and analyze images. Describe what you observe."
}
{
let description = VisionBot.ask_with_image(
"What is in this image?",
"https://example.com/photo.jpg"
);
emit description;
}
Vision Provider Support #
| Provider | Vision Models | Image Input |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | URL, base64 |
| Anthropic | claude-sonnet-4-* | URL, base64 |
| Gemini | gemini-2.0-flash, gemini-2.0-pro | URL, base64 |
| Ollama | llava, bakllava | URL, base64 |
Practical Vision Example #
agent ImageAnalyzer {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are an image analysis expert. Describe images in detail,
identifying objects, text, colors, and spatial relationships.
Be precise and structured in your descriptions."
}
agent CaptionWriter {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.7
system: "You write engaging social media captions. Given an image
description, create a catchy caption under 280 characters."
}
{
// Step 1: Analyze the image
let analysis = ImageAnalyzer.ask_with_image(
"Describe this image in detail.",
"https://example.com/sunset.jpg"
);
emit "Analysis: " + analysis;
emit "";
// Step 2: Generate a caption from the analysis
let caption = CaptionWriter.ask("Write a social media caption for this image: " + analysis);
emit "Caption: " + caption;
}
Provider Selection Strategy #
Choosing the right provider for each agent is a design decision that affects cost, latency, quality, and privacy. Here is a decision framework:
Cost Comparison #
| Provider/Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Relative Cost |
|---|---|---|---|
| Ollama (any, incl. qwen3:1.7b) | Free | Free | None |
| Gemini 2.0 Flash | ~$0.10 | ~$0.40 | Very Low |
| GPT-4o-mini | ~$0.15 | ~$0.60 | Low |
| Claude Haiku | ~$0.25 | ~$1.25 | Low |
| Claude Sonnet | ~$3.00 | ~$15.00 | Medium |
| GPT-4o | ~$2.50 | ~$10.00 | Medium |
| Bedrock (Claude Sonnet) | ~$3.00 | ~$15.00 | Medium* |
| Claude Opus | ~$15.00 | ~$75.00 | High |
* AWS Bedrock pricing is generally comparable to direct API pricing for the same models. Exact costs may vary by region and usage tier. Check the AWS Bedrock pricing page for current rates.
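The arithmetic behind this table is simple: multiply each token count by its per-million-token rate and sum. A minimal Neam sketch (the rates are the approximate figures from the table above, and string concatenation of a number is assumed to work as in the earlier emit examples):

```
// Hedged sketch: estimate cost in USD from token counts.
// Rates are dollars per 1M tokens, e.g. GPT-4o-mini: 0.15 in / 0.60 out.
fun estimate_cost(input_tokens, output_tokens, input_rate, output_rate) {
  return (input_tokens * input_rate + output_tokens * output_rate) / 1000000.0;
}

{
  // 5,000 input tokens and 1,500 output tokens on GPT-4o-mini
  let cost = estimate_cost(5000, 1500, 0.15, 0.60);
  emit "Estimated cost (USD): " + cost;   // roughly 0.00165
}
```

Scaled up, the same call pattern at one million queries per month would run to roughly $1,650 on GPT-4o-mini versus about $27,500 on GPT-4o, which is why routing simple tasks to cheap models matters.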
Decision Tree #
// This is a conceptual pattern, not runnable code.
// Use this thinking when designing your agent architecture.
// For development and testing:
// -> Use Ollama (free, private, no API key needed)
// For simple production tasks (classification, extraction, formatting):
// -> Use GPT-4o-mini or Gemini 2.0 Flash (low cost, fast)
// For complex reasoning and analysis:
// -> Use GPT-4o or Claude Sonnet (higher quality)
// For long-context processing (>100K tokens):
// -> Use Gemini (1M+ token context)
// For privacy-sensitive data:
// -> Use Ollama (data never leaves your network)
// For enterprise compliance:
// -> Use Azure OpenAI or Vertex AI (enterprise security)
// For AWS-native deployments:
// -> Use Bedrock (SigV4 auth, VPC endpoints, IAM integration)
Development vs. Production Provider Strategy
For development and prototyping, use Ollama with qwen3:1.7b. It is completely
free, requires no API key, runs on modest hardware (~2 GB RAM), and responds quickly.
This lets you iterate on prompts, test agent architectures, and debug multi-agent
flows without spending a cent or worrying about rate limits.
For production, choose the provider whose models best fit your requirements along
three axes: latency (how fast do you need responses?), quality (how complex is
the reasoning?), and cost (what is your budget per query?). There is no single
best provider -- a triage classifier might use gpt-4o-mini (fast, cheap), while a
legal document analyzer might need claude-sonnet-4 (high quality, long context).
Start cheap and upgrade only the agents that need it.
Practical Provider Selection Pattern #
// Different agents use different providers based on their needs
// Triage agent: fast, cheap -- just classifying input
agent Triage {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.1
system: "Classify the input as TECHNICAL, BILLING, or GENERAL. Reply with only the category."
}
// Technical agent: needs strong reasoning
agent TechSupport {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are a senior technical support engineer. Provide detailed solutions."
}
// Billing agent: straightforward, keep costs low
agent BillingSupport {
provider: "gemini"
model: "gemini-2.0-flash"
api_key_env: "GEMINI_API_KEY"
temperature: 0.3
system: "You are a billing specialist. Help with payment questions."
}
// Sensitive data processing: keep it local
agent DataProcessor {
provider: "ollama"
model: "qwen2.5:14b"
temperature: 0.1
system: "You process and classify sensitive customer data. Be precise."
}
fun route_query(query) {
let category = Triage.ask(query);
if (category.contains("TECHNICAL")) {
return TechSupport.ask(query);
}
if (category.contains("BILLING")) {
return BillingSupport.ask(query);
}
return "General: I can help with that! " + query;
}
{
let result = route_query("My API endpoint is returning 503 errors");
emit result;
}
Provider Failover Pattern #
For production resilience, you can implement a failover pattern that tries a primary provider and falls back to alternatives:
agent PrimaryBot {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.5
system: "You are a helpful assistant."
}
agent FallbackBot {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a helpful assistant."
}
fun ask_with_fallback(prompt) {
try {
let result = PrimaryBot.ask(prompt);
return result;
} catch (err) {
emit "[Warning] Primary provider failed, using fallback: " + err;
try {
let result = FallbackBot.ask(prompt);
return result;
} catch (err2) {
return "All providers unavailable: " + err2;
}
}
}
{
let response = ask_with_fallback("What is Neam?");
emit response;
}
Environment Variable Management #
Managing API keys across multiple providers requires discipline. Here are the environment variables each provider expects:
# OpenAI (default for provider: "openai")
export OPENAI_API_KEY="sk-..."
# Anthropic (default for provider: "anthropic")
export ANTHROPIC_API_KEY="sk-ant-..."
# Google Gemini (default for provider: "gemini")
export GEMINI_API_KEY="AI..."
# Azure OpenAI (custom, via api_key_env)
export AZURE_OPENAI_API_KEY="..."
# AWS Bedrock (standard AWS credentials)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"
Never hard-code API keys in your .neam source files. Always use
environment variables. If you accidentally commit a key to version control, rotate
it immediately.
Summary #
In this chapter, you learned:
- Neam supports seven LLM providers: Ollama, OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, and GCP Vertex AI -- plus any OpenAI-compatible endpoint.
- The first six have native provider implementations. AWS Bedrock has a dedicated "bedrock" provider that handles SigV4 authentication automatically. Vertex AI and custom servers use the OpenAI-compatible adapter pattern with custom endpoints.
- Native tool calling is supported across all providers with automatic JSON Schema generation from parameter declarations.
- New Ollama models like qwen3:1.7b (lightweight, very fast), qwen2.5:7b (good quality), and llama3:8b (strong general-purpose) expand your local model options.
- Custom/OpenAI-compatible endpoints let you connect to vLLM, text-generation-inference, LiteLLM, and other servers using provider: "openai" with a custom endpoint.
- Each provider has different strengths, costs, and capabilities. Provider selection should be driven by your specific requirements.
- Multi-provider programs are a first-class Neam pattern -- different agents can use different providers within the same program.
- Streaming is handled transparently by the VM for providers that support it.
- Token tracking and cost monitoring are built in -- enable tracing to see token usage, costs, and latency for every call.
- Provider features vary -- not all providers support vision, voice, or function calling equally. The feature comparison table helps you choose.
- Vision/multimodal input is available via ask_with_image() across all native providers that support it.
- The env() function reads environment variables at runtime, useful for configurable endpoints.
- Failover patterns improve production resilience by falling back to alternative providers.
- API keys should always be managed through environment variables, never hard-coded.
In the next chapter, we will give your agents the ability to take actions in the world through tools and function calling.
Exercises #
Exercise 11.1: Provider Comparison
Write a program that asks the same question to three different providers (e.g., Ollama,
OpenAI, and Gemini). Use clock() to measure the response time for each. Emit a
formatted comparison table showing the provider, response time, and response length.
Exercise 11.2: Cost Calculator
Write a function estimate_cost(provider, model, input_tokens, output_tokens) that
returns an estimated cost in USD based on the pricing table in this chapter. Test it with
several combinations and emit the results.
Exercise 11.3: Failover Chain
Extend the failover pattern to try three providers in sequence: OpenAI, then Anthropic, then Ollama. If all three fail, return a helpful error message. Test this by using an intentionally wrong API key for the first two providers.
Exercise 11.4: Smart Router
Write a program with a Router agent (using gpt-4o-mini) that classifies incoming
queries as SIMPLE, COMPLEX, or CREATIVE. Route simple queries to gemini-2.0-flash,
complex queries to gpt-4o, and creative queries to an Ollama model with temperature:
1.2. Emit the classification and the routed response.
Exercise 11.5: Vision Pipeline
Write a program that uses ask_with_image() to analyze an image, then passes the
analysis to a second agent that generates a haiku based on the image description. Use
different providers for the two agents.
Exercise 11.6: Provider Configuration Matrix
Create a Neam program that declares six agents -- one for each configurable field combination (different providers, temperatures, endpoints, and system prompts). Document in comments why you chose each configuration. Run them all against the prompt "Explain why the sky is blue" and compare the outputs.