Chapter 11: Multi-Provider LLM Integration #
"The mark of a good architecture is that the pieces can be swapped without rewriting the whole."
Why This Matters #
Choosing an LLM provider is not just a technical decision -- it is a business decision. The provider you pick determines your operating costs, your response latency, your data privacy posture, and your vendor lock-in risk. A startup prototyping on a laptop has different needs than an enterprise deploying to thousands of users behind a VPC. A chatbot that handles medical records has different constraints than one that generates marketing copy. Neam's multi-provider architecture lets you make these decisions per agent, per task, and change them without rewriting your program. Understanding how to configure and switch between providers is one of the most practical skills you will use in production Neam development.
In Chapter 10, you learned to declare agents using Ollama and OpenAI. But production agent systems rarely rely on a single provider. You might use a local model for development, GPT-4o for complex reasoning, Claude for long-context tasks, and Gemini for cost-sensitive workloads -- all within the same program.
Neam supports seven LLM providers out of the box. In this chapter, you will learn how to configure each one, understand their trade-offs, implement streaming responses, work with multimodal (vision) inputs, connect to custom OpenAI-compatible endpoints, and develop a strategy for selecting the right provider for each task.
The Provider Landscape #
Neam's agent system is designed around a provider abstraction. Every agent declares a
provider field, and the Neam VM handles the rest -- endpoint resolution, authentication,
request formatting, and response parsing. From your code's perspective, switching providers
is a one-line change.
Provider Quick Reference #
| Provider | provider Value | Auth Method | Default Env Variable | Default Endpoint |
|---|---|---|---|---|
| Ollama | "ollama" | None (local) | -- | http://localhost:11434 |
| OpenAI | "openai" | API key | OPENAI_API_KEY | https://api.openai.com/v1/chat/completions |
| Anthropic | "anthropic" | API key | ANTHROPIC_API_KEY | https://api.anthropic.com/v1/messages |
| Gemini | "gemini" | API key | GEMINI_API_KEY | https://generativelanguage.googleapis.com/... |
| Azure OpenAI | "azure_openai" | API key | AZURE_OPENAI_API_KEY | Custom Azure endpoint |
| AWS Bedrock | "bedrock" | SigV4 | AWS credentials | Regional Bedrock endpoint |
| Vertex AI | "openai" | ADC | GCP credentials | Custom Vertex endpoint |
The first six are native providers with dedicated implementations in the Neam VM. Vertex
AI is accessed through the OpenAI-compatible adapter pattern, where you set the
provider to "openai" and override the endpoint and api_key_env fields. Azure
OpenAI can also use the adapter pattern with provider: "openai" and a custom endpoint.
AWS Bedrock now has a native "bedrock" provider in addition to the adapter approach.
Ollama: Local, Private, Free #
Ollama runs models entirely on your machine. No data leaves your network. No API key is required. No costs per request.
Basic Configuration #
agent LocalAssistant {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.7
system: "You are a helpful assistant."
}
Custom Endpoint #
If Ollama is running on a different machine or port:
agent RemoteOllama {
provider: "ollama"
model: "llama3.2:3b"
endpoint: "http://192.168.1.100:11434"
system: "You are a helpful assistant."
}
Recommended Ollama Models #
| Model | Parameters | RAM Required | Best For |
|---|---|---|---|
| qwen3:1.7b | 1.7B | ~2 GB | Lightweight, very fast |
| qwen2.5:1.5b | 1.5B | ~2 GB | Fast prototyping, low-resource machines |
| llama3.2:3b | 3B | ~4 GB | Development, testing, balanced quality |
| qwen2.5:7b | 7B | ~6 GB | Good quality, moderate resources |
| llama3:8b | 8B | ~6 GB | Strong general-purpose, popular |
| qwen2.5:14b | 14B | ~10 GB | Higher quality, near-cloud performance |
| llama3.1:70b | 70B | ~48 GB | Maximum local quality (requires high-end GPU) |
| nomic-embed-text | -- | ~1 GB | Embeddings for RAG (Chapter 15) |
Complete Ollama Example #
agent Assistant {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.7
system: "You are a helpful assistant. Be concise and friendly."
}
{
emit "=== Ollama Full Demo ===";
emit "";
emit "Testing local Ollama with llama3.2:3b...";
emit "";
let response = Assistant.ask("Hello! Can you tell me a fun fact about programming?");
emit "Assistant: " + response;
emit "";
let response2 = Assistant.ask("What is the difference between a compiler and an interpreter?");
emit "Assistant: " + response2;
emit "";
emit "=== Demo Complete ===";
}
To check which models you have installed, run ollama list in your terminal.
OpenAI: GPT-4o and GPT-4o-mini #
OpenAI's models are among the most capable available. GPT-4o is the flagship model with strong reasoning, coding, and instruction-following abilities. GPT-4o-mini is a smaller, faster, cheaper alternative that handles most tasks well.
Setup #
export OPENAI_API_KEY="sk-your-key-here"
GPT-4o-mini (Cost-Effective) #
agent EfficientBot {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.5
system: "You are a concise technical assistant. Answer in 1-2 sentences."
}
{
let response = EfficientBot.ask("What is a hash map?");
emit response;
}
GPT-4o (Maximum Capability) #
agent PowerBot {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are an expert software architect. Provide detailed,
well-reasoned answers with examples."
}
{
let response = PowerBot.ask("Explain the trade-offs between microservices and monoliths.");
emit response;
}
Available OpenAI Models #
| Model | Context Window | Strengths |
|---|---|---|
| gpt-4o | 128K tokens | Best overall capability, reasoning, coding |
| gpt-4o-mini | 128K tokens | Fast, cheap, good for most tasks |
| o1-preview | 128K tokens | Advanced reasoning (chain-of-thought built in) |
| o1-mini | 128K tokens | Reasoning-focused, cost-effective |
Anthropic: Claude #
Anthropic's Claude models are known for their strong instruction-following, safety alignment, and excellent performance on long-context tasks.
Setup #
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
Configuration #
agent ClaudeAgent {
provider: "anthropic"
model: "claude-sonnet-4-20250514"
api_key_env: "ANTHROPIC_API_KEY"
temperature: 0.5
system: "You are a thoughtful, precise assistant. Provide well-structured answers."
}
{
let response = ClaudeAgent.ask("Explain the CAP theorem with a concrete example.");
emit response;
}
Available Anthropic Models #
| Model | Context Window | Strengths |
|---|---|---|
| claude-sonnet-4-20250514 | 200K tokens | Best balance of capability and cost |
| claude-opus-4-20250514 | 200K tokens | Maximum capability |
| claude-haiku-3-20250514 | 200K tokens | Fast, cheapest option |
Anthropic uses the api_key_env field explicitly because the default
environment variable name differs from OpenAI's convention.
Google Gemini #
Google's Gemini models offer competitive performance, large context windows, and cost-effective pricing.
Setup #
export GEMINI_API_KEY="your-gemini-api-key"
Configuration #
agent GeminiAgent {
provider: "gemini"
model: "gemini-2.0-flash"
api_key_env: "GEMINI_API_KEY"
temperature: 0.6
system: "You are a knowledgeable assistant powered by Google Gemini."
}
{
let response = GeminiAgent.ask("What are the key features of the Transformer architecture?");
emit response;
}
Available Gemini Models #
| Model | Context Window | Strengths |
|---|---|---|
| gemini-2.0-flash | 1M tokens | Very fast, massive context, low cost |
| gemini-2.0-pro | 1M tokens | Higher capability, still cost-effective |
| gemini-1.5-pro | 2M tokens | Largest context window available |
Gemini's 1M+ token context windows make it ideal for processing very long documents without chunking.
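As a hedged sketch of that long-context use case: the agent below loads an entire document and asks for a summary in a single call, with no chunking. The read_file() helper is hypothetical, used here only for illustration; substitute whatever file-loading mechanism your Neam program actually uses.

```
agent LongDocReader {
  provider: "gemini"
  model: "gemini-1.5-pro"
  api_key_env: "GEMINI_API_KEY"
  temperature: 0.3
  system: "You summarize long documents faithfully and concisely."
}

{
  // read_file() is a hypothetical helper -- illustration only.
  let doc = read_file("contract.txt");
  let summary = LongDocReader.ask("Summarize the key obligations in this document: " + doc);
  emit summary;
}
```

With a 1M-2M token window, even book-length inputs fit in one prompt, so the retrieval and chunking machinery covered in Chapter 15 becomes optional for this class of task.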
Azure OpenAI #
If your organization uses Azure OpenAI Service, Neam provides direct support through the
"azure_openai" provider, or you can use the OpenAI-compatible adapter pattern.
Setup #
export AZURE_OPENAI_API_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
Using the Direct Provider #
agent AzureAgent {
provider: "azure_openai"
model: "gpt-4o"
endpoint: env("AZURE_OPENAI_ENDPOINT")
api_key_env: "AZURE_OPENAI_API_KEY"
temperature: 0.5
system: "You are an enterprise assistant deployed on Azure."
}
{
let response = AzureAgent.ask("Summarize the benefits of cloud computing.");
emit response;
}
Using the OpenAI Adapter #
Alternatively, you can use provider: "openai" with a custom endpoint:
agent AzureViaAdapter {
provider: "openai"
model: "gpt-4o"
endpoint: "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview"
api_key_env: "AZURE_OPENAI_API_KEY"
temperature: 0.5
system: "You are an enterprise assistant deployed on Azure."
}
Key points:
- The env() function reads an environment variable at runtime, keeping endpoints
configurable across environments.
- The endpoint includes your Azure resource name, deployment name, and API version.
- The api_key_env points to your Azure-specific API key.
AWS Bedrock Adapter #
AWS Bedrock provides access to multiple foundation models through AWS infrastructure. Authentication uses AWS SigV4 signing.
Setup #
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
Configuration #
agent BedrockAgent {
provider: "openai"
model: "anthropic.claude-3-sonnet-20240229-v1:0"
endpoint: "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-sonnet-20240229-v1:0/invoke"
api_key_env: "AWS_BEDROCK_TOKEN"
temperature: 0.5
system: "You are an assistant running on AWS Bedrock."
}
AWS Bedrock authentication uses SigV4 request signing. You may need to use
a signing proxy or configure the AWS_BEDROCK_TOKEN with a pre-signed session token.
Native Bedrock Provider #
AWS Bedrock has a dedicated native provider that handles SigV4 signing automatically.
Instead of using the OpenAI adapter pattern with a manual endpoint, you can use
provider: "bedrock" directly:
agent NativeBedrock {
provider: "bedrock"
model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
system: "You are an enterprise assistant on AWS Bedrock."
}
The native Bedrock provider reads AWS credentials from the environment automatically:
- AWS_ACCESS_KEY_ID -- Your AWS access key.
- AWS_SECRET_ACCESS_KEY -- Your AWS secret key.
- AWS_DEFAULT_REGION -- The AWS region where your Bedrock models are enabled (e.g., us-east-1).
No endpoint or api_key_env fields are needed. The Neam VM resolves the correct
regional Bedrock endpoint and signs requests using SigV4 behind the scenes.
// Using multiple Bedrock models in one program
agent BedrockClaude {
provider: "bedrock"
model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
temperature: 0.3
system: "You are a precise analyst."
}
agent BedrockTitan {
provider: "bedrock"
model: "amazon.titan-text-express-v1"
temperature: 0.5
system: "You are a helpful assistant."
}
{
let question = "What are the benefits of serverless architecture?";
let r1 = BedrockClaude.ask(question);
emit "Claude on Bedrock: " + r1;
emit "";
let r2 = BedrockTitan.ask(question);
emit "Titan on Bedrock: " + r2;
}
The native "bedrock" provider is the recommended approach for new projects.
The adapter pattern (using provider: "openai" with a custom endpoint) still works and
may be useful if you need to route through a custom proxy.
Google Vertex AI Adapter #
Vertex AI provides access to Gemini models through Google Cloud with enterprise features like VPC Service Controls, customer-managed encryption keys, and regional data residency.
Setup #
# Authenticate with Application Default Credentials
gcloud auth application-default login
export VERTEX_API_KEY="your-vertex-token"
Configuration #
agent VertexAgent {
provider: "openai"
model: "gemini-2.0-flash"
endpoint: "https://us-central1-aiplatform.googleapis.com/v1/projects/your-project/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent"
api_key_env: "VERTEX_API_KEY"
temperature: 0.5
system: "You are an enterprise assistant running on Vertex AI."
}
Custom and OpenAI-Compatible Endpoints #
Many LLM serving frameworks expose an OpenAI-compatible API. This means you can connect
Neam to virtually any model server -- including vLLM, text-generation-inference (TGI),
LiteLLM, LocalAI, and self-hosted inference endpoints -- by using provider: "openai"
with a custom endpoint.
Basic Configuration #
agent CustomAgent {
provider: "openai"
model: "my-custom-model"
endpoint: "https://my-llm-server.example.com/v1/chat/completions"
api_key_env: "MY_API_KEY"
system: "You are a helpful assistant."
}
The key fields:
- provider: "openai" -- Tells Neam to use the OpenAI request/response format.
- endpoint -- The full URL of your custom server's chat completions endpoint.
- model -- The model name your server expects. This varies by server.
- api_key_env -- The environment variable holding your API key (if required).
Common OpenAI-Compatible Servers #
| Server | Typical Endpoint | Notes |
|---|---|---|
| vLLM | http://localhost:8000/v1/chat/completions | High-throughput GPU serving |
| text-generation-inference | http://localhost:8080/v1/chat/completions | HuggingFace's inference server |
| LiteLLM | http://localhost:4000/v1/chat/completions | Proxy that unifies 100+ providers |
| LocalAI | http://localhost:8080/v1/chat/completions | CPU-friendly local inference |
Example: Connecting to vLLM #
agent VllmAgent {
provider: "openai"
model: "meta-llama/Llama-3-8B-Instruct"
endpoint: "http://localhost:8000/v1/chat/completions"
temperature: 0.7
system: "You are a helpful assistant served by vLLM."
}
{
let response = VllmAgent.ask("Explain gradient descent in simple terms.");
emit response;
}
Example: Connecting to LiteLLM Proxy #
LiteLLM acts as a unified proxy, letting you switch between providers by changing only the model name:
agent LiteLLMAgent {
provider: "openai"
model: "gpt-4o"
endpoint: "http://localhost:4000/v1/chat/completions"
api_key_env: "LITELLM_API_KEY"
system: "You are a helpful assistant via LiteLLM proxy."
}
{
let response = LiteLLMAgent.ask("What are the SOLID principles?");
emit response;
}
When using custom endpoints, features like streaming, function calling, and vision depend on what the server supports. Not all OpenAI-compatible servers implement the full API surface.
Try It Yourself: Connect to a Custom Endpoint
If you have Docker installed, spin up a quick LiteLLM proxy or vLLM server and connect
a Neam agent to it. Try changing the model field while keeping the same endpoint, and
observe how the proxy routes to different backends. This is a great way to experiment
with models you cannot run locally.
Multi-Provider Programs #
One of Neam's most powerful features is the ability to use multiple providers in a single program. This enables patterns like:
- Development/Production split: Use Ollama locally, OpenAI in production.
- Cost optimization: Route simple tasks to cheap models, complex tasks to powerful ones.
- Redundancy: Fall back to a different provider if one is unavailable.
- Best-of-breed: Use each provider for what it does best.
// Demonstrates using four providers in one program
agent OpenAIAgent {
provider: "openai"
model: "gpt-4o"
system: "You are powered by OpenAI."
}
agent AnthropicAgent {
provider: "anthropic"
model: "claude-sonnet-4-20250514"
api_key_env: "ANTHROPIC_API_KEY"
system: "You are powered by Anthropic."
}
agent GeminiAgent {
provider: "gemini"
model: "gemini-2.0-flash"
api_key_env: "GEMINI_API_KEY"
system: "You are powered by Google Gemini."
}
agent LocalAgent {
provider: "ollama"
model: "llama3.2:3b"
endpoint: "http://localhost:11434"
system: "You run locally via Ollama."
}
{
let question = "In one sentence, what makes a good programming language?";
emit "=== Multi-Provider Comparison ===";
emit "Question: " + question;
emit "";
let r1 = OpenAIAgent.ask(question);
emit "OpenAI (gpt-4o): " + r1;
emit "";
let r2 = AnthropicAgent.ask(question);
emit "Anthropic (Claude): " + r2;
emit "";
let r3 = GeminiAgent.ask(question);
emit "Gemini (2.0 Flash): " + r3;
emit "";
let r4 = LocalAgent.ask(question);
emit "Ollama (llama3.2:3b): " + r4;
}
Multi-Provider Face-Off
Copy the program above and modify it to test a question relevant to your domain -- for
example, "Explain the difference between REST and GraphQL" or "Summarize the key ideas
of functional programming." Compare the responses for quality, length, and tone. Try
adding a clock() call before and after each .ask() to measure latency. Which
provider gives the best answer for your use case? Which is fastest?
Streaming Responses #
For long responses, waiting for the entire output to generate before displaying anything creates a poor user experience. Streaming delivers tokens as they are generated, so the user sees output progressively.
In Neam, streaming is supported by providers that implement the chat_stream interface.
The .ask() method handles both batch and streaming transparently -- the VM will stream
if the provider supports it and accumulate the result into the returned string:
agent StreamBot {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a helpful assistant. Tell engaging stories."
}
{
// The VM streams internally and returns the complete response
let response = StreamBot.ask("Tell me a short story about a robot learning to cook.");
emit response;
}
At the language level, .ask() always returns a complete string. Streaming
is an optimization handled by the VM layer to reduce time-to-first-token latency.
A streaming callback API for real-time token processing may be exposed in the future.
Token Tracking and Cost Monitoring #
When working with cloud providers, keeping track of token usage and costs is critical. Neam automatically tracks this information for every agent call. When tracing is enabled, each call generates a trace entry with full details:
Agent: SmartAssistant
Model: gpt-4o-mini
Prompt tokens: 120
Completion tokens: 85
Total tokens: 205
Estimated cost: $0.00041
Latency: 842ms
You can enable tracing in your neam.toml:
[agent]
tracing = true
trace_dir = ".neam/traces"
Or enable it per runner:
runner MyPipeline {
entry_agent: TriageAgent
tracing: enabled
}
When tracing is enabled, the VM writes structured trace logs for every LLM call and tool invocation. These traces are useful for:
- Cost auditing -- See exactly how much each agent costs per query.
- Performance tuning -- Identify slow calls and optimize prompts to use fewer tokens.
- Debugging -- Trace the full conversation flow through multi-agent systems.
- Compliance -- Maintain an audit trail of all LLM interactions.
During development, enable tracing to understand how your agents behave. In production, use it for cost monitoring and debugging.
Provider Feature Comparison #
Not all providers support the same features. Here is a quick reference:
| Feature | OpenAI | Ollama | Anthropic | Gemini | Bedrock | Azure OpenAI | Vertex AI |
|---|---|---|---|---|---|---|---|
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Native tool calling | Yes | Yes* | Yes | Yes | Yes | Yes | Yes |
| Vision (images) | Yes | Limited** | Yes | Yes | Yes | Yes | Yes |
| Token counting | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Custom endpoint | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| Local/private | No | Yes | No | No | No | No | No |
| Voice (STT/TTS) | Yes | No | No | Yes | No | No | No |
Native tool calling is supported across all providers. When you declare skill
parameters with types (e.g., params: { city: string, units: string }), the Neam VM
automatically generates a JSON Schema definition and sends it to the provider using its
native function/tool calling protocol. This eliminates the need for manual schema
authoring and ensures consistent behavior across providers.
* Ollama function calling depends on the specific model. Not all local models support it.
** Ollama vision support requires vision-capable models like llava or bakllava.
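To make the automatic schema generation concrete, here is a hedged sketch. The skill declaration syntax is an assumption (tool declarations are covered in the next chapter); only the params form shown above is taken from this chapter. The commented JSON shows the kind of schema the VM would derive and send via the provider's native tool-calling protocol.

```
// Hypothetical skill declaration -- syntax illustrative only.
skill get_weather {
  params: { city: string, units: string }
}

// From those typed params, the VM would generate a JSON Schema roughly like:
// {
//   "type": "object",
//   "properties": {
//     "city":  { "type": "string" },
//     "units": { "type": "string" }
//   },
//   "required": ["city", "units"]
// }
```

The same declaration produces an OpenAI-style function definition, an Anthropic tool block, or a Gemini function declaration, depending on the agent's provider.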
Vision and Multimodal Input #
Several providers support vision -- the ability to analyze images alongside text prompts.
Neam provides the ask_with_image() method for this purpose.
Image from URL #
agent VisionBot {
provider: "openai"
model: "gpt-4o"
system: "You can see and analyze images. Describe what you observe."
}
{
let description = VisionBot.ask_with_image(
"What is in this image?",
"https://example.com/photo.jpg"
);
emit description;
}
Vision Provider Support #
| Provider | Vision Models | Image Input |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | URL, base64 |
| Anthropic | claude-sonnet-4-* | URL, base64 |
| Gemini | gemini-2.0-flash, gemini-2.0-pro | URL, base64 |
| Ollama | llava, bakllava | URL, base64 |
Practical Vision Example #
agent ImageAnalyzer {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are an image analysis expert. Describe images in detail,
identifying objects, text, colors, and spatial relationships.
Be precise and structured in your descriptions."
}
agent CaptionWriter {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.7
system: "You write engaging social media captions. Given an image
description, create a catchy caption under 280 characters."
}
{
// Step 1: Analyze the image
let analysis = ImageAnalyzer.ask_with_image(
"Describe this image in detail.",
"https://example.com/sunset.jpg"
);
emit "Analysis: " + analysis;
emit "";
// Step 2: Generate a caption from the analysis
let caption = CaptionWriter.ask("Write a social media caption for this image: " + analysis);
emit "Caption: " + caption;
}
Provider Selection Strategy #
Choosing the right provider for each agent is a design decision that affects cost, latency, quality, and privacy. Here is a decision framework:
Cost Comparison #
| Provider/Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Relative Cost |
|---|---|---|---|
| Ollama (any, incl. qwen3:1.7b) | Free | Free | None |
| Gemini 2.0 Flash | ~$0.10 | ~$0.40 | Very Low |
| GPT-4o-mini | ~$0.15 | ~$0.60 | Low |
| Claude Haiku | ~$0.25 | ~$1.25 | Low |
| Claude Sonnet | ~$3.00 | ~$15.00 | Medium |
| GPT-4o | ~$2.50 | ~$10.00 | Medium |
| Bedrock (Claude Sonnet) | ~$3.00 | ~$15.00 | Medium* |
| Claude Opus | ~$15.00 | ~$75.00 | High |
* AWS Bedrock pricing is generally comparable to direct API pricing for the same models. Exact costs may vary by region and usage tier. Check the AWS Bedrock pricing page for current rates.
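The arithmetic behind this table is simple: multiply each token count by its per-million-token rate and sum. A minimal Neam sketch (the rates are the approximate figures from the table above, and string concatenation of a number is assumed to work as in the earlier emit examples):

```
// Hedged sketch: estimate cost in USD from token counts.
// Rates are dollars per 1M tokens, e.g. GPT-4o-mini: 0.15 in / 0.60 out.
fun estimate_cost(input_tokens, output_tokens, input_rate, output_rate) {
  return (input_tokens * input_rate + output_tokens * output_rate) / 1000000.0;
}

{
  // 5,000 input tokens and 1,500 output tokens on GPT-4o-mini
  let cost = estimate_cost(5000, 1500, 0.15, 0.60);
  emit "Estimated cost (USD): " + cost;   // roughly 0.00165
}
```

Scaled up, the same call pattern at one million queries per month would run to roughly $1,650 on GPT-4o-mini versus about $27,500 on GPT-4o, which is why routing simple tasks to cheap models matters.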
Decision Tree #
// This is a conceptual pattern, not runnable code.
// Use this thinking when designing your agent architecture.
// For development and testing:
// -> Use Ollama (free, private, no API key needed)
// For simple production tasks (classification, extraction, formatting):
// -> Use GPT-4o-mini or Gemini 2.0 Flash (low cost, fast)
// For complex reasoning and analysis:
// -> Use GPT-4o or Claude Sonnet (higher quality)
// For long-context processing (>100K tokens):
// -> Use Gemini (1M+ token context)
// For privacy-sensitive data:
// -> Use Ollama (data never leaves your network)
// For enterprise compliance:
// -> Use Azure OpenAI or Vertex AI (enterprise security)
// For AWS-native deployments:
// -> Use Bedrock (SigV4 auth, VPC endpoints, IAM integration)
Development vs. Production Provider Strategy
For development and prototyping, use Ollama with qwen3:1.7b. It is completely
free, requires no API key, runs on modest hardware (~2 GB RAM), and responds quickly.
This lets you iterate on prompts, test agent architectures, and debug multi-agent
flows without spending a cent or worrying about rate limits.
For production, choose the provider whose models best fit your requirements along
three axes: latency (how fast do you need responses?), quality (how complex is
the reasoning?), and cost (what is your budget per query?). There is no single
best provider -- a triage classifier might use gpt-4o-mini (fast, cheap), while a
legal document analyzer might need claude-sonnet-4 (high quality, long context).
Start cheap and upgrade only the agents that need it.
Practical Provider Selection Pattern #
// Different agents use different providers based on their needs
// Triage agent: fast, cheap -- just classifying input
agent Triage {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.1
system: "Classify the input as TECHNICAL, BILLING, or GENERAL. Reply with only the category."
}
// Technical agent: needs strong reasoning
agent TechSupport {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are a senior technical support engineer. Provide detailed solutions."
}
// Billing agent: straightforward, keep costs low
agent BillingSupport {
provider: "gemini"
model: "gemini-2.0-flash"
api_key_env: "GEMINI_API_KEY"
temperature: 0.3
system: "You are a billing specialist. Help with payment questions."
}
// Sensitive data processing: keep it local
agent DataProcessor {
provider: "ollama"
model: "qwen2.5:14b"
temperature: 0.1
system: "You process and classify sensitive customer data. Be precise."
}
fun route_query(query) {
let category = Triage.ask(query);
if (category.contains("TECHNICAL")) {
return TechSupport.ask(query);
}
if (category.contains("BILLING")) {
return BillingSupport.ask(query);
}
return "General: I can help with that! " + query;
}
{
let result = route_query("My API endpoint is returning 503 errors");
emit result;
}
Provider Failover Pattern #
For production resilience, you can implement a failover pattern that tries a primary provider and falls back to alternatives:
agent PrimaryBot {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.5
system: "You are a helpful assistant."
}
agent FallbackBot {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a helpful assistant."
}
fun ask_with_fallback(prompt) {
try {
let result = PrimaryBot.ask(prompt);
return result;
} catch (err) {
emit "[Warning] Primary provider failed, using fallback: " + err;
try {
let result = FallbackBot.ask(prompt);
return result;
} catch (err2) {
return "All providers unavailable: " + err2;
}
}
}
{
let response = ask_with_fallback("What is Neam?");
emit response;
}
Environment Variable Management #
Managing API keys across multiple providers requires discipline. Here are the environment variables each provider expects:
# OpenAI (default for provider: "openai")
export OPENAI_API_KEY="sk-..."
# Anthropic (default for provider: "anthropic")
export ANTHROPIC_API_KEY="sk-ant-..."
# Google Gemini (default for provider: "gemini")
export GEMINI_API_KEY="AI..."
# Azure OpenAI (custom, via api_key_env)
export AZURE_OPENAI_API_KEY="..."
# AWS Bedrock (standard AWS credentials)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"
Never hard-code API keys in your .neam source files. Always use
environment variables. If you accidentally commit a key to version control, rotate
it immediately.
Summary #
In this chapter, you learned:
- Neam supports seven LLM providers: Ollama, OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, and GCP Vertex AI -- plus any OpenAI-compatible endpoint.
- The first six have native provider implementations. AWS Bedrock has a dedicated "bedrock" provider that handles SigV4 authentication automatically. Vertex AI and custom servers use the OpenAI-compatible adapter pattern with custom endpoints.
- Native tool calling is supported across all providers with automatic JSON Schema generation from parameter declarations.
- New Ollama models like qwen3:1.7b (lightweight, very fast), qwen2.5:7b (good quality), and llama3:8b (strong general-purpose) expand your local model options.
- Custom/OpenAI-compatible endpoints let you connect to vLLM, text-generation-inference, LiteLLM, and other servers using provider: "openai" with a custom endpoint.
- Each provider has different strengths, costs, and capabilities. Provider selection should be driven by your specific requirements.
- Multi-provider programs are a first-class Neam pattern -- different agents can use different providers within the same program.
- Streaming is handled transparently by the VM for providers that support it.
- Token tracking and cost monitoring are built in -- enable tracing to see token usage, costs, and latency for every call.
- Provider features vary -- not all providers support vision, voice, or function calling equally. The feature comparison table helps you choose.
- Vision/multimodal input is available via ask_with_image() across all native providers that support it.
- The env() function reads environment variables at runtime, useful for configurable endpoints.
- Failover patterns improve production resilience by falling back to alternative providers.
- API keys should always be managed through environment variables, never hard-coded.
In the next chapter, we will give your agents the ability to take actions in the world through tools and function calling.
Exercises #
Exercise 11.1: Provider Comparison
Write a program that asks the same question to three different providers (e.g., Ollama,
OpenAI, and Gemini). Use clock() to measure the response time for each. Emit a
formatted comparison table showing the provider, response time, and response length.
Exercise 11.2: Cost Calculator
Write a function estimate_cost(provider, model, input_tokens, output_tokens) that
returns an estimated cost in USD based on the pricing table in this chapter. Test it with
several combinations and emit the results.
Exercise 11.3: Failover Chain
Extend the failover pattern to try three providers in sequence: OpenAI, then Anthropic, then Ollama. If all three fail, return a helpful error message. Test this by using an intentionally wrong API key for the first two providers.
Exercise 11.4: Smart Router
Write a program with a Router agent (using gpt-4o-mini) that classifies incoming
queries as SIMPLE, COMPLEX, or CREATIVE. Route simple queries to gemini-2.0-flash,
complex queries to gpt-4o, and creative queries to an Ollama model with temperature:
1.2. Emit the classification and the routed response.
Exercise 11.5: Vision Pipeline
Write a program that uses ask_with_image() to analyze an image, then passes the
analysis to a second agent that generates a haiku based on the image description. Use
different providers for the two agents.
Exercise 11.6: Provider Configuration Matrix
Create a Neam program that declares six agents -- one for each configurable field combination (different providers, temperatures, endpoints, and system prompts). Document in comments why you chose each configuration. Run them all against the prompt "Explain why the sky is blue" and compare the outputs.