Chapter 19: Cloud-Native Configuration #
"Configuration is the surface area between your code and your environment. Get it right, and deployment becomes a solved problem. Get it wrong, and every environment is a special snowflake." -- Production engineering proverb
What You Will Learn #
In this chapter, you will learn how to configure Neam for cloud-native deployment using
the neam.toml configuration file. You will understand the new v0.6.0 configuration
sections for distributed state backends, LLM gateway settings, OpenTelemetry export,
secrets management, and environment-specific overrides. By the end of this chapter, you
will be able to write a complete neam.toml that takes your agent from a laptop to a
production Kubernetes cluster.
19.1 The neam.toml Configuration File #
Every Neam project starts with a neam.toml file at the project root. This file uses
the TOML format (Tom's Obvious, Minimal Language) and serves as the single source of
truth for project metadata, dependencies, and -- as of v0.6.0 -- all cloud-native
runtime configuration.
The key design principle behind Neam's cloud configuration is separation of concerns:
the .neam source files define what your agents do; the neam.toml file defines
where and how they run. This means the parser, compiler, and bytecode format are
completely unchanged in v0.6.0. All cloud configuration is read at runtime by the VM
and the neamc deploy command.
The Minimal neam.toml #
Here is the simplest valid neam.toml:
neam_version = "1.0"
[project]
name = "my-agent"
version = "0.1.0"
description = "A simple Neam agent"
type = "binary"
[project.entry_points]
main = "src/main.neam"
This is sufficient for local development. The VM will use its defaults: SQLite for state, no LLM gateway, no telemetry export, and environment variables for secrets.
The Full v0.6.0 neam.toml #
Here is a complete neam.toml showing every v0.6.0 configuration section:
neam_version = "1.0"
[project]
name = "customer-service-agent"
version = "1.2.0"
description = "Production customer service agent with triage and routing"
type = "binary"
authors = ["Team <team@example.com>"]
license = "MIT"
[project.entry_points]
main = "src/main.neam"
[dependencies]
agent-utils = "^1.0.0"
[dev-dependencies]
test-framework = "0.1.0"
# --- v0.6.0 Cloud-Native Sections ---
[state]
backend = "postgres"
connection-string = "postgresql://user:pass@host:5432/neam"
ttl-seconds = 7200
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.rate-limits.anthropic]
requests-per-minute = 60
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
half-open-max-requests = 1
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
[llm.cost]
daily-budget-usd = 100.0
[llm.fallback-chain]
providers = ["openai", "anthropic", "ollama"]
[telemetry]
enabled = true
endpoint = "http://localhost:4318"
service-name = "neam-agent"
sampling-rate = 0.5
[secrets]
provider = "env"
[deploy]
target = "kubernetes"
Let us examine each section in detail.
19.2 The Configuration Hierarchy #
Before diving into individual sections, it is important to understand how Neam resolves configuration values. The hierarchy follows a standard cloud-native pattern: environment variables take highest precedence, then values in neam.toml, then the VM's built-in defaults.
The environment variable naming convention maps directly from TOML paths:
| TOML Path | Environment Variable |
|---|---|
| `[state] backend` | `NEAM_STATE_BACKEND` |
| `[state] connection-string` | `NEAM_STATE_CONNECTION_STRING` |
| `[llm] default-provider` | `NEAM_LLM_DEFAULT_PROVIDER` |
| `[telemetry] enabled` | `NEAM_TELEMETRY_ENABLED` |
| `[telemetry] endpoint` | `NEAM_OTEL_ENDPOINT` |
| `[secrets] provider` | `NEAM_SECRETS_PROVIDER` |
This convention means you never need to bake secrets or environment-specific values
into neam.toml. The TOML file defines structure and defaults; environment variables
provide the overrides.
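The naming convention can be sketched as a small helper. This is an illustration of the mapping in the table above, not the VM's actual implementation; note the one special case the table shows, where the telemetry endpoint uses the `NEAM_OTEL_` name:

```python
def env_var_for(toml_path: str) -> str:
    """Map a TOML path like 'llm.default-provider' to its override variable."""
    # Special case from the table: the telemetry endpoint uses the OTEL name.
    if toml_path == "telemetry.endpoint":
        return "NEAM_OTEL_ENDPOINT"
    # General rule: prefix NEAM_, uppercase, dots and dashes become underscores.
    return "NEAM_" + toml_path.upper().replace(".", "_").replace("-", "_")

print(env_var_for("state.connection-string"))  # NEAM_STATE_CONNECTION_STRING
print(env_var_for("llm.cost.daily-budget-usd"))  # NEAM_LLM_COST_DAILY_BUDGET_USD
```

The same rule covers nested sections, so deeper paths like `llm.cost.daily-budget-usd` resolve predictably.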
19.3 State Backend Configuration #
The [state] section controls where Neam persists agent memory, learning interactions,
prompt evolution history, autonomous action logs, and budget tracking. In v0.5.0, all
of this was stored in SQLite. In v0.6.0, you can choose from six backends.
Section Reference #
[state]
backend = "postgres" # Required
connection-string = "postgresql://user:pass@host:5432/neam" # Required for non-sqlite
ttl-seconds = 7200 # Optional, default: 0 (no expiry)
Available Backends #
| Backend | `backend` value | Connection String Format | Best For |
|---|---|---|---|
| SQLite | `"sqlite"` | `"./data/neam.db"` (file path) | Local dev, single-node |
| PostgreSQL | `"postgres"` | `"postgresql://user:pass@host:5432/db"` | Production, multi-node |
| Redis | `"redis"` | `"redis://host:6379"` | High-throughput, ephemeral state |
| DynamoDB | `"dynamodb"` | `"dynamodb://us-east-1/neam-state"` | AWS-native deployments |
| CosmosDB | `"cosmosdb"` | `"cosmosdb://account.documents.azure.com/neam"` | Azure-native deployments |
| Firestore | `"firestore"` | `"firestore://project-id/neam-state"` | GCP-native deployments |
SQLite (Default) #
SQLite is the default backend and requires no configuration at all. If you omit the
[state] section entirely, or set backend = "sqlite", the VM creates a SQLite
database in the .neam/ directory:
[state]
backend = "sqlite"
connection-string = ".neam/state.db"
SQLite is ideal for local development and single-node deployments. It supports all Neam features including learning, evolution, and autonomous execution. However, it does not support true distributed locking -- the lock implementation uses table-level exclusion, which only works within a single process.
PostgreSQL #
PostgreSQL is the recommended production backend. It provides ACID transactions, true distributed locking via advisory locks, full-text search for memory events, and scales to millions of interactions:
[state]
backend = "postgres"
connection-string = "postgresql://neam:secret@db.example.com:5432/neam_prod"
ttl-seconds = 7200
The Neam VM creates the required schema automatically on first connection. Here is the schema it provisions:
-- Core memory events
CREATE TABLE IF NOT EXISTS neam_events (
id BIGSERIAL PRIMARY KEY,
store_name TEXT NOT NULL,
timestamp BIGINT NOT NULL,
type TEXT NOT NULL,
data TEXT NOT NULL,
agent TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_neam_events_store
ON neam_events(store_name);
CREATE INDEX IF NOT EXISTS idx_neam_events_ts
ON neam_events(store_name, timestamp);
-- Learning interactions
CREATE TABLE IF NOT EXISTS neam_learning_interactions (
id BIGSERIAL PRIMARY KEY,
agent_name TEXT NOT NULL,
query TEXT NOT NULL,
response TEXT NOT NULL,
reflection_score DOUBLE PRECISION,
feedback_score DOUBLE PRECISION,
tokens_used INTEGER DEFAULT 0,
timestamp BIGINT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_neam_li_agent
ON neam_learning_interactions(agent_name, timestamp);
-- Prompt evolution
CREATE TABLE IF NOT EXISTS neam_prompt_evolution (
id BIGSERIAL PRIMARY KEY,
agent_name TEXT NOT NULL,
version INTEGER NOT NULL,
original_prompt TEXT,
evolved_prompt TEXT NOT NULL,
reasoning TEXT,
status TEXT DEFAULT 'active',
timestamp BIGINT NOT NULL,
UNIQUE(agent_name, version)
);
-- Distributed locks
CREATE TABLE IF NOT EXISTS neam_distributed_locks (
lock_name TEXT PRIMARY KEY,
holder_id TEXT NOT NULL,
expires_at BIGINT NOT NULL
);
The ttl-seconds field controls how long memory events are retained. Set it to 0
(the default) to retain events indefinitely.
The PostgreSQL backend requires the libpq development headers at compile
time. Build with -DNEAM_BACKEND_POSTGRES=ON (this is ON by default in the Docker
image).
Redis #
Redis is optimized for high-throughput scenarios where sub-millisecond read latency matters more than long-term durability:
[state]
backend = "redis"
connection-string = "redis://redis.example.com:6379"
ttl-seconds = 3600
Redis uses native data structures for each concern:
- Memory events: Sorted Sets (score = timestamp)
- Checkpoints: Hashes
- Learning interactions: Sorted Sets
- Distributed locks: `SET key value EX ttl NX` (native Redis locking)
- Budgets: Hashes with daily key rotation
Redis AOF persistence is recommended for production. Without it, a Redis restart will lose all agent state. For durable state, prefer PostgreSQL.
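The lock semantics of `SET key value EX ttl NX` can be illustrated with a small in-memory model. This is not the Redis client code Neam uses, just a sketch of why the atomic NX + EX combination yields a safe expiring lock:

```python
import time

class FakeRedisLock:
    """In-memory illustration of Redis `SET key value EX ttl NX` semantics."""

    def __init__(self):
        self._store = {}  # key -> (holder, expires_at)

    def acquire(self, key: str, holder: str, ttl: float, now=None) -> bool:
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False  # NX: key already exists and has not expired
        self._store[key] = (holder, now + ttl)  # EX: auto-expires after ttl
        return True

locks = FakeRedisLock()
assert locks.acquire("neam:lock:evolve", "worker-1", ttl=30, now=0)       # acquired
assert not locks.acquire("neam:lock:evolve", "worker-2", ttl=30, now=10)  # held
assert locks.acquire("neam:lock:evolve", "worker-2", ttl=30, now=31)      # expired
```

Because expiry is enforced by the store itself, a crashed worker can never hold a lock forever.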
DynamoDB #
DynamoDB is the natural choice for AWS-native deployments, especially when running on Lambda or ECS Fargate:
[state]
backend = "dynamodb"
connection-string = "dynamodb://us-east-1/neam-state-table"
DynamoDB uses a single-table design with composite keys:
| Partition Key (PK) | Sort Key (SK) | Purpose |
|---|---|---|
| `EVENT#{store}` | `{timestamp}#{uuid}` | Memory events |
| `LEARNING#{agent}` | `{timestamp}#{uuid}` | Learning interactions |
| `EVOLUTION#{agent}` | `V#{version}` | Prompt evolution |
| `LOCK#{name}` | `LOCK` | Distributed locks (with TTL) |
| `BUDGET#{agent}` | `{date}` | Budget tracking |
DynamoDB requires -DNEAM_BACKEND_AWS=ON at compile time. Authentication
uses AWS IAM credentials (environment variables or instance profile).
CosmosDB #
CosmosDB is the Azure-native option:
[state]
backend = "cosmosdb"
connection-string = "cosmosdb://neamaccount.documents.azure.com/neam-db"
CosmosDB uses the SQL API with a single container partitioned by agent_name. Each
document includes a type field (event, learning, evolution, lock, budget)
for multiplexing.
CosmosDB requires -DNEAM_BACKEND_AZURE=ON at compile time.
Practical Example: Switching Backends by Environment #
A common pattern is to use SQLite for development and PostgreSQL for production. You achieve this with environment variables; the agent source stays identical in both environments:
agent ServiceBot {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a customer service agent."
memory: "service_memory"
learning: {
strategy: "experience_replay"
review_interval: 10
}
}
{
let answer = ServiceBot.ask("Help me with my order");
emit answer;
}
# neam.toml -- defaults to postgres for the team
[state]
backend = "postgres"
connection-string = "postgresql://neam:dev@localhost:5432/neam_dev"
# Local development: override to sqlite
export NEAM_STATE_BACKEND=sqlite
neam-cli src/main.neam
# Production: use the neam.toml postgres config with production credentials
export NEAM_STATE_CONNECTION_STRING="postgresql://neam:$DB_PASS@prod-db:5432/neam"
neam-api --port 8080
19.4 LLM Gateway Configuration #
The [llm] section configures the LLM Gateway -- a centralized layer that sits
between your agents and the LLM providers. The gateway handles rate limiting, circuit
breaking, response caching, cost tracking, and provider failover. All of this is
transparent to your agent code.
Why a Gateway? #
Without a gateway, every Agent.ask() call goes directly to the provider API. This
works fine in development, but in production you face several problems:
- Rate limiting: Provider APIs have request-per-minute limits. Without coordination, concurrent agents exhaust the limit and receive 429 errors.
- Cascading failures: If a provider goes down, all agents fail simultaneously. There is no circuit breaking or failover.
- Cost overruns: Without tracking, a runaway agent can spend your entire API budget in minutes.
- Redundant calls: Identical queries to the same model with temperature 0 produce identical results, but you pay for each call.
The LLM Gateway solves all of these problems in-process, without requiring a separate sidecar or proxy service.
Default Provider and Model #
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
These values are used when an agent declaration does not specify a provider or model.
They can be overridden per-agent in the .neam source.
Rate Limits #
Configure per-provider rate limits:
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.rate-limits.anthropic]
requests-per-minute = 60
[llm.rate-limits.gemini]
requests-per-minute = 30
The gateway implements rate limiting with a token bucket algorithm. Each provider gets
its own bucket with a refill rate of requests-per-minute / 60.0 tokens per second.
When the bucket is empty, requests queue for up to 5 seconds before returning a 503.
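The token bucket behavior described above can be sketched as follows. This is an illustrative model of the algorithm (the gateway's real implementation is in the VM, and the queueing behavior is omitted here):

```python
class TokenBucket:
    """Per-provider limiter: refills at requests-per-minute / 60 tokens per second."""

    def __init__(self, requests_per_minute: float):
        self.capacity = requests_per_minute
        self.tokens = requests_per_minute
        self.refill_rate = requests_per_minute / 60.0
        self.last = 0.0

    def try_acquire(self, now: float) -> bool:
        # Refill based on time elapsed since the last request, capped at capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would queue for up to 5s, then return 503
```

With `requests-per-minute = 120`, the bucket refills two tokens per second, so short bursts are absorbed while the sustained rate stays within the provider's limit.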
Circuit Breaker #
The circuit breaker prevents cascading failures when a provider is unhealthy:
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
half-open-max-requests = 1
The circuit breaker has three states:
- Closed (normal): All requests pass through. Failures are counted.
- Open: All requests are immediately rejected. The gateway tries the next provider in the fallback chain.
- Half-Open: After `reset-timeout-seconds`, the gateway allows `half-open-max-requests` probe request(s). If the probe succeeds, the circuit closes. If it fails, the circuit opens again.
Failures that trigger the circuit breaker include: HTTP 429 (rate limited), 5xx (server errors), connection timeouts, and DNS resolution failures. Client errors (400, 401, 403, 404) do not trigger the circuit breaker.
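The three-state machine can be sketched in a few lines. This is an assumed model of the behavior described above, not the gateway's source:

```python
CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=60, half_open_max=1):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_max = half_open_max
        self.state = CLOSED
        self.failures = 0
        self.opened_at = 0.0
        self.probes = 0

    def allow(self, now: float) -> bool:
        # Open -> Half-Open once the reset timeout has elapsed.
        if self.state == OPEN and now - self.opened_at >= self.reset_timeout:
            self.state, self.probes = HALF_OPEN, 0
        if self.state == HALF_OPEN:
            if self.probes < self.half_open_max:
                self.probes += 1
                return True
            return False
        return self.state == CLOSED

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.state, self.failures = CLOSED, 0
        else:
            self.failures += 1
            # A failed probe, or too many failures while closed, opens the circuit.
            if self.state == HALF_OPEN or self.failures >= self.failure_threshold:
                self.state, self.opened_at = OPEN, now
```

While the circuit is open, the gateway would consult the fallback chain instead of calling `allow` in a loop.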
Response Cache #
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
When enabled, the gateway caches LLM responses using a SHA-256 hash of the provider
name, model name, and serialized message array as the cache key. Caching only applies
to deterministic requests where temperature == 0.0.
The cache uses LRU eviction when max-entries is reached and TTL-based expiration.
Cache hits bypass the rate limiter, circuit breaker, and provider entirely.
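The cache key construction can be sketched as follows; the exact serialization the VM uses is not specified here, so treat the `json.dumps` step as an assumption:

```python
import hashlib
import json

def cache_key(provider: str, model: str, messages: list) -> str:
    """SHA-256 over the provider name, model name, and serialized messages."""
    payload = json.dumps(
        {"provider": provider, "model": model, "messages": messages},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cacheable(temperature: float) -> bool:
    # Only deterministic requests are eligible for the cache.
    return temperature == 0.0
```

Identical requests hash to the same key, so a repeated deterministic query costs nothing after the first call.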
Cost Tracking #
[llm.cost]
daily-budget-usd = 100.0
After each successful LLM call, the gateway computes the cost using Neam's built-in
pricing table (updated with each release). When daily-budget-usd is set and the
accumulated daily cost exceeds the threshold, the gateway logs a warning. It does not
hard-block requests by default -- you can implement hard blocking with guardrails if
needed.
You can query cost data programmatically:
{
// The gateway tracks costs automatically
let response = MyAgent.ask("Analyze this dataset");
emit response;
// In a monitoring context, you would check costs via the /health endpoint
// or through OpenTelemetry metrics
}
Provider Failover Chain #
[llm.fallback-chain]
providers = ["openai", "anthropic", "ollama"]
When the primary provider's circuit is open, the gateway automatically tries the next provider in the chain. This enables graceful degradation -- if OpenAI is down, requests fall back to Anthropic, and if Anthropic is also down, they fall back to a local Ollama instance.
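The failover logic amounts to walking the chain and skipping unhealthy providers. A minimal sketch, with hypothetical helper names (`circuits`, `send`) standing in for the gateway's internals:

```python
def call_with_failover(chain, circuits, send):
    """Try providers in order, skipping any whose circuit is open.

    chain    -- e.g. ["openai", "anthropic", "ollama"]
    circuits -- provider -> "open" / "closed"
    send     -- callable performing the actual request
    """
    last_error = None
    for provider in chain:
        if circuits.get(provider) == "open":
            continue  # circuit breaker rejects this provider up front
        try:
            return provider, send(provider)
        except ConnectionError as err:
            last_error = err  # treat as provider failure; fall through
    raise RuntimeError("all providers unavailable") from last_error
```

If OpenAI's circuit is open, the very first iteration is skipped and the request lands on Anthropic without ever touching the failing provider.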
Complete LLM Configuration Example #
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.rate-limits.anthropic]
requests-per-minute = 60
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
half-open-max-requests = 1
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
[llm.cost]
daily-budget-usd = 100.0
[llm.fallback-chain]
providers = ["openai", "anthropic"]
19.5 Telemetry Configuration #
The [telemetry] section enables OpenTelemetry (OTLP) export for distributed tracing
and metrics collection:
[telemetry]
enabled = true
endpoint = "http://localhost:4318"
service-name = "neam-agent"
sampling-rate = 0.5
Configuration Fields #
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable OTLP export |
| `endpoint` | string | `""` | OTLP/HTTP endpoint URL |
| `service-name` | string | `"neam-agent"` | Service name in traces |
| `sampling-rate` | float | `1.0` | Fraction of requests to trace (0.0-1.0) |
When enabled = true, the Neam VM creates spans for every agent call, tool call, RAG
retrieval, reflection pass, learning review, and handoff. These spans are batched and
exported to the OTLP endpoint every 5 seconds or when the batch reaches 100 spans.
Sampling Rate #
The sampling-rate controls what fraction of requests generate trace data. A value of
1.0 traces everything (useful in development), while 0.1 traces 10% of requests
(appropriate for high-traffic production). Sampling is deterministic -- once a trace is
sampled, all spans within that trace are included.
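Deterministic head sampling is easy to model: derive the keep/drop decision from the trace ID alone, so every span in a trace gets the same answer. The hashing scheme below is illustrative; the VM's exact scheme is not specified here:

```python
import hashlib

def sample_trace(trace_id: str, sampling_rate: float) -> bool:
    """Keep a trace iff its hashed ID falls below the sampling threshold."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < sampling_rate * 10_000
```

Because the decision depends only on `trace_id`, calling this from any span in the same trace, on any node, yields the same result.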
# Development: trace everything
[telemetry]
enabled = true
endpoint = "http://localhost:4318"
sampling-rate = 1.0
# Production: trace 10% of requests
[telemetry]
enabled = true
endpoint = "http://otel-collector.observability.svc.cluster.local:4318"
sampling-rate = 0.1
Environment-Specific Telemetry #
In Kubernetes, the OTLP endpoint is typically the OpenTelemetry Collector running in the cluster:
# Set via ConfigMap or environment variable
export NEAM_OTEL_ENDPOINT="http://otel-collector.observability.svc.cluster.local:4318"
We will cover the full observability stack (Jaeger, Prometheus, alerting) in Chapter 22.
19.6 Secrets Provider Configuration #
The [secrets] section determines how the Neam VM resolves sensitive values like API
keys and database passwords:
[secrets]
provider = "env"
Available Providers #
| Provider | `provider` value | Description |
|---|---|---|
| Environment Variables | `"env"` | Read from `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. |
| AWS Secrets Manager | `"aws-secrets-manager"` | Fetch secrets from AWS |
| GCP Secret Manager | `"gcp-secret-manager"` | Fetch secrets from GCP |
| Azure Key Vault | `"azure-key-vault"` | Fetch secrets from Azure |
| HashiCorp Vault | `"vault"` | Fetch secrets from Vault |
Environment Variables (Default) #
The default provider reads secrets from environment variables. This is the simplest approach and works everywhere:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export DATABASE_URL="postgresql://user:pass@host:5432/neam"
AWS Secrets Manager #
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "neam/"
With this configuration, when the VM needs the OpenAI API key, it fetches
neam/OPENAI_API_KEY from AWS Secrets Manager. The prefix is prepended to all
secret names.
GCP Secret Manager #
[secrets]
provider = "gcp-secret-manager"
project = "my-gcp-project"
Azure Key Vault #
[secrets]
provider = "azure-key-vault"
vault-url = "https://my-vault.vault.azure.net"
Practical Example: Cloud Secrets in Production #
Here is a complete configuration that uses AWS Secrets Manager for production secrets:
[project]
name = "production-agent"
version = "1.0.0"
[project.entry_points]
main = "src/main.neam"
[state]
backend = "postgres"
connection-string = "secret://DATABASE_URL"
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/neam/"
[telemetry]
enabled = true
endpoint = "http://otel-collector:4318"
The secret://DATABASE_URL syntax tells the VM to resolve the value through the
configured secrets provider. The actual connection string is fetched from AWS Secrets
Manager at production/neam/DATABASE_URL.
19.7 Deploy Target Configuration #
The [deploy] section tells neamc deploy which target platform to generate
artifacts for:
[deploy]
target = "kubernetes"
Available Targets #
| Target | `target` value | Generated Artifacts |
|---|---|---|
| Docker | `"docker"` | Dockerfile |
| Kubernetes | `"kubernetes"` | Deployment, Service, ConfigMap, HPA, PDB, NetworkPolicy |
| AWS Lambda | `"lambda"` | SAM template |
| AWS ECS Fargate | `"ecs-fargate"` | Task definition, Service, ALB |
| GCP Cloud Run | `"cloud-run"` | Cloud Run YAML |
| Azure Container Apps | `"azure-container-apps"` | Container App YAML |
| Azure AKS | `"azure-aks"` | K8s manifests + AKS annotations |
| Helm | `"helm"` | Helm chart (Chart.yaml, values.yaml, templates/) |
Kubernetes-Specific Configuration #
[deploy]
target = "kubernetes"
[deploy.kubernetes]
namespace = "neam-production"
replicas = 3
[deploy.kubernetes.resources]
cpu-request = "500m"
cpu-limit = "1"
memory-request = "512Mi"
memory-limit = "1Gi"
[deploy.kubernetes.scaling]
min-replicas = 2
max-replicas = 20
target-cpu-percent = 70
[deploy.kubernetes.disruption]
min-available = 2
Using neamc deploy #
# Preview generated manifests (dry run)
neamc deploy --target kubernetes --dry-run
# Generate manifests to a directory
neamc deploy --target kubernetes --output ./deploy/
# Generate for a specific cloud
neamc deploy --target ecs-fargate --output ./deploy/
neamc deploy --target cloud-run --output ./deploy/
neamc deploy --target azure-container-apps --output ./deploy/
19.8 Environment-Specific Overrides #
Real-world projects need different configurations for development, staging, and production. Neam supports this through three complementary mechanisms.
Mechanism 1: Environment Variables #
The most common approach. Your neam.toml defines sane defaults, and environment
variables override specific values per environment:
# neam.toml -- shared defaults
[state]
backend = "postgres"
connection-string = "postgresql://neam:dev@localhost:5432/neam_dev"
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.cost]
daily-budget-usd = 10.0
[telemetry]
enabled = false
# development.env
NEAM_STATE_CONNECTION_STRING=postgresql://neam:dev@localhost:5432/neam_dev
NEAM_LLM_COST_DAILY_BUDGET_USD=10.0
NEAM_TELEMETRY_ENABLED=false
# staging.env
NEAM_STATE_CONNECTION_STRING=postgresql://neam:staging@staging-db:5432/neam_staging
NEAM_LLM_COST_DAILY_BUDGET_USD=50.0
NEAM_TELEMETRY_ENABLED=true
NEAM_OTEL_ENDPOINT=http://otel-collector.staging:4318
NEAM_TELEMETRY_SAMPLING_RATE=1.0
# production.env
NEAM_STATE_CONNECTION_STRING=postgresql://neam:$DB_PASS@prod-db:5432/neam_prod
NEAM_LLM_COST_DAILY_BUDGET_USD=500.0
NEAM_TELEMETRY_ENABLED=true
NEAM_OTEL_ENDPOINT=http://otel-collector.production:4318
NEAM_TELEMETRY_SAMPLING_RATE=0.1
Mechanism 2: Kubernetes ConfigMaps and Secrets #
When deploying to Kubernetes, environment-specific values live in ConfigMaps and Secrets. Kustomize overlays (covered in Chapter 20) patch these per environment:
# gitops/base/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: neam-config
data:
NEAM_ENV: "production"
NEAM_LOG_LEVEL: "info"
NEAM_PORT: "8080"
NEAM_TELEMETRY_ENABLED: "true"
NEAM_OTEL_ENDPOINT: "http://otel-collector.observability.svc.cluster.local:4318"
Mechanism 3: Cloud-Specific Secrets #
For sensitive values, use the secrets provider appropriate to your cloud:
# AWS deployment
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/neam/"
# GCP deployment
[secrets]
provider = "gcp-secret-manager"
project = "neam-production"
# Azure deployment
[secrets]
provider = "azure-key-vault"
vault-url = "https://neam-prod.vault.azure.net"
A Complete Multi-Environment Setup #
Here is a practical example showing how a team might structure configuration for a customer service agent across three environments:
# neam.toml -- committed to git
neam_version = "1.0"
[project]
name = "customer-service"
version = "2.1.0"
type = "binary"
[project.entry_points]
main = "src/main.neam"
# Default state config (overridden per environment)
[state]
backend = "postgres"
connection-string = "postgresql://neam:dev@localhost:5432/neam_dev"
# LLM gateway (same structure across environments)
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
[llm.cost]
daily-budget-usd = 10.0
# Telemetry off by default (enabled in staging/production)
[telemetry]
enabled = false
# Secrets from environment variables (overridden in cloud environments)
[secrets]
provider = "env"
# Deploy target
[deploy]
target = "kubernetes"
[deploy.kubernetes]
namespace = "neam-dev"
replicas = 1
The corresponding Neam source file remains identical across all environments:
agent TriageAgent {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.3
system: "You are a triage agent. Route customer requests."
handoffs: [RefundAgent, BillingAgent]
learning: {
strategy: "experience_replay"
review_interval: 20
}
memory: "triage_memory"
}
agent RefundAgent {
provider: "openai"
model: "gpt-4o-mini"
system: "You process refund requests professionally."
memory: "refund_memory"
}
agent BillingAgent {
provider: "openai"
model: "gpt-4o-mini"
system: "You handle billing questions."
memory: "billing_memory"
}
{
let query = input();
let triage = TriageAgent.ask(query);
if (triage.contains("ROUTE:REFUND")) {
emit RefundAgent.ask(query);
}
if (triage.contains("ROUTE:BILLING")) {
emit BillingAgent.ask(query);
}
}
19.9 Configuration Validation #
The neamc compiler validates neam.toml at compile time and reports errors before
any deployment happens:
$ neamc deploy --target kubernetes --dry-run
Validating neam.toml...
[state] backend: postgres OK
[llm] default-provider: openai OK
[llm.cache] enabled: true OK
[telemetry] enabled: true OK
[telemetry] endpoint: (not set) ERROR: endpoint required when enabled
[secrets] provider: env OK
[deploy] target: kubernetes OK
Error: [telemetry] endpoint must be set when telemetry is enabled.
Common validation rules:
- `[state]` requires `connection-string` for all backends except `sqlite`
- `[telemetry]` requires `endpoint` when `enabled = true`
- `[llm.rate-limits.*]` must have `requests-per-minute > 0`
- `[llm.circuit-breaker]` must have `failure-threshold >= 1`
- `[deploy.kubernetes]` requires `namespace` when target is `kubernetes`
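The rules are straightforward to express over a parsed configuration. A sketch of a validator for the first three rules, operating on the nested dicts a TOML parser would produce (this is not neamc's implementation):

```python
def validate(config: dict) -> list[str]:
    """Check a few of the validation rules against a parsed neam.toml."""
    errors = []
    state = config.get("state", {})
    if state.get("backend", "sqlite") != "sqlite" and not state.get("connection-string"):
        errors.append("[state] connection-string required for non-sqlite backends")
    tel = config.get("telemetry", {})
    if tel.get("enabled") and not tel.get("endpoint"):
        errors.append("[telemetry] endpoint must be set when telemetry is enabled")
    for provider, rl in config.get("llm", {}).get("rate-limits", {}).items():
        if rl.get("requests-per-minute", 0) <= 0:
            errors.append(f"[llm.rate-limits.{provider}] requests-per-minute must be > 0")
    return errors
```

Running such checks before generating deployment artifacts is what lets `neamc deploy --dry-run` fail fast instead of shipping a broken manifest.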
19.10 Compile Flags and Backend Dependencies #
Not all backends are included in every build. Heavy cloud SDK dependencies are gated behind CMake compile flags:
| Compile Flag | Enables | Dependencies |
|---|---|---|
| (default) | SQLite backend | None (bundled) |
| `-DNEAM_BACKEND_POSTGRES=ON` | PostgreSQL backend | libpq |
| `-DNEAM_BACKEND_REDIS=ON` | Redis backend | hiredis |
| `-DNEAM_BACKEND_AWS=ON` | DynamoDB, Bedrock, Lambda, ECS | libcurl (bundled) |
| `-DNEAM_BACKEND_GCP=ON` | Firestore, Cloud Run, Vertex AI | libcurl (bundled) |
| `-DNEAM_BACKEND_AZURE=ON` | CosmosDB, Azure OpenAI, AKS, Container Apps | libcurl (bundled) |
The default Docker image includes PostgreSQL support:
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DNEAM_BACKEND_POSTGRES=ON
For a full multi-cloud build:
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DNEAM_BACKEND_POSTGRES=ON \
-DNEAM_BACKEND_REDIS=ON \
-DNEAM_BACKEND_AWS=ON \
-DNEAM_BACKEND_GCP=ON \
-DNEAM_BACKEND_AZURE=ON
If you specify a backend in neam.toml that was not compiled in, the VM reports a
clear error:
Error: State backend "dynamodb" is not available.
Build with -DNEAM_BACKEND_AWS=ON to enable DynamoDB support.
19.11 Agent Limits in neam.toml #
The [agent] section configures default behavior for all agents in the project. These
values apply unless overridden in individual agent declarations:
[agent]
provider = "openai"
model = "gpt-4o-mini"
capabilities = ["text", "tools", "vision"]
[agent.limits]
max-tokens-per-request = 4096
max-concurrent-tools = 5
timeout-seconds = 30
max-retries = 3
[agent.prompts]
system = "You are a helpful assistant."
error-recovery = "An error occurred. Please try again."
Agent Limits Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `max-tokens-per-request` | int | `4096` | Maximum tokens per LLM request |
| `max-concurrent-tools` | int | `5` | Maximum parallel tool executions |
| `timeout-seconds` | int | `30` | Timeout for a single agent call |
| `max-retries` | int | `3` | Retry count on transient failures |
These limits act as safety defaults. An agent that exceeds max-tokens-per-request
will have its response truncated. An agent that exceeds timeout-seconds will return
an error. These protect against runaway LLM calls in production.
19.12 Feature Flags and Build Profiles #
Feature Flags #
The [features] section enables conditional compilation of capabilities. This lets you
build different variants of the same project for different deployment targets:
[features]
default = ["basic-agents"]
basic-agents = []
advanced-rag = ["basic-agents"]
multi-provider = ["basic-agents"]
full = ["advanced-rag", "multi-provider"]
Features control which modules and capabilities are included in the compiled output.
The default key specifies which features are active when none are explicitly selected.
Build Profiles #
The [profile.*] sections define compilation settings for different build targets:
[profile.dev]
optimization = "none"
debug = true
source-maps = true
[profile.release]
optimization = "full"
debug = false
strip = true
[profile.bench]
optimization = "full"
debug = true
| Field | Type | Description |
|---|---|---|
| `optimization` | string | `"none"`, `"basic"`, or `"full"` |
| `debug` | bool | Include debug symbols |
| `source-maps` | bool | Generate source maps for stack traces |
| `strip` | bool | Strip symbols from binary (smaller output) |
Use neamc --profile release to compile with a specific profile:
# Development build (default)
neamc src/main.neam -o main.neamb
# Release build (optimized, stripped)
neamc --profile release src/main.neam -o main.neamb
19.13 Secret URI Syntax #
The secrets provider supports a URI-based syntax for referencing secrets from different
backends. When the VM encounters a secret:// prefix in a configuration value, it
resolves the secret through the configured provider.
The full URI syntax supports provider-specific prefixes:
| URI Prefix | Provider | Example |
|---|---|---|
| `secret://NAME` | Configured provider | `secret://DATABASE_URL` |
| `env://NAME` | Environment variable | `env://OPENAI_API_KEY` |
| `aws-sm://NAME` | AWS Secrets Manager | `aws-sm://production/db-password` |
| `azure-kv://VAULT/NAME` | Azure Key Vault | `azure-kv://my-vault/api-key` |
| `gcp-sm://PROJECT/NAME` | GCP Secret Manager | `gcp-sm://my-project/api-key` |
| `vault://PATH#KEY` | HashiCorp Vault | `vault://secret/data/neam#api_key` |
| `k8s://NS/SECRET/KEY` | Kubernetes Secret | `k8s://neam-prod/neam-secrets/api-key` |
The secret:// prefix uses whatever provider is configured in the [secrets] section.
The provider-specific prefixes bypass the configured provider and go directly to the
specified backend. This lets you mix sources:
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/"
[state]
connection-string = "secret://DATABASE_URL" # Uses AWS Secrets Manager
[llm]
# Override: read this specific key from Vault instead
anthropic-key = "vault://secret/data/llm#anthropic"
The composite secrets resolver chains multiple providers with fallback behavior. If
the primary provider cannot resolve a secret, it tries the next provider in the chain.
This is configured automatically when you use provider-specific URI prefixes alongside
a [secrets] provider.
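The resolution step amounts to dispatching on the URI scheme and passing plain values through untouched. A minimal sketch under that assumption, with `providers` as a hypothetical scheme-to-lookup map:

```python
def resolve(value: str, providers: dict):
    """Resolve a secret URI via a map of scheme -> lookup function.

    providers might be {"secret": configured_lookup, "env": os.environ.get, ...}.
    Plain (non-URI) values are returned unchanged.
    """
    if "://" not in value:
        return value
    scheme, name = value.split("://", 1)
    lookup = providers.get(scheme)
    if lookup is None:
        raise KeyError(f"no secrets provider for scheme '{scheme}'")
    resolved = lookup(name)
    if resolved is None:
        raise KeyError(f"secret '{name}' not found via '{scheme}'")
    return resolved
```

A composite resolver with fallback would simply try a list of lookup functions in order inside `lookup`, returning the first non-`None` result.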
19.14 NeamClaw Configuration #
The [neamclaw] section configures defaults for claw agents and forge agents. These
values apply unless overridden in individual agent declarations.
Claw Agent Defaults #
[neamclaw.claw]
# Session defaults
idle-reset-minutes = 60
daily-reset-hour = 4
max-history-turns = 100
compaction = "auto"
# Context limits
max-context-tokens = 128000
# Lane defaults
default-lane-concurrency = 1
# Memory defaults
semantic-memory-backend = "sqlite"
semantic-memory-search = "hybrid"
Forge Agent Defaults #
[neamclaw.forge]
# Loop defaults
max-iterations = 25
max-cost = 10.0
max-tokens = 500000
# Checkpoint defaults
checkpoint = "git"
# Workspace defaults
workspace-base = "./workspace"
Channel Configuration #
[neamclaw.channels]
# Global channel settings
default-dm-policy = "open"
default-group-policy = "mention_only"
# HTTP channel defaults
[neamclaw.channels.http]
port = 8080
auth = "bearer"
auth-env = "NEAM_API_KEY"
Memory Configuration #
[neamclaw.memory]
# Chunking parameters for semantic indexing
chunk-size = 400
chunk-overlap = 80
# Hybrid search weights
vector-weight = 0.7
bm25-weight = 0.3
# Default top_k for memory_search()
default-top-k = 5
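The hybrid search weights above combine the two rankings linearly. A sketch of the scoring, assuming both scores are already normalized to [0, 1] (the VM's normalization scheme is not specified here):

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 vector_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Weighted blend of vector similarity and BM25, per the defaults above."""
    return vector_weight * vector_score + bm25_weight * bm25_score

def top_k(results, k: int = 5):
    """results: list of (doc_id, vector_score, bm25_score) tuples."""
    ranked = sorted(results, key=lambda r: hybrid_score(r[1], r[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ranked[:k]]
```

With the 0.7/0.3 defaults, a chunk that matches semantically but shares few exact terms still outranks a keyword-only hit, which is usually what you want for conversational memory.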
NeamClaw Configuration Reference #
| Key | Type | Default | Description |
|---|---|---|---|
| `neamclaw.claw.idle-reset-minutes` | int | `60` | Session idle reset timeout |
| `neamclaw.claw.daily-reset-hour` | int | `4` | Daily session reset hour (0-23) |
| `neamclaw.claw.max-history-turns` | int | `100` | Maximum turns to retain |
| `neamclaw.claw.compaction` | string | `"auto"` | `"auto"`, `"manual"`, or `"disabled"` |
| `neamclaw.claw.max-context-tokens` | int | `128000` | Context token budget |
| `neamclaw.forge.max-iterations` | int | `25` | Default max loop iterations |
| `neamclaw.forge.max-cost` | float | `10.0` | Default max cost in USD |
| `neamclaw.forge.max-tokens` | int | `500000` | Default max token budget |
| `neamclaw.forge.checkpoint` | string | `"git"` | Default checkpoint strategy |
| `neamclaw.memory.chunk-size` | int | `400` | Characters per memory chunk |
| `neamclaw.memory.chunk-overlap` | int | `80` | Overlap between chunks |
| `neamclaw.memory.vector-weight` | float | `0.7` | Hybrid search vector weight |
| `neamclaw.memory.bm25-weight` | float | `0.3` | Hybrid search BM25 weight |
| `neamclaw.memory.default-top-k` | int | `5` | Default results for memory_search() |
These defaults simplify agent declarations. If all your claw agents use
the same session configuration, set it once in neam.toml and omit the session
block from individual agent declarations.
19.15 Putting It All Together #
Here is a complete, production-ready neam.toml for a customer service agent deployed
to Kubernetes on AWS with PostgreSQL state, OpenAI as the primary LLM with Anthropic
failover, full observability, and AWS Secrets Manager for credentials:
neam_version = "1.0"
[project]
name = "customer-service-agent"
version = "2.1.0"
description = "Multi-agent customer service system with triage and routing"
type = "binary"
authors = ["Platform Team <platform@company.com>"]
[project.entry_points]
main = "src/main.neam"
[dependencies]
agent-utils = "^1.0.0"
[state]
backend = "postgres"
connection-string = "secret://DATABASE_URL"
ttl-seconds = 86400
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 500
[llm.rate-limits.anthropic]
requests-per-minute = 200
[llm.circuit-breaker]
failure-threshold = 5
reset-timeout-seconds = 30
half-open-max-requests = 2
[llm.cache]
enabled = true
max-entries = 5000
ttl-seconds = 300
[llm.cost]
daily-budget-usd = 500.0
[llm.fallback-chain]
providers = ["openai", "anthropic"]
[telemetry]
enabled = true
endpoint = "http://otel-collector.observability.svc.cluster.local:4318"
service-name = "customer-service-agent"
sampling-rate = 0.1
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/customer-service/"
[deploy]
target = "kubernetes"
[deploy.kubernetes]
namespace = "customer-service"
replicas = 3
[deploy.kubernetes.resources]
cpu-request = "1"
cpu-limit = "2"
memory-request = "512Mi"
memory-limit = "1Gi"
[deploy.kubernetes.scaling]
min-replicas = 3
max-replicas = 20
target-cpu-percent = 70
[deploy.kubernetes.disruption]
min-available = 2
Summary #
In this chapter, you learned:
- The neam.toml file is the single source of truth for cloud-native configuration
- The configuration hierarchy: environment variables > neam.toml > CLI flags > defaults
- Six state backends (SQLite, PostgreSQL, Redis, DynamoDB, CosmosDB, Firestore) and when to use each
- LLM gateway configuration: rate limits, circuit breaking, caching, cost tracking, and failover
- Telemetry configuration for OpenTelemetry export with configurable sampling
- Secrets provider configuration for cloud-native credential management
- Deploy target configuration for generating platform-specific artifacts
- Environment-specific overrides using environment variables and Kubernetes ConfigMaps
- Compile flags that gate cloud-specific dependencies
- Agent limits in neam.toml: token caps, concurrent tool limits, timeouts, and retries
- Feature flags for conditional compilation and build profiles (dev, release, bench)
- Secret URI syntax with provider-specific prefixes (aws-sm://, azure-kv://, gcp-sm://, vault://, k8s://) and composite resolver fallback chains
In the next chapter, we will use this configuration to build Docker images and deploy to Kubernetes.
Exercises #
Exercise 19.1: Basic Configuration #
Create a neam.toml for a Neam project called research-assistant that:
- Uses PostgreSQL as the state backend with a local development connection string
- Sets OpenAI as the default provider with gpt-4o as the default model
- Limits OpenAI to 60 requests per minute
- Enables response caching with a 10-minute TTL
- Disables telemetry
Exercise 19.2: Circuit Breaker Analysis #
Given the following circuit breaker configuration:
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 120
half-open-max-requests = 1
Answer the following questions:
- How many consecutive failures trigger the circuit to open?
- Once open, how long until the circuit transitions to half-open?
- If the half-open probe request fails, what happens?
- If the half-open probe succeeds, what happens?
- Draw the state transitions that occur if: success, success, failure, failure, failure, (wait 120s), failure, (wait 120s), success.
Exercise 19.3: Multi-Environment Configuration #
Design a configuration strategy for a Neam agent that runs in three environments:
- Development: SQLite state, no rate limits, no telemetry, $5/day budget
- Staging: PostgreSQL state, OpenAI at 60 RPM, telemetry at 100% sampling, $50/day
- Production: PostgreSQL state, OpenAI at 500 RPM with Anthropic failover, telemetry at 10% sampling, $500/day
Write the neam.toml file and the three .env files (dev.env, staging.env,
production.env) that implement this strategy.
Exercise 19.4: Secrets Migration #
You have a project that uses environment variables for secrets:
export OPENAI_API_KEY="sk-abc123"
export DATABASE_URL="postgresql://user:plaintext-pass@db:5432/neam"
Migrate this project to use AWS Secrets Manager. Write the updated [secrets] section
of neam.toml and describe the steps needed to store the secrets in AWS Secrets
Manager using the AWS CLI.
Exercise 19.5: Cost Tracking Design #
A team runs 10 agents that collectively make approximately 10,000 LLM calls per day.
The average cost per call is $0.002 for gpt-4o-mini.
- What should daily-budget-usd be set to, with a 50% safety margin?
- If one agent has a bug that causes an infinite loop of LLM calls, how quickly would the daily budget be exhausted at 120 requests per minute?
- What additional safeguards (beyond the gateway cost tracking) would you implement? Consider agent-level budgets, guardrails, and circuit breakers.
Exercise 19.6: Backend Selection Decision #
For each of the following deployment scenarios, recommend a state backend and justify your choice:
- A prototype agent running on a developer's laptop
- A production agent on AWS ECS Fargate that needs sub-10ms state reads
- A multi-region deployment on Azure that must survive a region failure
- A high-throughput event processing agent handling 50,000 interactions per hour
- A cost-sensitive startup deploying to a single Kubernetes cluster