Chapter 19: Cloud-Native Configuration #
"Configuration is the surface area between your code and your environment. Get it right, and deployment becomes a solved problem. Get it wrong, and every environment is a special snowflake." -- Production engineering proverb
What You Will Learn #
In this chapter, you will learn how to configure Neam for cloud-native deployment using
the neam.toml configuration file. You will understand the new v0.6.0 configuration
sections for distributed state backends, LLM gateway settings, OpenTelemetry export,
secrets management, and environment-specific overrides. By the end of this chapter, you
will be able to write a complete neam.toml that takes your agent from a laptop to a
production Kubernetes cluster.
19.1 The neam.toml Configuration File #
Every Neam project starts with a neam.toml file at the project root. This file uses
the TOML format (Tom's Obvious, Minimal Language) and serves as the single source of
truth for project metadata, dependencies, and -- as of v0.6.0 -- all cloud-native
runtime configuration.
The key design principle behind Neam's cloud configuration is separation of concerns:
the .neam source files define what your agents do; the neam.toml file defines
where and how they run. This means the parser, compiler, and bytecode format are
completely unchanged in v0.6.0. All cloud configuration is read at runtime by the VM
and the neamc deploy command.
The Minimal neam.toml #
Here is the simplest valid neam.toml:
neam_version = "1.0"
[project]
name = "my-agent"
version = "0.1.0"
description = "A simple Neam agent"
type = "binary"
[project.entry_points]
main = "src/main.neam"
This is sufficient for local development. The VM will use its defaults: SQLite for state, no LLM gateway, no telemetry export, and environment variables for secrets.
The Full v0.6.0 neam.toml #
Here is a complete neam.toml showing every v0.6.0 configuration section:
neam_version = "1.0"
[project]
name = "customer-service-agent"
version = "1.2.0"
description = "Production customer service agent with triage and routing"
type = "binary"
authors = ["Team <team@example.com>"]
license = "MIT"
[project.entry_points]
main = "src/main.neam"
[dependencies]
agent-utils = "^1.0.0"
[dev-dependencies]
test-framework = "0.1.0"
# --- v0.6.0 Cloud-Native Sections ---
[state]
backend = "postgres"
connection-string = "postgresql://user:pass@host:5432/neam"
ttl-seconds = 7200
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.rate-limits.anthropic]
requests-per-minute = 60
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
half-open-max-requests = 1
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
[llm.cost]
daily-budget-usd = 100.0
[llm.fallback-chain]
providers = ["openai", "anthropic", "ollama"]
[telemetry]
enabled = true
endpoint = "http://localhost:4318"
service-name = "neam-agent"
sampling-rate = 0.5
[secrets]
provider = "env"
[deploy]
target = "kubernetes"
Let us examine each section in detail.
19.2 The Configuration Hierarchy #
Before diving into individual sections, it is important to understand how Neam resolves configuration values. The hierarchy follows a standard cloud-native pattern: environment variables take highest precedence, then values in neam.toml, then the VM's built-in defaults.
The environment variable naming convention maps directly from TOML paths:
| TOML Path | Environment Variable |
|---|---|
| `[state] backend` | `NEAM_STATE_BACKEND` |
| `[state] connection-string` | `NEAM_STATE_CONNECTION_STRING` |
| `[llm] default-provider` | `NEAM_LLM_DEFAULT_PROVIDER` |
| `[telemetry] enabled` | `NEAM_TELEMETRY_ENABLED` |
| `[telemetry] endpoint` | `NEAM_OTEL_ENDPOINT` |
| `[secrets] provider` | `NEAM_SECRETS_PROVIDER` |
This convention means you never need to bake secrets or environment-specific values
into neam.toml. The TOML file defines structure and defaults; environment variables
provide the overrides.
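The naming convention can be sketched as a small helper. This is an illustration of the mapping in the table above, not the VM's actual implementation; note the one special case the table shows, where the telemetry endpoint uses the `NEAM_OTEL_` name:

```python
def env_var_for(toml_path: str) -> str:
    """Map a TOML path like 'llm.default-provider' to its override variable."""
    # Special case from the table: the telemetry endpoint uses the OTEL name.
    if toml_path == "telemetry.endpoint":
        return "NEAM_OTEL_ENDPOINT"
    # General rule: prefix NEAM_, uppercase, dots and dashes become underscores.
    return "NEAM_" + toml_path.upper().replace(".", "_").replace("-", "_")

print(env_var_for("state.connection-string"))  # NEAM_STATE_CONNECTION_STRING
print(env_var_for("llm.cost.daily-budget-usd"))  # NEAM_LLM_COST_DAILY_BUDGET_USD
```

The same rule covers nested sections, so deeper paths like `llm.cost.daily-budget-usd` resolve predictably.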
19.3 State Backend Configuration #
The [state] section controls where Neam persists agent memory, learning interactions,
prompt evolution history, autonomous action logs, and budget tracking. In v0.5.0, all
of this was stored in SQLite. In v0.6.0, you can choose from six backends.
Section Reference #
[state]
backend = "postgres" # Required
connection-string = "postgresql://user:pass@host:5432/neam" # Required for non-sqlite
ttl-seconds = 7200 # Optional, default: 0 (no expiry)
Available Backends #
| Backend | `backend` value | Connection String Format | Best For |
|---|---|---|---|
| SQLite | `"sqlite"` | `"./data/neam.db"` (file path) | Local dev, single-node |
| PostgreSQL | `"postgres"` | `"postgresql://user:pass@host:5432/db"` | Production, multi-node |
| Redis | `"redis"` | `"redis://host:6379"` | High-throughput, ephemeral state |
| DynamoDB | `"dynamodb"` | `"dynamodb://us-east-1/neam-state"` | AWS-native deployments |
| CosmosDB | `"cosmosdb"` | `"cosmosdb://account.documents.azure.com/neam"` | Azure-native deployments |
| Firestore | `"firestore"` | `"firestore://project-id/neam-state"` | GCP-native deployments |
SQLite (Default) #
SQLite is the default backend and requires no configuration at all. If you omit the
[state] section entirely, or set backend = "sqlite", the VM creates a SQLite
database in the .neam/ directory:
[state]
backend = "sqlite"
connection-string = ".neam/state.db"
SQLite is ideal for local development and single-node deployments. It supports all Neam features including learning, evolution, and autonomous execution. However, it does not support true distributed locking -- the lock implementation uses table-level exclusion, which only works within a single process.
PostgreSQL #
PostgreSQL is the recommended production backend. It provides ACID transactions, true distributed locking via advisory locks, full-text search for memory events, and scales to millions of interactions:
[state]
backend = "postgres"
connection-string = "postgresql://neam:secret@db.example.com:5432/neam_prod"
ttl-seconds = 7200
The Neam VM creates the required schema automatically on first connection. Here is the schema it provisions:
-- Core memory events
CREATE TABLE IF NOT EXISTS neam_events (
id BIGSERIAL PRIMARY KEY,
store_name TEXT NOT NULL,
timestamp BIGINT NOT NULL,
type TEXT NOT NULL,
data TEXT NOT NULL,
agent TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_neam_events_store
ON neam_events(store_name);
CREATE INDEX IF NOT EXISTS idx_neam_events_ts
ON neam_events(store_name, timestamp);
-- Learning interactions
CREATE TABLE IF NOT EXISTS neam_learning_interactions (
id BIGSERIAL PRIMARY KEY,
agent_name TEXT NOT NULL,
query TEXT NOT NULL,
response TEXT NOT NULL,
reflection_score DOUBLE PRECISION,
feedback_score DOUBLE PRECISION,
tokens_used INTEGER DEFAULT 0,
timestamp BIGINT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_neam_li_agent
ON neam_learning_interactions(agent_name, timestamp);
-- Prompt evolution
CREATE TABLE IF NOT EXISTS neam_prompt_evolution (
id BIGSERIAL PRIMARY KEY,
agent_name TEXT NOT NULL,
version INTEGER NOT NULL,
original_prompt TEXT,
evolved_prompt TEXT NOT NULL,
reasoning TEXT,
status TEXT DEFAULT 'active',
timestamp BIGINT NOT NULL,
UNIQUE(agent_name, version)
);
-- Distributed locks
CREATE TABLE IF NOT EXISTS neam_distributed_locks (
lock_name TEXT PRIMARY KEY,
holder_id TEXT NOT NULL,
expires_at BIGINT NOT NULL
);
The ttl-seconds field controls how long memory events are retained. Set it to 0
(the default) to retain events indefinitely.
The PostgreSQL backend requires the libpq development headers at compile
time. Build with -DNEAM_BACKEND_POSTGRES=ON (this is ON by default in the Docker
image).
Redis #
Redis is optimized for high-throughput scenarios where sub-millisecond read latency matters more than long-term durability:
[state]
backend = "redis"
connection-string = "redis://redis.example.com:6379"
ttl-seconds = 3600
Redis uses native data structures for each concern:
- Memory events: Sorted Sets (score = timestamp)
- Checkpoints: Hashes
- Learning interactions: Sorted Sets
- Distributed locks: `SET key value EX ttl NX` (native Redis locking)
- Budgets: Hashes with daily key rotation
Redis AOF persistence is recommended for production. Without it, a Redis restart will lose all agent state. For durable state, prefer PostgreSQL.
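The lock semantics of `SET key value EX ttl NX` can be illustrated with a small in-memory model. This is not the Redis client code Neam uses, just a sketch of why the atomic NX + EX combination yields a safe expiring lock:

```python
import time

class FakeRedisLock:
    """In-memory illustration of Redis `SET key value EX ttl NX` semantics."""

    def __init__(self):
        self._store = {}  # key -> (holder, expires_at)

    def acquire(self, key: str, holder: str, ttl: float, now=None) -> bool:
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False  # NX: key already exists and has not expired
        self._store[key] = (holder, now + ttl)  # EX: auto-expires after ttl
        return True

locks = FakeRedisLock()
assert locks.acquire("neam:lock:evolve", "worker-1", ttl=30, now=0)       # acquired
assert not locks.acquire("neam:lock:evolve", "worker-2", ttl=30, now=10)  # held
assert locks.acquire("neam:lock:evolve", "worker-2", ttl=30, now=31)      # expired
```

Because expiry is enforced by the store itself, a crashed worker can never hold a lock forever.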
DynamoDB #
DynamoDB is the natural choice for AWS-native deployments, especially when running on Lambda or ECS Fargate:
[state]
backend = "dynamodb"
connection-string = "dynamodb://us-east-1/neam-state-table"
DynamoDB uses a single-table design with composite keys:
| Partition Key (PK) | Sort Key (SK) | Purpose |
|---|---|---|
| `EVENT#{store}` | `{timestamp}#{uuid}` | Memory events |
| `LEARNING#{agent}` | `{timestamp}#{uuid}` | Learning interactions |
| `EVOLUTION#{agent}` | `V#{version}` | Prompt evolution |
| `LOCK#{name}` | `LOCK` | Distributed locks (with TTL) |
| `BUDGET#{agent}` | `{date}` | Budget tracking |
DynamoDB requires -DNEAM_BACKEND_AWS=ON at compile time. Authentication
uses AWS IAM credentials (environment variables or instance profile).
CosmosDB #
CosmosDB is the Azure-native option:
[state]
backend = "cosmosdb"
connection-string = "cosmosdb://neamaccount.documents.azure.com/neam-db"
CosmosDB uses the SQL API with a single container partitioned by agent_name. Each
document includes a type field (event, learning, evolution, lock, budget)
for multiplexing.
CosmosDB requires -DNEAM_BACKEND_AZURE=ON at compile time.
Practical Example: Switching Backends by Environment #
A common pattern is to use SQLite for development and PostgreSQL for production. You achieve this with environment variables; the agent source stays identical in both environments:
agent ServiceBot {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a customer service agent."
memory: "service_memory"
learning: {
strategy: "experience_replay"
review_interval: 10
}
}
{
let answer = ServiceBot.ask("Help me with my order");
emit answer;
}
# neam.toml -- defaults to postgres for the team
[state]
backend = "postgres"
connection-string = "postgresql://neam:dev@localhost:5432/neam_dev"
# Local development: override to sqlite
export NEAM_STATE_BACKEND=sqlite
neam-cli src/main.neam
# Production: use the neam.toml postgres config with production credentials
export NEAM_STATE_CONNECTION_STRING="postgresql://neam:$DB_PASS@prod-db:5432/neam"
neam-api --port 8080
19.4 LLM Gateway Configuration #
The [llm] section configures the LLM Gateway -- a centralized layer that sits
between your agents and the LLM providers. The gateway handles rate limiting, circuit
breaking, response caching, cost tracking, and provider failover. All of this is
transparent to your agent code.
Why a Gateway? #
Without a gateway, every Agent.ask() call goes directly to the provider API. This
works fine in development, but in production you face several problems:
- Rate limiting: Provider APIs have request-per-minute limits. Without coordination, concurrent agents exhaust the limit and receive 429 errors.
- Cascading failures: If a provider goes down, all agents fail simultaneously. There is no circuit breaking or failover.
- Cost overruns: Without tracking, a runaway agent can spend your entire API budget in minutes.
- Redundant calls: Identical queries to the same model with temperature 0 produce identical results, but you pay for each call.
The LLM Gateway solves all of these problems in-process, without requiring a separate sidecar or proxy service.
Default Provider and Model #
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
These values are used when an agent declaration does not specify a provider or model.
They can be overridden per-agent in the .neam source.
Rate Limits #
Configure per-provider rate limits:
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.rate-limits.anthropic]
requests-per-minute = 60
[llm.rate-limits.gemini]
requests-per-minute = 30
The gateway implements rate limiting with a token bucket algorithm. Each provider gets
its own bucket with a refill rate of requests-per-minute / 60.0 tokens per second.
When the bucket is empty, requests queue for up to 5 seconds before returning a 503.
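The token bucket behavior described above can be sketched as follows. This is an illustrative model of the algorithm (the gateway's real implementation is in the VM, and the queueing behavior is omitted here):

```python
class TokenBucket:
    """Per-provider limiter: refills at requests-per-minute / 60 tokens per second."""

    def __init__(self, requests_per_minute: float):
        self.capacity = requests_per_minute
        self.tokens = requests_per_minute
        self.refill_rate = requests_per_minute / 60.0
        self.last = 0.0

    def try_acquire(self, now: float) -> bool:
        # Refill based on time elapsed since the last request, capped at capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would queue for up to 5s, then return 503
```

With `requests-per-minute = 120`, the bucket refills two tokens per second, so short bursts are absorbed while the sustained rate stays within the provider's limit.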
Circuit Breaker #
The circuit breaker prevents cascading failures when a provider is unhealthy:
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
half-open-max-requests = 1
The circuit breaker has three states:
- Closed (normal): All requests pass through. Failures are counted.
- Open: All requests are immediately rejected. The gateway tries the next provider in the fallback chain.
- Half-Open: After `reset-timeout-seconds`, the gateway allows `half-open-max-requests` probe request(s). If the probe succeeds, the circuit closes. If it fails, the circuit opens again.
Failures that trigger the circuit breaker include: HTTP 429 (rate limited), 5xx (server errors), connection timeouts, and DNS resolution failures. Client errors (400, 401, 403, 404) do not trigger the circuit breaker.
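The three-state machine can be sketched in a few lines. This is an assumed model of the behavior described above, not the gateway's source:

```python
CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=60, half_open_max=1):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_max = half_open_max
        self.state = CLOSED
        self.failures = 0
        self.opened_at = 0.0
        self.probes = 0

    def allow(self, now: float) -> bool:
        # Open -> Half-Open once the reset timeout has elapsed.
        if self.state == OPEN and now - self.opened_at >= self.reset_timeout:
            self.state, self.probes = HALF_OPEN, 0
        if self.state == HALF_OPEN:
            if self.probes < self.half_open_max:
                self.probes += 1
                return True
            return False
        return self.state == CLOSED

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.state, self.failures = CLOSED, 0
        else:
            self.failures += 1
            # A failed probe, or too many failures while closed, opens the circuit.
            if self.state == HALF_OPEN or self.failures >= self.failure_threshold:
                self.state, self.opened_at = OPEN, now
```

While the circuit is open, the gateway would consult the fallback chain instead of calling `allow` in a loop.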
Response Cache #
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
When enabled, the gateway caches LLM responses using a SHA-256 hash of the provider
name, model name, and serialized message array as the cache key. Caching only applies
to deterministic requests where temperature == 0.0.
The cache uses LRU eviction when max-entries is reached and TTL-based expiration.
Cache hits bypass the rate limiter, circuit breaker, and provider entirely.
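The cache key construction can be sketched as follows; the exact serialization the VM uses is not specified here, so treat the `json.dumps` step as an assumption:

```python
import hashlib
import json

def cache_key(provider: str, model: str, messages: list) -> str:
    """SHA-256 over the provider name, model name, and serialized messages."""
    payload = json.dumps(
        {"provider": provider, "model": model, "messages": messages},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cacheable(temperature: float) -> bool:
    # Only deterministic requests are eligible for the cache.
    return temperature == 0.0
```

Identical requests hash to the same key, so a repeated deterministic query costs nothing after the first call.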
Cost Tracking #
[llm.cost]
daily-budget-usd = 100.0
After each successful LLM call, the gateway computes the cost using Neam's built-in
pricing table (updated with each release). When daily-budget-usd is set and the
accumulated daily cost exceeds the threshold, the gateway logs a warning. It does not
hard-block requests by default -- you can implement hard blocking with guardrails if
needed.
You can query cost data programmatically:
{
// The gateway tracks costs automatically
let response = MyAgent.ask("Analyze this dataset");
emit response;
// In a monitoring context, you would check costs via the /health endpoint
// or through OpenTelemetry metrics
}
Provider Failover Chain #
[llm.fallback-chain]
providers = ["openai", "anthropic", "ollama"]
When the primary provider's circuit is open, the gateway automatically tries the next provider in the chain. This enables graceful degradation -- if OpenAI is down, requests fall back to Anthropic, and if Anthropic is also down, they fall back to a local Ollama instance.
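The failover logic amounts to walking the chain and skipping unhealthy providers. A minimal sketch, with hypothetical helper names (`circuits`, `send`) standing in for the gateway's internals:

```python
def call_with_failover(chain, circuits, send):
    """Try providers in order, skipping any whose circuit is open.

    chain    -- e.g. ["openai", "anthropic", "ollama"]
    circuits -- provider -> "open" / "closed"
    send     -- callable performing the actual request
    """
    last_error = None
    for provider in chain:
        if circuits.get(provider) == "open":
            continue  # circuit breaker rejects this provider up front
        try:
            return provider, send(provider)
        except ConnectionError as err:
            last_error = err  # treat as provider failure; fall through
    raise RuntimeError("all providers unavailable") from last_error
```

If OpenAI's circuit is open, the very first iteration is skipped and the request lands on Anthropic without ever touching the failing provider.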
Complete LLM Configuration Example #
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.rate-limits.anthropic]
requests-per-minute = 60
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
half-open-max-requests = 1
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
[llm.cost]
daily-budget-usd = 100.0
[llm.fallback-chain]
providers = ["openai", "anthropic"]
19.5 Telemetry Configuration #
The [telemetry] section enables OpenTelemetry (OTLP) export for distributed tracing
and metrics collection:
[telemetry]
enabled = true
endpoint = "http://localhost:4318"
service-name = "neam-agent"
sampling-rate = 0.5
Configuration Fields #
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable OTLP export |
| `endpoint` | string | `""` | OTLP/HTTP endpoint URL |
| `service-name` | string | `"neam-agent"` | Service name in traces |
| `sampling-rate` | float | `1.0` | Fraction of requests to trace (0.0-1.0) |
When enabled = true, the Neam VM creates spans for every agent call, tool call, RAG
retrieval, reflection pass, learning review, and handoff. These spans are batched and
exported to the OTLP endpoint every 5 seconds or when the batch reaches 100 spans.
Sampling Rate #
The sampling-rate controls what fraction of requests generate trace data. A value of
1.0 traces everything (useful in development), while 0.1 traces 10% of requests
(appropriate for high-traffic production). Sampling is deterministic -- once a trace is
sampled, all spans within that trace are included.
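Deterministic head sampling is easy to model: derive the keep/drop decision from the trace ID alone, so every span in a trace gets the same answer. The hashing scheme below is illustrative; the VM's exact scheme is not specified here:

```python
import hashlib

def sample_trace(trace_id: str, sampling_rate: float) -> bool:
    """Keep a trace iff its hashed ID falls below the sampling threshold."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < sampling_rate * 10_000
```

Because the decision depends only on `trace_id`, calling this from any span in the same trace, on any node, yields the same result.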
# Development: trace everything
[telemetry]
enabled = true
endpoint = "http://localhost:4318"
sampling-rate = 1.0
# Production: trace 10% of requests
[telemetry]
enabled = true
endpoint = "http://otel-collector.observability.svc.cluster.local:4318"
sampling-rate = 0.1
Environment-Specific Telemetry #
In Kubernetes, the OTLP endpoint is typically the OpenTelemetry Collector running in the cluster:
# Set via ConfigMap or environment variable
export NEAM_OTEL_ENDPOINT="http://otel-collector.observability.svc.cluster.local:4318"
We will cover the full observability stack (Jaeger, Prometheus, alerting) in Chapter 22.
19.6 Secrets Provider Configuration #
The [secrets] section determines how the Neam VM resolves sensitive values like API
keys and database passwords:
[secrets]
provider = "env"
Available Providers #
| Provider | `provider` value | Description |
|---|---|---|
| Environment Variables | `"env"` | Read from `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. |
| AWS Secrets Manager | `"aws-secrets-manager"` | Fetch secrets from AWS |
| GCP Secret Manager | `"gcp-secret-manager"` | Fetch secrets from GCP |
| Azure Key Vault | `"azure-key-vault"` | Fetch secrets from Azure |
| HashiCorp Vault | `"vault"` | Fetch secrets from Vault |
Environment Variables (Default) #
The default provider reads secrets from environment variables. This is the simplest approach and works everywhere:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export DATABASE_URL="postgresql://user:pass@host:5432/neam"
AWS Secrets Manager #
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "neam/"
With this configuration, when the VM needs the OpenAI API key, it fetches
neam/OPENAI_API_KEY from AWS Secrets Manager. The prefix is prepended to all
secret names.
GCP Secret Manager #
[secrets]
provider = "gcp-secret-manager"
project = "my-gcp-project"
Azure Key Vault #
[secrets]
provider = "azure-key-vault"
vault-url = "https://my-vault.vault.azure.net"
Practical Example: Cloud Secrets in Production #
Here is a complete configuration that uses AWS Secrets Manager for production secrets:
[project]
name = "production-agent"
version = "1.0.0"
[project.entry_points]
main = "src/main.neam"
[state]
backend = "postgres"
connection-string = "secret://DATABASE_URL"
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/neam/"
[telemetry]
enabled = true
endpoint = "http://otel-collector:4318"
The secret://DATABASE_URL syntax tells the VM to resolve the value through the
configured secrets provider. The actual connection string is fetched from AWS Secrets
Manager at production/neam/DATABASE_URL.
19.7 Deploy Target Configuration #
The [deploy] section tells neamc deploy which target platform to generate
artifacts for:
[deploy]
target = "kubernetes"
Available Targets #
| Target | `target` value | Generated Artifacts |
|---|---|---|
| Docker | `"docker"` | Dockerfile |
| Kubernetes | `"kubernetes"` | Deployment, Service, ConfigMap, HPA, PDB, NetworkPolicy |
| AWS Lambda | `"lambda"` | SAM template |
| AWS ECS Fargate | `"ecs-fargate"` | Task definition, Service, ALB |
| GCP Cloud Run | `"cloud-run"` | Cloud Run YAML |
| Azure Container Apps | `"azure-container-apps"` | Container App YAML |
| Azure AKS | `"azure-aks"` | K8s manifests + AKS annotations |
| Helm | `"helm"` | Helm chart (Chart.yaml, values.yaml, templates/) |
Kubernetes-Specific Configuration #
[deploy]
target = "kubernetes"
[deploy.kubernetes]
namespace = "neam-production"
replicas = 3
[deploy.kubernetes.resources]
cpu-request = "500m"
cpu-limit = "1"
memory-request = "512Mi"
memory-limit = "1Gi"
[deploy.kubernetes.scaling]
min-replicas = 2
max-replicas = 20
target-cpu-percent = 70
[deploy.kubernetes.disruption]
min-available = 2
Using neamc deploy #
# Preview generated manifests (dry run)
neamc deploy --target kubernetes --dry-run
# Generate manifests to a directory
neamc deploy --target kubernetes --output ./deploy/
# Generate for a specific cloud
neamc deploy --target ecs-fargate --output ./deploy/
neamc deploy --target cloud-run --output ./deploy/
neamc deploy --target azure-container-apps --output ./deploy/
19.8 Environment-Specific Overrides #
Real-world projects need different configurations for development, staging, and production. Neam supports this through three complementary mechanisms.
Mechanism 1: Environment Variables #
The most common approach. Your neam.toml defines sane defaults, and environment
variables override specific values per environment:
# neam.toml -- shared defaults
[state]
backend = "postgres"
connection-string = "postgresql://neam:dev@localhost:5432/neam_dev"
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.cost]
daily-budget-usd = 10.0
[telemetry]
enabled = false
# development.env
NEAM_STATE_CONNECTION_STRING=postgresql://neam:dev@localhost:5432/neam_dev
NEAM_LLM_COST_DAILY_BUDGET_USD=10.0
NEAM_TELEMETRY_ENABLED=false
# staging.env
NEAM_STATE_CONNECTION_STRING=postgresql://neam:staging@staging-db:5432/neam_staging
NEAM_LLM_COST_DAILY_BUDGET_USD=50.0
NEAM_TELEMETRY_ENABLED=true
NEAM_OTEL_ENDPOINT=http://otel-collector.staging:4318
NEAM_TELEMETRY_SAMPLING_RATE=1.0
# production.env
NEAM_STATE_CONNECTION_STRING=postgresql://neam:$DB_PASS@prod-db:5432/neam_prod
NEAM_LLM_COST_DAILY_BUDGET_USD=500.0
NEAM_TELEMETRY_ENABLED=true
NEAM_OTEL_ENDPOINT=http://otel-collector.production:4318
NEAM_TELEMETRY_SAMPLING_RATE=0.1
Mechanism 2: Kubernetes ConfigMaps and Secrets #
When deploying to Kubernetes, environment-specific values live in ConfigMaps and Secrets. Kustomize overlays (covered in Chapter 20) patch these per environment:
# gitops/base/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: neam-config
data:
NEAM_ENV: "production"
NEAM_LOG_LEVEL: "info"
NEAM_PORT: "8080"
NEAM_TELEMETRY_ENABLED: "true"
NEAM_OTEL_ENDPOINT: "http://otel-collector.observability.svc.cluster.local:4318"
Mechanism 3: Cloud-Specific Secrets #
For sensitive values, use the secrets provider appropriate to your cloud:
# AWS deployment
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/neam/"
# GCP deployment
[secrets]
provider = "gcp-secret-manager"
project = "neam-production"
# Azure deployment
[secrets]
provider = "azure-key-vault"
vault-url = "https://neam-prod.vault.azure.net"
A Complete Multi-Environment Setup #
Here is a practical example showing how a team might structure configuration for a customer service agent across three environments:
# neam.toml -- committed to git
neam_version = "1.0"
[project]
name = "customer-service"
version = "2.1.0"
type = "binary"
[project.entry_points]
main = "src/main.neam"
# Default state config (overridden per environment)
[state]
backend = "postgres"
connection-string = "postgresql://neam:dev@localhost:5432/neam_dev"
# LLM gateway (same structure across environments)
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 60
[llm.cache]
enabled = true
max-entries = 1000
ttl-seconds = 600
[llm.cost]
daily-budget-usd = 10.0
# Telemetry off by default (enabled in staging/production)
[telemetry]
enabled = false
# Secrets from environment variables (overridden in cloud environments)
[secrets]
provider = "env"
# Deploy target
[deploy]
target = "kubernetes"
[deploy.kubernetes]
namespace = "neam-dev"
replicas = 1
The corresponding Neam source file remains identical across all environments:
agent TriageAgent {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.3
system: "You are a triage agent. Route customer requests."
handoffs: [RefundAgent, BillingAgent]
learning: {
strategy: "experience_replay"
review_interval: 20
}
memory: "triage_memory"
}
agent RefundAgent {
provider: "openai"
model: "gpt-4o-mini"
system: "You process refund requests professionally."
memory: "refund_memory"
}
agent BillingAgent {
provider: "openai"
model: "gpt-4o-mini"
system: "You handle billing questions."
memory: "billing_memory"
}
{
let query = input();
let triage = TriageAgent.ask(query);
if (triage.contains("ROUTE:REFUND")) {
emit RefundAgent.ask(query);
}
if (triage.contains("ROUTE:BILLING")) {
emit BillingAgent.ask(query);
}
}
19.9 Configuration Validation #
The neamc compiler validates neam.toml at compile time and reports errors before
any deployment happens:
$ neamc deploy --target kubernetes --dry-run
Validating neam.toml...
[state] backend: postgres OK
[llm] default-provider: openai OK
[llm.cache] enabled: true OK
[telemetry] enabled: true OK
[telemetry] endpoint: (not set) ERROR: endpoint required when enabled
[secrets] provider: env OK
[deploy] target: kubernetes OK
Error: [telemetry] endpoint must be set when telemetry is enabled.
Common validation rules:
- `[state]` requires `connection-string` for all backends except `sqlite`
- `[telemetry]` requires `endpoint` when `enabled = true`
- `[llm.rate-limits.*]` must have `requests-per-minute > 0`
- `[llm.circuit-breaker]` must have `failure-threshold >= 1`
- `[deploy.kubernetes]` requires `namespace` when target is `kubernetes`
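The rules are straightforward to express over a parsed configuration. A sketch of a validator for the first three rules, operating on the nested dicts a TOML parser would produce (this is not neamc's implementation):

```python
def validate(config: dict) -> list[str]:
    """Check a few of the validation rules against a parsed neam.toml."""
    errors = []
    state = config.get("state", {})
    if state.get("backend", "sqlite") != "sqlite" and not state.get("connection-string"):
        errors.append("[state] connection-string required for non-sqlite backends")
    tel = config.get("telemetry", {})
    if tel.get("enabled") and not tel.get("endpoint"):
        errors.append("[telemetry] endpoint must be set when telemetry is enabled")
    for provider, rl in config.get("llm", {}).get("rate-limits", {}).items():
        if rl.get("requests-per-minute", 0) <= 0:
            errors.append(f"[llm.rate-limits.{provider}] requests-per-minute must be > 0")
    return errors
```

Running such checks before generating deployment artifacts is what lets `neamc deploy --dry-run` fail fast instead of shipping a broken manifest.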
19.10 Compile Flags and Backend Dependencies #
Not all backends are included in every build. Heavy cloud SDK dependencies are gated behind CMake compile flags:
| Compile Flag | Enables | Dependencies |
|---|---|---|
| (default) | SQLite backend | None (bundled) |
| `-DNEAM_BACKEND_POSTGRES=ON` | PostgreSQL backend | libpq |
| `-DNEAM_BACKEND_REDIS=ON` | Redis backend | hiredis |
| `-DNEAM_BACKEND_AWS=ON` | DynamoDB, Bedrock, Lambda, ECS | libcurl (bundled) |
| `-DNEAM_BACKEND_GCP=ON` | Firestore, Cloud Run, Vertex AI | libcurl (bundled) |
| `-DNEAM_BACKEND_AZURE=ON` | CosmosDB, Azure OpenAI, AKS, Container Apps | libcurl (bundled) |
The default Docker image includes PostgreSQL support:
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DNEAM_BACKEND_POSTGRES=ON
For a full multi-cloud build:
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DNEAM_BACKEND_POSTGRES=ON \
-DNEAM_BACKEND_REDIS=ON \
-DNEAM_BACKEND_AWS=ON \
-DNEAM_BACKEND_GCP=ON \
-DNEAM_BACKEND_AZURE=ON
If you specify a backend in neam.toml that was not compiled in, the VM reports a
clear error:
Error: State backend "dynamodb" is not available.
Build with -DNEAM_BACKEND_AWS=ON to enable DynamoDB support.
19.11 Agent Limits in neam.toml #
The [agent] section configures default behavior for all agents in the project. These
values apply unless overridden in individual agent declarations:
[agent]
provider = "openai"
model = "gpt-4o-mini"
capabilities = ["text", "tools", "vision"]
[agent.limits]
max-tokens-per-request = 4096
max-concurrent-tools = 5
timeout-seconds = 30
max-retries = 3
[agent.prompts]
system = "You are a helpful assistant."
error-recovery = "An error occurred. Please try again."
Agent Limits Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `max-tokens-per-request` | int | `4096` | Maximum tokens per LLM request |
| `max-concurrent-tools` | int | `5` | Maximum parallel tool executions |
| `timeout-seconds` | int | `30` | Timeout for a single agent call |
| `max-retries` | int | `3` | Retry count on transient failures |
These limits act as safety defaults. An agent that exceeds max-tokens-per-request
will have its response truncated. An agent that exceeds timeout-seconds will return
an error. These protect against runaway LLM calls in production.
19.12 Feature Flags and Build Profiles #
Feature Flags #
The [features] section enables conditional compilation of capabilities. This lets you
build different variants of the same project for different deployment targets:
[features]
default = ["basic-agents"]
basic-agents = []
advanced-rag = ["basic-agents"]
multi-provider = ["basic-agents"]
full = ["advanced-rag", "multi-provider"]
Features control which modules and capabilities are included in the compiled output.
The default key specifies which features are active when none are explicitly selected.
Build Profiles #
The [profile.*] sections define compilation settings for different build targets:
[profile.dev]
optimization = "none"
debug = true
source-maps = true
[profile.release]
optimization = "full"
debug = false
strip = true
[profile.bench]
optimization = "full"
debug = true
| Field | Type | Description |
|---|---|---|
| `optimization` | string | `"none"`, `"basic"`, or `"full"` |
| `debug` | bool | Include debug symbols |
| `source-maps` | bool | Generate source maps for stack traces |
| `strip` | bool | Strip symbols from binary (smaller output) |
Use neamc --profile release to compile with a specific profile:
# Development build (default)
neamc src/main.neam -o main.neamb
# Release build (optimized, stripped)
neamc --profile release src/main.neam -o main.neamb
19.13 Secret URI Syntax #
The secrets provider supports a URI-based syntax for referencing secrets from different
backends. When the VM encounters a secret:// prefix in a configuration value, it
resolves the secret through the configured provider.
The full URI syntax supports provider-specific prefixes:
| URI Prefix | Provider | Example |
|---|---|---|
| `secret://NAME` | Configured provider | `secret://DATABASE_URL` |
| `env://NAME` | Environment variable | `env://OPENAI_API_KEY` |
| `aws-sm://NAME` | AWS Secrets Manager | `aws-sm://production/db-password` |
| `azure-kv://VAULT/NAME` | Azure Key Vault | `azure-kv://my-vault/api-key` |
| `gcp-sm://PROJECT/NAME` | GCP Secret Manager | `gcp-sm://my-project/api-key` |
| `vault://PATH#KEY` | HashiCorp Vault | `vault://secret/data/neam#api_key` |
| `k8s://NS/SECRET/KEY` | Kubernetes Secret | `k8s://neam-prod/neam-secrets/api-key` |
The secret:// prefix uses whatever provider is configured in the [secrets] section.
The provider-specific prefixes bypass the configured provider and go directly to the
specified backend. This lets you mix sources:
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/"
[state]
connection-string = "secret://DATABASE_URL" # Uses AWS Secrets Manager
[llm]
# Override: read this specific key from Vault instead
anthropic-key = "vault://secret/data/llm#anthropic"
The composite secrets resolver chains multiple providers with fallback behavior. If
the primary provider cannot resolve a secret, it tries the next provider in the chain.
This is configured automatically when you use provider-specific URI prefixes alongside
a [secrets] provider.
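The resolution step amounts to dispatching on the URI scheme and passing plain values through untouched. A minimal sketch under that assumption, with `providers` as a hypothetical scheme-to-lookup map:

```python
def resolve(value: str, providers: dict):
    """Resolve a secret URI via a map of scheme -> lookup function.

    providers might be {"secret": configured_lookup, "env": os.environ.get, ...}.
    Plain (non-URI) values are returned unchanged.
    """
    if "://" not in value:
        return value
    scheme, name = value.split("://", 1)
    lookup = providers.get(scheme)
    if lookup is None:
        raise KeyError(f"no secrets provider for scheme '{scheme}'")
    resolved = lookup(name)
    if resolved is None:
        raise KeyError(f"secret '{name}' not found via '{scheme}'")
    return resolved
```

A composite resolver with fallback would simply try a list of lookup functions in order inside `lookup`, returning the first non-`None` result.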
19.14 NeamClaw Configuration #
The [neamclaw] section configures defaults for claw agents and forge agents. These
values apply unless overridden in individual agent declarations.
Claw Agent Defaults #
[neamclaw.claw]
# Session defaults
idle-reset-minutes = 60
daily-reset-hour = 4
max-history-turns = 100
compaction = "auto"
# Context limits
max-context-tokens = 128000
# Lane defaults
default-lane-concurrency = 1
# Memory defaults
semantic-memory-backend = "sqlite"
semantic-memory-search = "hybrid"
Forge Agent Defaults #
[neamclaw.forge]
# Loop defaults
max-iterations = 25
max-cost = 10.0
max-tokens = 500000
# Checkpoint defaults
checkpoint = "git"
# Workspace defaults
workspace-base = "./workspace"
Channel Configuration #
[neamclaw.channels]
# Global channel settings
default-dm-policy = "open"
default-group-policy = "mention_only"
# HTTP channel defaults
[neamclaw.channels.http]
port = 8080
auth = "bearer"
auth-env = "NEAM_API_KEY"
Memory Configuration #
[neamclaw.memory]
# Chunking parameters for semantic indexing
chunk-size = 400
chunk-overlap = 80
# Hybrid search weights
vector-weight = 0.7
bm25-weight = 0.3
# Default top_k for memory_search()
default-top-k = 5
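The hybrid search weights above combine the two rankings linearly. A sketch of the scoring, assuming both scores are already normalized to [0, 1] (the VM's normalization scheme is not specified here):

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 vector_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Weighted blend of vector similarity and BM25, per the defaults above."""
    return vector_weight * vector_score + bm25_weight * bm25_score

def top_k(results, k: int = 5):
    """results: list of (doc_id, vector_score, bm25_score) tuples."""
    ranked = sorted(results, key=lambda r: hybrid_score(r[1], r[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ranked[:k]]
```

With the 0.7/0.3 defaults, a chunk that matches semantically but shares few exact terms still outranks a keyword-only hit, which is usually what you want for conversational memory.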
NeamClaw Configuration Reference #
| Key | Type | Default | Description |
|---|---|---|---|
| `neamclaw.claw.idle-reset-minutes` | int | `60` | Session idle reset timeout |
| `neamclaw.claw.daily-reset-hour` | int | `4` | Daily session reset hour (0-23) |
| `neamclaw.claw.max-history-turns` | int | `100` | Maximum turns to retain |
| `neamclaw.claw.compaction` | string | `"auto"` | `"auto"`, `"manual"`, or `"disabled"` |
| `neamclaw.claw.max-context-tokens` | int | `128000` | Context token budget |
| `neamclaw.forge.max-iterations` | int | `25` | Default max loop iterations |
| `neamclaw.forge.max-cost` | float | `10.0` | Default max cost in USD |
| `neamclaw.forge.max-tokens` | int | `500000` | Default max token budget |
| `neamclaw.forge.checkpoint` | string | `"git"` | Default checkpoint strategy |
| `neamclaw.memory.chunk-size` | int | `400` | Characters per memory chunk |
| `neamclaw.memory.chunk-overlap` | int | `80` | Overlap between chunks |
| `neamclaw.memory.vector-weight` | float | `0.7` | Hybrid search vector weight |
| `neamclaw.memory.bm25-weight` | float | `0.3` | Hybrid search BM25 weight |
| `neamclaw.memory.default-top-k` | int | `5` | Default results for memory_search() |
These defaults simplify agent declarations. If all your claw agents use
the same session configuration, set it once in neam.toml and omit the session
block from individual agent declarations.
19.15 Putting It All Together #
Here is a complete, production-ready neam.toml for a customer service agent deployed
to Kubernetes on AWS with PostgreSQL state, OpenAI as the primary LLM with Anthropic
failover, full observability, and AWS Secrets Manager for credentials:
neam_version = "1.0"
[project]
name = "customer-service-agent"
version = "2.1.0"
description = "Multi-agent customer service system with triage and routing"
type = "binary"
authors = ["Platform Team <platform@company.com>"]
[project.entry_points]
main = "src/main.neam"
[dependencies]
agent-utils = "^1.0.0"
[state]
backend = "postgres"
connection-string = "secret://DATABASE_URL"
ttl-seconds = 86400
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 500
[llm.rate-limits.anthropic]
requests-per-minute = 200
[llm.circuit-breaker]
failure-threshold = 5
reset-timeout-seconds = 30
half-open-max-requests = 2
[llm.cache]
enabled = true
max-entries = 5000
ttl-seconds = 300
[llm.cost]
daily-budget-usd = 500.0
[llm.fallback-chain]
providers = ["openai", "anthropic"]
[telemetry]
enabled = true
endpoint = "http://otel-collector.observability.svc.cluster.local:4318"
service-name = "customer-service-agent"
sampling-rate = 0.1
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/customer-service/"
[deploy]
target = "kubernetes"
[deploy.kubernetes]
namespace = "customer-service"
replicas = 3
[deploy.kubernetes.resources]
cpu-request = "1"
cpu-limit = "2"
memory-request = "512Mi"
memory-limit = "1Gi"
[deploy.kubernetes.scaling]
min-replicas = 3
max-replicas = 20
target-cpu-percent = 70
[deploy.kubernetes.disruption]
min-available = 2
Summary #
In this chapter, you learned:
- The neam.toml file is the single source of truth for cloud-native configuration
- The configuration hierarchy: environment variables > neam.toml > CLI flags > defaults
- Six state backends (SQLite, PostgreSQL, Redis, DynamoDB, CosmosDB, Firestore) and when to use each
- LLM gateway configuration: rate limits, circuit breaking, caching, cost tracking, and failover
- Telemetry configuration for OpenTelemetry export with configurable sampling
- Secrets provider configuration for cloud-native credential management
- Deploy target configuration for generating platform-specific artifacts
- Environment-specific overrides using environment variables and Kubernetes ConfigMaps
- Compile flags that gate cloud-specific dependencies
- Agent limits in neam.toml: token caps, concurrent tool limits, timeouts, and retries
- Feature flags for conditional compilation and build profiles (dev, release, bench)
- Secret URI syntax with provider-specific prefixes (aws-sm://, azure-kv://, gcp-sm://, vault://, k8s://) and composite resolver fallback chains
In the next chapter, we will use this configuration to build Docker images and deploy to Kubernetes.
Exercises #
Exercise 19.1: Basic Configuration #
Create a neam.toml for a Neam project called research-assistant that:
- Uses PostgreSQL as the state backend with a local development connection string
- Sets OpenAI as the default provider with gpt-4o as the default model
- Limits OpenAI to 60 requests per minute
- Enables response caching with a 10-minute TTL
- Disables telemetry
Exercise 19.2: Circuit Breaker Analysis #
Given the following circuit breaker configuration:
[llm.circuit-breaker]
failure-threshold = 3
reset-timeout-seconds = 120
half-open-max-requests = 1
Answer the following questions:
- How many consecutive failures trigger the circuit to open?
- Once open, how long until the circuit transitions to half-open?
- If the half-open probe request fails, what happens?
- If the half-open probe succeeds, what happens?
- Draw the state transitions that occur if: success, success, failure, failure, failure, (wait 120s), failure, (wait 120s), success.
Exercise 19.3: Multi-Environment Configuration #
Design a configuration strategy for a Neam agent that runs in three environments:
- Development: SQLite state, no rate limits, no telemetry, $5/day budget
- Staging: PostgreSQL state, OpenAI at 60 RPM, telemetry at 100% sampling, $50/day
- Production: PostgreSQL state, OpenAI at 500 RPM with Anthropic failover, telemetry at 10% sampling, $500/day
Write the neam.toml file and the three .env files (dev.env, staging.env,
production.env) that implement this strategy.
Exercise 19.4: Secrets Migration #
You have a project that uses environment variables for secrets:
export OPENAI_API_KEY="sk-abc123"
export DATABASE_URL="postgresql://user:plaintext-pass@db:5432/neam"
Migrate this project to use AWS Secrets Manager. Write the updated [secrets] section
of neam.toml and describe the steps needed to store the secrets in AWS Secrets
Manager using the AWS CLI.
Exercise 19.5: Cost Tracking Design #
A team runs 10 agents that collectively make approximately 10,000 LLM calls per day.
The average cost per call is $0.002 for gpt-4o-mini.
- What should daily-budget-usd be set to, with a 50% safety margin?
- If one agent has a bug that causes an infinite loop of LLM calls, how quickly would the daily budget be exhausted at 120 requests per minute?
- What additional safeguards (beyond the gateway cost tracking) would you implement? Consider agent-level budgets, guardrails, and circuit breakers.
Exercise 19.6: Backend Selection Decision #
For each of the following deployment scenarios, recommend a state backend and justify your choice:
- A prototype agent running on a developer's laptop
- A production agent on AWS ECS Fargate that needs sub-10ms state reads
- A multi-region deployment on Azure that must survive a region failure
- A high-throughput event processing agent handling 50,000 interactions per hour
- A cost-sensitive startup deploying to a single Kubernetes cluster