Chapter 21: Multi-Cloud Deployment #
"The cloud is not a place. It is a way of doing IT. And if you are going to do IT that way, you should not be locked into doing it in only one place." -- Cloud architecture principle
What You Will Learn #
In this chapter, you will learn how to deploy Neam agents to the three major cloud providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. You will deploy to serverless platforms (Lambda, Cloud Run, Container Apps), container orchestrators (ECS Fargate, AKS), and managed AI services (Bedrock, Vertex AI, Azure OpenAI). You will configure cloud-native state backends (DynamoDB, CosmosDB) and secrets managers for each provider. By the end of this chapter, you will understand how to choose the right deployment target for your workload and how to run the same Neam agent across multiple clouds.
21.1 Multi-Cloud Architecture Overview #
Neam v0.6.0 supports eight deployment targets across three cloud providers plus Kubernetes. The Neam source code is identical across all targets; only neam.toml and the compile flags change.
Compile Flags #
Each cloud provider's integrations are gated behind compile flags to avoid pulling in unnecessary dependencies:
| Flag | Enables | Required Libraries |
|---|---|---|
| `-DNEAM_BACKEND_POSTGRES=ON` | PostgreSQL state backend | libpq |
| `-DNEAM_BACKEND_REDIS=ON` | Redis state backend | hiredis |
| `-DNEAM_BACKEND_AWS=ON` | DynamoDB, Bedrock, Lambda, ECS Fargate | libcurl (bundled) |
| `-DNEAM_BACKEND_GCP=ON` | Cloud Run, Vertex AI, GCP Secret Manager | libcurl (bundled) |
| `-DNEAM_BACKEND_AZURE=ON` | CosmosDB, Azure OpenAI, Container Apps, AKS, Key Vault | libcurl (bundled) |
The AWS, GCP, and Azure backends use custom REST clients built on libcurl (which is already bundled with Neam), not full cloud SDKs. This keeps the binary small and avoids SDK version conflicts.
# Build for AWS deployment
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DNEAM_BACKEND_POSTGRES=ON \
-DNEAM_BACKEND_AWS=ON
# Build for all clouds
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DNEAM_BACKEND_POSTGRES=ON \
-DNEAM_BACKEND_AWS=ON \
-DNEAM_BACKEND_GCP=ON \
-DNEAM_BACKEND_AZURE=ON
21.2 AWS Deployment #
AWS Lambda #
AWS Lambda is ideal for event-driven agents that handle sporadic traffic. The agent starts on demand, processes a request, and shuts down -- you pay only for compute time used.
Generating Lambda Artifacts #
neamc deploy --target lambda --dry-run
This generates a SAM (Serverless Application Model) template:
# Generated: template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Neam Agent - Lambda Deployment

Globals:
  Function:
    Timeout: 30
    MemorySize: 512
    Runtime: provided.al2023
    Architectures:
      - arm64

Resources:
  NeamAgentFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: bootstrap
      CodeUri: ./build/
      Description: Neam agent function
      Policies:
        - SecretsManagerReadWrite
        - DynamoDBCrudPolicy:
            TableName: !Ref StateTable
      Environment:
        Variables:
          NEAM_ENV: production
          NEAM_STATE_BACKEND: dynamodb
          NEAM_STATE_CONNECTION_STRING: !Sub "dynamodb://${AWS::Region}/${StateTable}"
          NEAM_SECRETS_PROVIDER: aws-secrets-manager
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /api/v1/agent/ask
            Method: post
        HealthEvent:
          Type: Api
          Properties:
            Path: /health
            Method: get

  StateTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: neam-state
      AttributeDefinitions:
        - AttributeName: PK
          AttributeType: S
        - AttributeName: SK
          AttributeType: S
      KeySchema:
        - AttributeName: PK
          KeyType: HASH
        - AttributeName: SK
          KeyType: RANGE
      BillingMode: PAY_PER_REQUEST
      TimeToLiveSpecification:
        AttributeName: ttl
        Enabled: true

Outputs:
  ApiUrl:
    Description: API Gateway URL
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/"
Deploying to Lambda #
# Build the agent
neamc src/main.neam -o build/main.neamb
# Generate the SAM template
neamc deploy --target lambda --output ./deploy/
# Deploy using SAM CLI
cd deploy
sam build
sam deploy --guided \
--stack-name neam-agent \
--capabilities CAPABILITY_IAM \
--parameter-overrides \
OpenAIApiKeySecret=production/neam/OPENAI_API_KEY
Lambda Configuration in neam.toml #
[deploy]
target = "lambda"
[deploy.lambda]
memory-mb = 512
timeout-seconds = 30
architecture = "arm64" # Graviton (cheaper)
vpc-enabled = false
[state]
backend = "dynamodb"
connection-string = "dynamodb://us-east-1/neam-state"
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/neam/"
AWS ECS Fargate #
ECS Fargate is the right choice when your agent needs to be always-on (for autonomous agents with scheduled triggers) or when it needs persistent connections (WebSocket, SSE streaming).
Generating ECS Artifacts #
neamc deploy --target ecs-fargate --dry-run
This generates a task definition, ECS service configuration, and deployment script:
// Generated: task-definition.json
{
  "family": "neam-agent",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/neam-execution-role",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/neam-task-role",
  "containerDefinitions": [
    {
      "name": "neam-agent",
      "image": "ACCOUNT.dkr.ecr.REGION.amazonaws.com/neam-agent:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 15
      },
      "environment": [
        {"name": "NEAM_ENV", "value": "production"},
        {"name": "NEAM_PORT", "value": "8080"},
        {"name": "NEAM_STATE_BACKEND", "value": "postgres"}
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:REGION:ACCOUNT:secret:neam/OPENAI_API_KEY"
        },
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:REGION:ACCOUNT:secret:neam/DATABASE_URL"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/neam-agent",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
Deploying to ECS #
# Build and push Docker image
docker build -t neam-agent:latest .
docker tag neam-agent:latest ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/neam-agent:latest
aws ecr get-login-password | docker login --username AWS --password-stdin \
ACCOUNT.dkr.ecr.us-east-1.amazonaws.com
docker push ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/neam-agent:latest
# Generate and deploy ECS artifacts
neamc deploy --target ecs-fargate --output ./deploy/
# Register the task definition
aws ecs register-task-definition --cli-input-json file://deploy/task-definition.json
# Create or update the service
aws ecs update-service \
--cluster neam-cluster \
--service neam-agent \
--task-definition neam-agent \
--force-new-deployment
AWS Bedrock Integration #
AWS Bedrock provides managed access to foundation models from Anthropic, Meta, Amazon, and others. Using Bedrock means your LLM calls stay within the AWS network, which can reduce latency and simplify compliance.
agent BedrockAgent {
  provider: "bedrock"
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  system: "You are a helpful assistant powered by AWS Bedrock."
}

{
  let response = BedrockAgent.ask("Explain serverless architecture.");
  emit response;
}
Bedrock authentication uses AWS IAM credentials -- no API key needed. The Neam VM signs requests with AWS Signature V4, using the credentials from the environment (environment variables, instance profile, or ECS task role).
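The signing-key derivation at the heart of Signature V4 can be sketched in a few lines. This is a simplified illustration of the key-derivation and signing step only (the full algorithm also canonicalizes the request and builds the string to sign), not Neam's actual implementation:

```python
import hashlib
import hmac

def sigv4_signature(secret_key, date_stamp, region, service, string_to_sign):
    """Derive the SigV4 signing key and sign the string-to-sign.

    date_stamp is YYYYMMDD; string_to_sign is the canonical string
    AWS specifies for the request being signed.
    """
    def _hmac(key, msg):
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    # Key derivation chain: date -> region -> service -> "aws4_request"
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Example call with made-up credentials
sig = sigv4_signature("example-secret", "20250101",
                      "us-east-1", "bedrock", "example-string-to-sign")
```

Because the key is derived from the date, region, and service, a leaked signature cannot be replayed against a different service or region.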
# neam.toml for Bedrock
[llm]
default-provider = "bedrock"
default-model = "anthropic.claude-3-5-sonnet-20241022-v2:0"
[llm.rate-limits.bedrock]
requests-per-minute = 60
Supported Bedrock model families:
| Model Family | Model ID Pattern | Notes |
|---|---|---|
| Anthropic Claude | `anthropic.claude-*` | Messages API format |
| Meta Llama | `meta.llama3-*` | Llama request format |
| Amazon Titan | `amazon.titan-*` | Amazon Titan format |
DynamoDB State Backend #
DynamoDB is a natural fit for serverless (Lambda) deployments because it scales to zero with your compute:
[state]
backend = "dynamodb"
connection-string = "dynamodb://us-east-1/neam-state"
The single-table design uses composite partition (PK) and sort (SK) keys, matching the StateTable schema in the SAM template above.
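As an illustration of how such composite keys are typically laid out, the helper below builds PK/SK pairs. The `AGENT#` and record-type prefixes are assumptions for the sake of the example, not Neam's documented key format:

```python
def state_keys(agent_name, record_type, record_id):
    """Build composite keys for a single-table design (illustrative).

    All records for one agent share a partition; the sort key is
    prefixed with the record type so one query can fetch a whole
    range of, say, interactions.
    """
    return {
        "PK": f"AGENT#{agent_name}",         # partition key
        "SK": f"{record_type}#{record_id}",  # sort key
    }

# Two items that would land in the same TriageAgent partition:
interaction = state_keys("TriageAgent", "INTERACTION", "uuid-123")
memory_item = state_keys("TriageAgent", "MEMORY", "customer_memory")
```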
DynamoDB distributed locking uses conditional writes:
PutItem(
  PK = "LOCK#autonomous:leader",
  SK = "LOCK",
  holder_id = instance_id,
  expires_at = now + ttl,
  ConditionExpression = "attribute_not_exists(PK) OR expires_at <= :now"
)
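The semantics of that conditional write are easy to model. The sketch below is an in-memory stand-in for DynamoDB's PutItem with a condition expression, useful for reasoning about lock behavior; it is not the Neam implementation:

```python
import time

class FakeLockTable:
    """In-memory model of DynamoDB conditional-write locking."""

    def __init__(self):
        self.items = {}

    def acquire(self, key, holder_id, ttl, now=None):
        now = time.time() if now is None else now
        item = self.items.get(key)
        # ConditionExpression: attribute_not_exists(PK) OR expires_at <= :now
        if item is None or item["expires_at"] <= now:
            self.items[key] = {"holder_id": holder_id,
                               "expires_at": now + ttl}
            return True
        return False  # lock held by another live instance

locks = FakeLockTable()
got_a = locks.acquire("LOCK#autonomous:leader", "instance-a", ttl=30, now=0)
got_b = locks.acquire("LOCK#autonomous:leader", "instance-b", ttl=30, now=10)
# After instance-a's TTL lapses, the condition passes again:
got_b_later = locks.acquire("LOCK#autonomous:leader", "instance-b", ttl=30, now=40)
```

Because expiry is checked inside the conditional write, a crashed leader never blocks the cluster for longer than one TTL.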
AWS Secrets Manager #
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/neam/"
# Store secrets
aws secretsmanager create-secret \
--name "production/neam/OPENAI_API_KEY" \
--secret-string "sk-..."
aws secretsmanager create-secret \
--name "production/neam/DATABASE_URL" \
--secret-string "postgresql://user:pass@host:5432/neam"
21.3 GCP Deployment #
Google Cloud Run #
Cloud Run is GCP's serverless container platform. It automatically scales from zero to many instances and charges per request. Unlike Lambda, Cloud Run runs your actual Docker container, so everything that works in Docker works on Cloud Run.
Generating Cloud Run Artifacts #
neamc deploy --target cloud-run --dry-run
This generates a Cloud Run service YAML:
# Generated: cloud-run-service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: neam-agent
  labels:
    cloud.googleapis.com/location: us-central1
  annotations:
    run.googleapis.com/ingress: all
    run.googleapis.com/cpu-throttling: "false"
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "10"
        run.googleapis.com/cpu-throttling: "false"
        run.googleapis.com/startup-cpu-boost: "true"
    spec:
      serviceAccountName: neam-agent@PROJECT.iam.gserviceaccount.com
      containerConcurrency: 80
      timeoutSeconds: 300
      containers:
        - name: neam-agent
          image: gcr.io/PROJECT/neam-agent:latest
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
          env:
            - name: NEAM_ENV
              value: production
            - name: NEAM_PORT
              value: "8080"
            - name: NEAM_STATE_BACKEND
              value: postgres
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  key: latest
                  name: neam-database-url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  key: latest
                  name: neam-openai-key
          startupProbe:
            httpGet:
              path: /startup
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            periodSeconds: 20
Key Cloud Run annotations and settings:
- `run.googleapis.com/cpu-throttling: "false"`: keeps CPU allocated even between requests. Essential for autonomous agents that run scheduled tasks.
- `run.googleapis.com/startup-cpu-boost: "true"`: allocates extra CPU during cold starts for faster initialization.
- `autoscaling.knative.dev/minScale: "0"`: scales to zero when idle (cost saving).
- `containerConcurrency: 80`: maximum concurrent requests per instance.
Deploying to Cloud Run #
# Build and push image
docker build -t gcr.io/my-project/neam-agent:latest .
docker push gcr.io/my-project/neam-agent:latest
# Generate Cloud Run manifest
neamc deploy --target cloud-run --output ./deploy/
# Deploy using gcloud
gcloud run services replace deploy/cloud-run-service.yaml \
--region us-central1
# Or deploy directly
gcloud run deploy neam-agent \
--image gcr.io/my-project/neam-agent:latest \
--region us-central1 \
--memory 1Gi \
--cpu 1 \
--min-instances 0 \
--max-instances 10 \
--no-cpu-throttling \
--set-secrets "OPENAI_API_KEY=neam-openai-key:latest" \
--set-secrets "DATABASE_URL=neam-database-url:latest"
Vertex AI Integration #
Vertex AI provides managed access to Google's Gemini models and third-party models. Using Vertex AI instead of the public Gemini API gives you VPC service controls, customer-managed encryption keys, and data residency guarantees.
agent VertexAgent {
  provider: "vertex"
  model: "gemini-2.0-flash"
  system: "You are an assistant powered by Vertex AI."
}

{
  let response = VertexAgent.ask("Explain Cloud Run autoscaling.");
  emit response;
}
Vertex AI authentication uses Google Application Default Credentials (ADC). The Neam
VM reads the service account key from GOOGLE_APPLICATION_CREDENTIALS, exchanges it
for an OAuth2 access token, and caches the token until it expires (1 hour).
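The caching behavior is straightforward to model. The sketch below refreshes the token slightly before expiry; the 60-second refresh skew and the class shape are assumptions for illustration, not Neam's internal cache:

```python
import time

class TokenCache:
    """Cache an OAuth2 access token until shortly before it expires."""

    def __init__(self, fetch_token, refresh_skew=60):
        self._fetch = fetch_token  # callable returning (token, lifetime_seconds)
        self._skew = refresh_skew
        self._token = None
        self._expires_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        # Refresh when missing or within refresh_skew of expiry
        if self._token is None or now >= self._expires_at - self._skew:
            self._token, lifetime = self._fetch()
            self._expires_at = now + lifetime
        return self._token

# Simulated credential exchange (a real one would call the OAuth2 endpoint)
fetches = []
def fake_exchange():
    fetches.append(1)
    return ("token-%d" % len(fetches), 3600)

cache = TokenCache(fake_exchange)
first = cache.get(now=0)
again = cache.get(now=1000)      # served from cache
refreshed = cache.get(now=3550)  # within 60 s of expiry: refreshed
```

Refreshing before the hard expiry avoids a window where an in-flight request carries a token that dies mid-call.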
# neam.toml for Vertex AI
[llm]
default-provider = "vertex"
default-model = "gemini-2.0-flash"
[llm.rate-limits.vertex]
requests-per-minute = 60
GCP Secret Manager #
[secrets]
provider = "gcp-secret-manager"
project = "my-gcp-project"
# Store secrets
echo -n "sk-..." | gcloud secrets create neam-openai-key --data-file=-
echo -n "postgresql://..." | gcloud secrets create neam-database-url --data-file=-
# Grant access to the service account
gcloud secrets add-iam-policy-binding neam-openai-key \
--member="serviceAccount:neam-agent@my-project.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Firestore State Backend #
Firestore is GCP's serverless document database. Like DynamoDB for AWS, it pairs naturally with Cloud Run because both scale to zero:
[state]
backend = "firestore"
connection-string = "firestore://my-gcp-project/neam-state"
Firestore uses a collection-per-type design, with one top-level collection per record type (for example, a locks collection for distributed locks).
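For illustration, document paths under that design can be built as follows. The path format matches Firestore's REST API; the collection name `locks` appears in the locking example in this section, while any other collection names you plug in are your own:

```python
def firestore_doc_path(project, collection, doc_id):
    """Build a full Firestore document path as used by the REST API."""
    return (f"projects/{project}/databases/(default)/"
            f"documents/{collection}/{doc_id}")

# The leader-lock document from the locking example below:
lock_doc = firestore_doc_path("my-gcp-project", "locks", "autonomous:leader")
```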
Firestore distributed locking uses transactions with optimistic concurrency:
transaction {
  doc = get("locks/autonomous:leader")
  if doc.exists AND doc.expires_at > now:
    abort  // Lock held by another instance
  set("locks/autonomous:leader", {
    holder_id: instance_id,
    expires_at: now + ttl
  })
}
Authentication uses the same Application Default Credentials as Vertex AI. No additional configuration is needed when running on Cloud Run — the service account is inherited from the Cloud Run service configuration.
21.4 Azure Deployment #
Azure Container Apps #
Azure Container Apps is a serverless container platform built on Kubernetes and KEDA. It handles autoscaling, load balancing, and health management automatically.
Generating Container Apps Artifacts #
neamc deploy --target azure-container-apps --dry-run
# Generated: azure-container-app.yaml
properties:
  managedEnvironmentId: /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.App/managedEnvironments/{env}
  configuration:
    ingress:
      external: true
      targetPort: 8080
      transport: http
    secrets:
      - name: openai-key
        keyVaultUrl: https://neam-vault.vault.azure.net/secrets/openai-key
        identity: system
      - name: database-url
        keyVaultUrl: https://neam-vault.vault.azure.net/secrets/database-url
        identity: system
  template:
    containers:
      - name: neam-agent
        image: neamacr.azurecr.io/neam-agent:latest
        resources:
          cpu: 0.5
          memory: 1Gi
        env:
          - name: NEAM_ENV
            value: production
          - name: NEAM_PORT
            value: "8080"
          - name: OPENAI_API_KEY
            secretRef: openai-key
          - name: DATABASE_URL
            secretRef: database-url
        probes:
          - type: Liveness
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          - type: Readiness
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          - type: Startup
            httpGet:
              path: /startup
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
    scale:
      minReplicas: 0
      maxReplicas: 10
      rules:
        - name: http-rule
          http:
            metadata:
              concurrentRequests: "50"
Deploying to Container Apps #
# Build and push to Azure Container Registry
az acr build --registry neamacr --image neam-agent:latest .
# Generate artifacts
neamc deploy --target azure-container-apps --output ./deploy/
# Deploy
az containerapp create \
--name neam-agent \
--resource-group neam-rg \
--environment neam-env \
--image neamacr.azurecr.io/neam-agent:latest \
--target-port 8080 \
--ingress external \
--cpu 0.5 \
--memory 1.0Gi \
--min-replicas 0 \
--max-replicas 10 \
--secrets "openai-key=keyvaultref:https://neam-vault.vault.azure.net/secrets/openai-key,identityref:system" \
--env-vars "NEAM_ENV=production" "OPENAI_API_KEY=secretref:openai-key"
Azure Kubernetes Service (AKS) #
AKS is the right choice when you need the full power of Kubernetes with Azure
integration. The neamc deploy --target azure-aks command generates standard
Kubernetes manifests enhanced with AKS-specific annotations.
neamc deploy --target azure-aks --dry-run
The generated manifests include:
- Workload Identity annotations for pod-level Azure AD authentication
- Azure Key Vault SecretProviderClass for mounting secrets from Key Vault
- Azure Load Balancer annotations for the Service
- Azure Disk/Files StorageClass references for persistent volumes
# Additional AKS annotations on the Deployment
metadata:
  annotations:
    azure.workload.identity/use: "true"
spec:
  template:
    metadata:
      labels:
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: neam-agent
      volumes:
        - name: secrets-store
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: neam-secrets
Deploying to AKS #
# Generate AKS manifests
neamc deploy --target azure-aks --output ./deploy/
# Connect to the AKS cluster
az aks get-credentials --resource-group neam-rg --name neam-aks
# Apply manifests
kubectl apply -f deploy/
# Verify
kubectl -n neam-production get pods
Azure OpenAI Integration #
Azure OpenAI provides OpenAI models (GPT-4, GPT-4o) hosted in Azure data centers. This gives you the same models as OpenAI but with Azure's compliance certifications, VNet integration, and managed identity authentication.
agent AzureAgent {
  provider: "azure_openai"
  model: "gpt-4o-mini"
  endpoint: "https://neam-openai.openai.azure.com"
  system: "You are an assistant powered by Azure OpenAI."
}

{
  let response = AzureAgent.ask("Explain Azure Container Apps.");
  emit response;
}
The Azure OpenAI adapter uses the same request/response format as the OpenAI adapter but with different authentication (API key header or Azure AD token) and endpoint URL construction:
https://{resource}.openai.azure.com/openai/deployments/{deployment}/chat/completions?api-version=2024-10-21
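That URL construction can be sketched as a small helper (the function name is mine; the pattern follows the endpoint format above):

```python
def azure_openai_chat_url(resource, deployment, api_version="2024-10-21"):
    """Build the Azure OpenAI chat-completions endpoint URL.

    Unlike the public OpenAI API, the model is selected by the
    deployment name in the URL path, not by a field in the body.
    """
    return (f"https://{resource}.openai.azure.com/openai/deployments/"
            f"{deployment}/chat/completions?api-version={api_version}")

url = azure_openai_chat_url("neam-openai", "gpt-4o-mini")
```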
# neam.toml for Azure OpenAI
[llm]
default-provider = "azure_openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.azure_openai]
requests-per-minute = 120
CosmosDB State Backend #
CosmosDB is Azure's globally distributed database. It offers single-digit millisecond reads, multi-region writes, and multiple consistency models:
[state]
backend = "cosmosdb"
connection-string = "cosmosdb://neamaccount.documents.azure.com/neam-db"
CosmosDB uses the SQL API with a single container partitioned by agent_name. Each
document includes a type field for multiplexing:
{
"id": "uuid-123",
"agent_name": "TriageAgent",
"type": "learning_interaction",
"query": "Help with my order",
"response": "I'll route you to...",
"reflection_score": 0.85,
"timestamp": 1706745600
}
Azure Key Vault #
[secrets]
provider = "azure-key-vault"
vault-url = "https://neam-vault.vault.azure.net"
# Store secrets
az keyvault secret set --vault-name neam-vault \
--name openai-key --value "sk-..."
az keyvault secret set --vault-name neam-vault \
--name database-url --value "postgresql://..."
# Grant access to the managed identity
az keyvault set-policy --name neam-vault \
--object-id $(az identity show --name neam-identity -g neam-rg --query principalId -o tsv) \
--secret-permissions get list
21.5 Multi-Cloud Strategy #
When to Use Each Target #
Start
  |
  v
Is traffic sporadic/event-driven?
  |
  +-- Yes --> Is it on AWS?
  |             +-- Yes --> Lambda
  |             +-- No  --> Cloud Run (GCP)
  |                         Container Apps (Azure)
  |
  +-- No  --> Does it need always-on
              (autonomous agents, WebSocket)?
                |
                +-- Yes --> Need Kubernetes features?
                |             +-- Yes --> K8s / AKS / GKE
                |             +-- No  --> ECS Fargate (AWS)
                |                         Cloud Run (GCP, no
                |                           CPU throttling)
                |                         Container Apps (Azure)
                |
                +-- No  --> Cloud Run / Container Apps
                            (scale to zero, save cost)
Target Comparison #
| Concern | Lambda | ECS Fargate | Cloud Run | Container Apps | Kubernetes |
|---|---|---|---|---|---|
| Cold start | 1-5s | None | 1-3s | 1-3s | None |
| Scale to zero | Yes | No | Yes | Yes | With KEDA |
| Max timeout | 15 min | Unlimited | 60 min | Unlimited | Unlimited |
| WebSocket/SSE | No | Yes | Yes | Yes | Yes |
| Autonomous agents | No | Yes | Yes (no throttle) | Yes | Yes |
| Pricing model | Per request | Per vCPU-hour | Per request | Per vCPU-sec | Per node |
| Complexity | Low | Medium | Low | Low | High |
Cross-Cloud Agent Example #
Here is a complete example of a Neam agent that can run on any cloud provider. The
agent code is identical; only neam.toml changes:
agent CustomerBot {
  provider: "openai"
  model: "gpt-4o-mini"
  temperature: 0.3
  system: "You are a customer service agent. Be helpful and professional."
  learning: {
    strategy: "experience_replay"
    review_interval: 20
  }
  memory: "customer_memory"
}

agent EscalationBot {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.1
  system: "You handle escalated customer issues with extra care."
  memory: "escalation_memory"
}

fun handle_request(query) {
  let response = CustomerBot.ask(query);

  // Check if escalation is needed
  if (response.contains("ESCALATE")) {
    return EscalationBot.ask("Escalated issue: " + query);
  }
  return response;
}

{
  let query = input();
  let result = handle_request(query);
  emit result;
}
AWS Configuration #
# neam.toml for AWS
[project]
name = "customer-bot"
version = "1.0.0"
[project.entry_points]
main = "src/main.neam"
[state]
backend = "dynamodb"
connection-string = "dynamodb://us-east-1/customer-bot-state"
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[secrets]
provider = "aws-secrets-manager"
region = "us-east-1"
prefix = "production/customer-bot/"
[deploy]
target = "ecs-fargate"
GCP Configuration #
# neam.toml for GCP
[project]
name = "customer-bot"
version = "1.0.0"
[project.entry_points]
main = "src/main.neam"
[state]
backend = "postgres"
connection-string = "secret://DATABASE_URL"
[llm]
default-provider = "openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.openai]
requests-per-minute = 120
[secrets]
provider = "gcp-secret-manager"
project = "customer-bot-prod"
[deploy]
target = "cloud-run"
Azure Configuration #
# neam.toml for Azure
[project]
name = "customer-bot"
version = "1.0.0"
[project.entry_points]
main = "src/main.neam"
[state]
backend = "cosmosdb"
connection-string = "cosmosdb://customerbot.documents.azure.com/customer-bot-db"
[llm]
default-provider = "azure_openai"
default-model = "gpt-4o-mini"
[llm.rate-limits.azure_openai]
requests-per-minute = 120
[secrets]
provider = "azure-key-vault"
vault-url = "https://customerbot-vault.vault.azure.net"
[deploy]
target = "azure-container-apps"
Multi-Cloud Considerations #
Vendor lock-in: Using cloud-specific state backends (DynamoDB, CosmosDB) and LLM providers (Bedrock, Vertex AI, Azure OpenAI) creates coupling. If portability is a priority, use PostgreSQL for state and the standard OpenAI/Anthropic APIs for LLM calls. These work on any cloud.
Latency: Place your state backend in the same region as your compute. LLM API calls to OpenAI or Anthropic go to the public internet regardless of cloud provider. Cloud-specific LLM providers (Bedrock, Vertex AI, Azure OpenAI) keep traffic on the cloud provider's network, which can reduce latency by 20-50ms.
Cost: Serverless targets (Lambda, Cloud Run, Container Apps) are cheapest for sporadic workloads. For sustained load, container-based targets (ECS Fargate, Kubernetes) are more cost-effective. DynamoDB PAY_PER_REQUEST is cheapest for variable workloads; provisioned capacity is cheaper for predictable workloads.
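The serverless-versus-container break-even is worth estimating before choosing a target. The sketch below uses illustrative us-east-1 list prices; these change, so check current pricing before relying on the numbers:

```python
def lambda_monthly_cost(requests, avg_seconds, memory_gb,
                        gb_second_price=0.0000166667,
                        request_price=0.0000002):
    """Approximate monthly AWS Lambda cost (ignores the free tier)."""
    return (requests * avg_seconds * memory_gb * gb_second_price
            + requests * request_price)

def fargate_monthly_cost(vcpu, memory_gb, hours=730,
                         vcpu_hour_price=0.04048,
                         gb_hour_price=0.004445):
    """Approximate monthly cost of one always-on Fargate task."""
    return hours * (vcpu * vcpu_hour_price + memory_gb * gb_hour_price)

# 300,000 requests/month at 5 s on 512 MB, vs. a 0.5 vCPU / 1 GB task:
serverless = lambda_monthly_cost(300_000, 5, 0.5)
always_on = fargate_monthly_cost(0.5, 1.0)
```

At this volume the serverless option is still cheaper; as request volume or duration grows, the always-on task eventually wins because its cost is flat.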
Compliance: Some industries require data residency in specific regions. Cloud-specific LLM providers (Bedrock, Azure OpenAI) offer more data residency controls than the public OpenAI or Anthropic APIs. Azure OpenAI, in particular, processes data within Azure data centers, with no data leaving your subscription.
21.6 Cloud-Specific Build Pipeline #
Here is a CI/CD pipeline that builds and deploys to multiple clouds:
# .github/workflows/deploy.yml
name: Multi-Cloud Deploy

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: |
          docker build \
            --build-arg CMAKE_FLAGS="-DNEAM_BACKEND_POSTGRES=ON -DNEAM_BACKEND_AWS=ON -DNEAM_BACKEND_GCP=ON -DNEAM_BACKEND_AZURE=ON" \
            -t neam-agent:${{ github.sha }} .
      - name: Push to ECR (AWS)
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REGISTRY
          docker tag neam-agent:${{ github.sha }} $ECR_REGISTRY/neam-agent:${{ github.sha }}
          docker push $ECR_REGISTRY/neam-agent:${{ github.sha }}
      - name: Push to GCR (GCP)
        run: |
          gcloud auth configure-docker
          docker tag neam-agent:${{ github.sha }} gcr.io/$GCP_PROJECT/neam-agent:${{ github.sha }}
          docker push gcr.io/$GCP_PROJECT/neam-agent:${{ github.sha }}
      - name: Push to ACR (Azure)
        run: |
          az acr login --name neamacr
          docker tag neam-agent:${{ github.sha }} neamacr.azurecr.io/neam-agent:${{ github.sha }}
          docker push neamacr.azurecr.io/neam-agent:${{ github.sha }}

  deploy-aws:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster neam-cluster \
            --service neam-agent \
            --force-new-deployment

  deploy-gcp:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy neam-agent \
            --image gcr.io/$GCP_PROJECT/neam-agent:${{ github.sha }} \
            --region us-central1

  deploy-azure:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Container Apps
        run: |
          az containerapp update \
            --name neam-agent \
            --resource-group neam-rg \
            --image neamacr.azurecr.io/neam-agent:${{ github.sha }}
21.7 Terraform Integration #
For teams that manage infrastructure with Terraform, the neamc deploy command can
generate Terraform configurations instead of raw cloud manifests. This integrates
Neam deployments into existing Infrastructure-as-Code workflows.
Generating Terraform Configurations #
# Generate Terraform for AWS
neamc deploy --target terraform --cloud aws --output ./terraform/
# Generate Terraform for GCP
neamc deploy --target terraform --cloud gcp --output ./terraform/
# Generate Terraform for Azure
neamc deploy --target terraform --cloud azure --output ./terraform/
# Preview without writing files
neamc deploy --target terraform --cloud aws --dry-run
The generated Terraform files define the cloud resources your agent needs, such as the compute service, networking, IAM roles, and autoscaling policies.
Example: AWS Terraform Output #
# Generated: compute.tf (AWS ECS Fargate)
resource "aws_ecs_cluster" "neam" {
  name = var.project_name

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

resource "aws_ecs_service" "neam_agent" {
  name            = "${var.project_name}-agent"
  cluster         = aws_ecs_cluster.neam.id
  task_definition = aws_ecs_task_definition.neam_agent.arn
  desired_count   = var.min_replicas
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.neam_agent.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.neam_agent.arn
    container_name   = "neam-agent"
    container_port   = 8080
  }
}

resource "aws_appautoscaling_target" "neam_agent" {
  max_capacity       = var.max_replicas
  min_capacity       = var.min_replicas
  resource_id        = "service/${aws_ecs_cluster.neam.name}/${aws_ecs_service.neam_agent.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
Applying Terraform #
cd terraform/
# Initialize and validate
terraform init
terraform validate
# Plan the changes
terraform plan -out=tfplan
# Apply
terraform apply tfplan
# Outputs show the deployment endpoint
terraform output api_url
Mixing neamc deploy with Existing Terraform #
If you already have Terraform-managed infrastructure, you can import the generated resources into your existing state or reference them as modules:
# In your existing Terraform project
module "neam_agent" {
  source = "./modules/neam-agent"

  project_name    = "customer-bot"
  image_tag       = var.neam_image_tag
  min_replicas    = 2
  max_replicas    = 10
  vpc_id          = module.networking.vpc_id
  private_subnets = module.networking.private_subnet_ids
}
21.8 Cloud-Specific Observability #
Each cloud provider offers its own monitoring service. Neam integrates with all three through the OpenTelemetry Collector, which translates OTLP data into cloud-native formats.
AWS: CloudWatch Integration #
On AWS, the OTel Collector forwards traces to AWS X-Ray and metrics to CloudWatch Metrics:
# OTel Collector config for AWS
exporters:
  awsxray:
    region: us-east-1
  awsemf:
    region: us-east-1
    namespace: Neam
    log_group_name: /neam/metrics

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [awsemf]
For ECS Fargate deployments, the OTel Collector runs as a sidecar container in the same task definition. For Lambda deployments, the Neam runtime exports spans directly via the OTLP/HTTP endpoint — no collector is needed.
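A sketch of the sidecar wiring, as a second entry in the task definition's containerDefinitions array. The AWS Distro for OpenTelemetry collector image is shown here; the exact image tag and how you supply the collector config are deployment-specific assumptions:

```json
{
  "name": "otel-collector",
  "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
  "essential": true,
  "portMappings": [
    {"containerPort": 4317, "protocol": "tcp"}
  ],
  "environment": [
    {"name": "AWS_REGION", "value": "us-east-1"}
  ]
}
```

The agent container then exports OTLP to localhost:4317, since sidecars in the same Fargate task share a network namespace.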
GCP: Cloud Trace and Cloud Monitoring #
On GCP, the OTel Collector exports to Cloud Trace and Cloud Monitoring:
# OTel Collector config for GCP
exporters:
  googlecloud:
    project: my-gcp-project
    trace:
      endpoint: cloudtrace.googleapis.com:443
    metric:
      endpoint: monitoring.googleapis.com:443

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [googlecloud]
    metrics:
      receivers: [otlp]
      exporters: [googlecloud]
Cloud Run deployments can use the built-in Cloud Trace integration. Set the
GOOGLE_CLOUD_PROJECT environment variable, and the OTel Collector sidecar
authenticates automatically via the service account.
Azure: Application Insights #
On Azure, the OTel Collector exports to Azure Monitor (Application Insights):
# OTel Collector config for Azure
exporters:
  azuremonitor:
    connection_string: ${APPLICATIONINSIGHTS_CONNECTION_STRING}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [azuremonitor]
    metrics:
      receivers: [otlp]
      exporters: [azuremonitor]
Application Insights provides built-in dashboards, smart detection alerts, and an application map that visualizes service dependencies.
Exporter Comparison #
| Feature | Jaeger + Prometheus | AWS X-Ray + CloudWatch | GCP Cloud Trace | Azure App Insights |
|---|---|---|---|---|
| Self-hosted | Yes | No | No | No |
| Cost | Infrastructure only | Per trace/metric | Per trace/metric | Per GB ingested |
| Trace retention | Configurable | 30 days | 30 days | 90 days |
| Built-in alerting | Via Alertmanager | CloudWatch Alarms | Cloud Monitoring | Smart Detection |
| Multi-cloud | Yes | AWS only | GCP only | Azure only |
For multi-cloud deployments, use the self-hosted stack (Jaeger + Prometheus + Grafana) as the single pane of glass, with cloud-native exporters as secondary destinations for teams that prefer their cloud provider's tooling.
Summary #
In this chapter, you learned:
- AWS deployment: Lambda (serverless), ECS Fargate (containers), Bedrock (managed LLM), DynamoDB (state), Secrets Manager
- GCP deployment: Cloud Run (serverless containers), Vertex AI (managed LLM), Firestore (state), GCP Secret Manager
- Azure deployment: Container Apps (serverless containers), AKS (Kubernetes), Azure OpenAI (managed LLM), CosmosDB (state), Key Vault (secrets)
- Compile flags for cloud-specific backends
- How to write cloud-portable Neam agents with environment-specific neam.toml files
- Decision criteria for choosing between deployment targets
- Multi-cloud CI/CD pipelines
- Terraform integration for Infrastructure-as-Code workflows
- Cloud-specific observability: AWS X-Ray, GCP Cloud Trace, Azure Application Insights
In the next chapter, we will cover observability and monitoring -- how to see what your deployed agents are doing in production.
Exercises #
Exercise 21.1: Lambda Configuration #
Write a complete neam.toml for deploying a Neam agent to AWS Lambda with:
- DynamoDB state backend in us-west-2
- Bedrock as the LLM provider using Claude 3.5 Sonnet
- AWS Secrets Manager for credentials
- 1024 MB memory, 30-second timeout
Then write the Neam agent code that uses this configuration.
Exercise 21.2: Cloud Run vs. Container Apps #
Compare deploying the same Neam agent to Google Cloud Run and Azure Container Apps. For each platform:
- Write the neam.toml configuration
- List the CLI commands to deploy
- Describe how autoscaling is configured
- Explain how secrets are injected
Exercise 21.3: Multi-Cloud State Migration #
You have a Neam agent running on AWS with DynamoDB state that needs to be migrated to Azure with CosmosDB. Describe:
- The data model differences between DynamoDB and CosmosDB
- A migration strategy that minimizes downtime
- How you would verify data integrity after migration
- What changes are needed in neam.toml
Exercise 21.4: Cloud-Native LLM Provider Selection #
For each scenario, recommend an LLM provider and justify your choice:
- A healthcare agent that must keep all data within a specific AWS region
- A startup that wants the cheapest option for GPT-4-class models
- An enterprise that needs Azure AD integration for compliance
- A research team that wants access to open-source models (Llama 3) without managing GPU infrastructure
- A multi-cloud deployment where the LLM provider must work on all three clouds
Exercise 21.5: Cost Optimization #
A Neam agent receives 10,000 requests per day with the following pattern:
- 8 AM - 6 PM: 80% of traffic (800 requests/hour)
- 6 PM - 8 AM: 20% of traffic (140 requests/hour)
Calculate the monthly compute cost for each deployment target:
1. AWS Lambda (128 MB, 5s avg duration)
2. AWS ECS Fargate (0.5 vCPU, 1 GB, always-on)
3. Google Cloud Run (1 vCPU, 1 GB, min 0 instances)
4. Azure Container Apps (0.5 vCPU, 1 GB, min 0 instances)
Use current pricing from each provider's documentation. Which target is cheapest?
Exercise 21.6: Disaster Recovery #
Design a disaster recovery strategy for a Neam agent deployed on AWS that needs:
- RPO (Recovery Point Objective) of 1 hour
- RTO (Recovery Time Objective) of 15 minutes
- Automatic failover to a secondary region
Describe the architecture, including state replication, DNS failover, and the
neam.toml changes needed for each region.