Chapter 20: Docker and Kubernetes #
"Containers do not solve the deployment problem. Containers solve the 'it works on my machine' problem. Kubernetes solves the deployment problem." -- Kelsey Hightower, paraphrased
What You Will Learn #
In this chapter, you will learn how to package Neam agents as Docker containers and
deploy them to Kubernetes. You will build a multi-stage Docker image, stand up a
complete development stack with Docker Compose, write Kubernetes manifests with
production-grade health probes, configure GitOps with Kustomize overlays, set up
continuous deployment with ArgoCD and FluxCD, and implement autoscaling strategies.
By the end, you will be able to take a Neam agent from source code to a production
Kubernetes cluster with a single git push.
20.1 The Docker Multi-Stage Build #
The Neam Dockerfile uses a multi-stage build to keep the final image small. The builder stage includes all compilation tools (CMake, g++, development headers); the runtime stage includes only the compiled binaries and minimal runtime libraries.
The Complete Dockerfile #
# =============================================================================
# Neam v0.6.0 Multi-stage Docker Build
# =============================================================================

# -----------------------------------------------------------------------------
# Stage 1: Builder
# -----------------------------------------------------------------------------
FROM ubuntu:24.04 AS builder

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cmake \
        g++ \
        libcurl4-openssl-dev \
        libssl-dev \
        libpq-dev \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /build

# Copy dependency files first for better layer caching
COPY CMakeLists.txt ./
COPY deps/ deps/
COPY NeamC/ NeamC/
COPY tests/ tests/

# Configure and build with PostgreSQL backend enabled
RUN cmake -B build \
        -DCMAKE_BUILD_TYPE=Release \
        -DNEAM_BACKEND_POSTGRES=ON \
    && cmake --build build -j$(nproc)

# Run tests during build to surface issues early; `|| true` keeps a failing
# test from aborting the image build (failures still appear in the build log)
RUN ctest --test-dir build --output-on-failure || true

# -----------------------------------------------------------------------------
# Stage 2: Runtime
# -----------------------------------------------------------------------------
FROM ubuntu:24.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        libcurl4 \
        libssl3 \
        libpq5 \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd -r neam && useradd -r -g neam -d /app -s /sbin/nologin neam

WORKDIR /app

# Copy binaries from builder
COPY --from=builder /build/build/neamc /usr/local/bin/neamc
COPY --from=builder /build/build/neam /usr/local/bin/neam
COPY --from=builder /build/build/neam-cli /usr/local/bin/neam-cli
COPY --from=builder /build/build/neam-api /usr/local/bin/neam-api
COPY --from=builder /build/build/neam-forge /usr/local/bin/neam-forge
COPY --from=builder /build/build/neam-lsp /usr/local/bin/neam-lsp

# Copy stdlib
COPY --from=builder /build/NeamC/stdlib /app/stdlib

# Create data directories (include session and workspace storage)
RUN mkdir -p /app/data /app/sessions /app/workspace /tmp/neam && \
    chown -R neam:neam /app /tmp/neam

ENV NEAM_ENV=production
ENV NEAM_LOG_LEVEL=info
ENV NEAM_PORT=8080

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

USER neam

ENTRYPOINT ["neam-api"]
CMD ["--port", "8080"]
Understanding the Layers #
Let us examine the key design decisions in this Dockerfile.
Layer caching: The COPY instructions are ordered from least-frequently-changed to
most-frequently-changed. CMakeLists.txt and deps/ change rarely, so Docker can
cache those layers. NeamC/ contains the source code and changes with every commit, so
it is copied last. This optimization can reduce rebuild times from 10 minutes to under
1 minute for source-only changes.
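If you build with BuildKit (the default in recent Docker releases), cache mounts can shorten rebuilds further by persisting the apt package cache and the CMake build tree across builds without baking them into any layer. A sketch of how the builder stage might use them; this is an illustrative variant, not the project's actual Dockerfile, and the `cp` destination is an assumption:

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical builder stage using BuildKit cache mounts
FROM ubuntu:24.04 AS builder

# Keep downloaded .deb packages between builds instead of re-downloading
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake g++ libcurl4-openssl-dev libssl-dev libpq-dev

WORKDIR /build
COPY . .

# Persist the CMake build tree so incremental rebuilds recompile only changes.
# Files inside a cache mount are not part of the layer, so copy the binary out.
RUN --mount=type=cache,target=/build/build \
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DNEAM_BACKEND_POSTGRES=ON \
    && cmake --build build -j$(nproc) \
    && cp build/neam-api /usr/local/bin/neam-api
```

Cache mounts are shared across builds on the same builder, which is why the compiled artifacts must be copied out of the mount before the layer is committed.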
Non-root user: The runtime stage creates a neam user and group, and the final
USER neam instruction ensures the container runs as non-root. This is a security
requirement in most production Kubernetes clusters.
Health check: The HEALTHCHECK instruction tells Docker (and Docker Compose) how
to verify the container is healthy. The neam-api server exposes a /health endpoint
that returns HTTP 200 when the service is operational.
Minimal runtime: The runtime image includes only the libraries needed to run the compiled binaries. The build tools, headers, and intermediate objects are left behind in the builder stage. This reduces the image from ~2 GB to ~150 MB.
Building the Image #
# Build the image
docker build -t neam-agent:latest .
# Build with a specific tag
docker build -t neam-agent:v0.6.0 .
# Build with additional cloud backends (assumes the Dockerfile declares an
# ARG CMAKE_FLAGS and forwards it to the cmake configure step)
docker build \
  --build-arg CMAKE_FLAGS="-DNEAM_BACKEND_POSTGRES=ON -DNEAM_BACKEND_AWS=ON" \
  -t neam-agent:aws .
# Run the image
docker run -p 8080:8080 \
-e OPENAI_API_KEY="sk-..." \
neam-agent:latest
# Run with a custom agent file
docker run -p 8080:8080 \
-e OPENAI_API_KEY="sk-..." \
-v $(pwd)/src:/app/src \
neam-agent:latest \
neam-api --port 8080 --agent-file /app/src/main.neamb
20.2 Docker Compose: The Development Stack #
For local development, Docker Compose orchestrates Neam alongside its supporting services: PostgreSQL for state, Redis for caching, an OpenTelemetry Collector for trace ingestion, Jaeger for trace visualization, and Prometheus for metrics.
The Complete docker-compose.yml #
services:
  neam-agent:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    environment:
      - NEAM_ENV=development
      - NEAM_LOG_LEVEL=debug
      - NEAM_PORT=8080
      - DATABASE_URL=postgres://neam:neam_dev@postgres:5432/neam_dev
      - REDIS_URL=redis://redis:6379
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - neam-data:/app/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 10s

  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: neam_dev
      POSTGRES_USER: neam
      POSTGRES_PASSWORD: neam_dev
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U neam -d neam_dev"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4317:4317"   # gRPC OTLP
      - "4318:4318"   # HTTP OTLP
      - "8889:8889"   # Prometheus exporter
    volumes:
      - ./docker/otel-config.yaml:/etc/otelcol-contrib/config.yaml:ro
    depends_on:
      - jaeger
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Jaeger UI
      - "14250:14250" # gRPC model.proto
      - "14268:14268" # HTTP collector
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=7d"
    depends_on:
      - otel-collector
    restart: unless-stopped

volumes:
  neam-data:
  postgres-data:
  redis-data:
  prometheus-data:
Starting the Development Stack #
# Start all services
docker compose up -d
# Watch logs
docker compose logs -f neam-agent
# Check service health
docker compose ps
# Access the services:
# - Neam API: http://localhost:8080
# - Jaeger UI: http://localhost:16686
# - Prometheus: http://localhost:9090
# - Postgres: localhost:5432
# - Redis: localhost:6379
# Stop all services
docker compose down
# Stop and remove volumes (clean slate)
docker compose down -v
The OpenTelemetry Collector Configuration #
The OTel Collector receives traces and metrics from the Neam agent and forwards them to Jaeger (traces) and Prometheus (metrics):
# docker/otel-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 1024
    timeout: 5s
  memory_limiter:
    check_interval: 5s
    limit_mib: 512
    spike_limit_mib: 128
  resource:
    attributes:
      - key: service.name
        value: neam-agent
        action: upsert

exporters:
  # Recent collector-contrib releases removed the dedicated `jaeger`
  # exporter; Jaeger accepts OTLP directly when COLLECTOR_OTLP_ENABLED is set
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: neam
    resource_to_telemetry_conversion:
      enabled: true
  debug:
    verbosity: basic

extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [prometheus, debug]
20.3 Kubernetes Deployment #
Moving from Docker Compose to Kubernetes is the leap from single-machine development
to production-grade orchestration. Neam provides both pre-built manifests in the
gitops/ directory and the neamc deploy command for generating custom manifests.
Kubernetes Architecture Overview #
Base Manifests #
The Neam project ships with production-ready Kubernetes manifests in gitops/base/.
Let us examine each one.
Deployment #
The Deployment defines how Neam pods are created and managed:
# gitops/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: neam-agent
  labels:
    app.kubernetes.io/name: neam-agent
    app.kubernetes.io/version: "0.6.0"
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: neam-agent
  template:
    metadata:
      labels:
        app.kubernetes.io/name: neam-agent
    spec:
      serviceAccountName: neam-agent
      terminationGracePeriodSeconds: 30
      containers:
        - name: neam-agent
          image: neam-agent:latest
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          envFrom:
            - configMapRef:
                name: neam-config
            - secretRef:
                name: neam-secrets
                optional: true
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: data
              mountPath: /app/data
            - name: session-storage
              mountPath: /app/sessions
            - name: forge-workspace
              mountPath: /app/workspace
      volumes:
        - name: tmp
          emptyDir: {}
        - name: data
          emptyDir: {}
        - name: session-storage
          emptyDir: {}
        - name: forge-workspace
          emptyDir: {}
For production deployments with claw agents that need session persistence
across pod restarts, replace the emptyDir: {} volumes with PersistentVolumeClaim
references:
- name: session-storage
  persistentVolumeClaim:
    claimName: neam-sessions-pvc
- name: forge-workspace
  persistentVolumeClaim:
    claimName: neam-workspace-pvc
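The claims themselves must exist before the Deployment references them. A minimal sketch, assuming the cluster has a default StorageClass; the sizes and access mode here are illustrative assumptions, only the names match the claimName references above:

```yaml
# Hypothetical PVC manifests -- sizes and access modes are assumptions
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: neam-sessions-pvc
spec:
  accessModes:
    - ReadWriteOnce   # single-node attach; use ReadWriteMany to share across replicas
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: neam-workspace-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Note that a ReadWriteOnce claim can attach to only one node at a time, which matters once the HPA scales the Deployment beyond a single replica.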
Key points to understand:
- Health probes: The liveness probe hits `/health` to detect deadlocked processes. The readiness probe hits `/ready` to check that the state backend and LLM providers are accessible. We cover health check semantics in detail in Chapter 22.
- Security context: The container runs as non-root with a read-only filesystem, no privilege escalation, and all Linux capabilities dropped. The `tmp`, `data`, `session-storage`, and `forge-workspace` volumes provide writable directories for temporary files, local state, claw agent sessions, and forge agent workspaces.
- Resource requests and limits: The requests guarantee the pod gets at least 500m CPU and 512Mi memory. The limits cap it at 1 CPU and 1Gi to prevent noisy-neighbor problems.
Service #
The Service provides a stable network endpoint for the pods:
# gitops/base/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: neam-agent
  labels:
    app.kubernetes.io/name: neam-agent
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app.kubernetes.io/name: neam-agent
ConfigMap #
# gitops/base/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: neam-config
  labels:
    app.kubernetes.io/name: neam-agent
data:
  NEAM_ENV: "production"
  NEAM_LOG_LEVEL: "info"
  NEAM_PORT: "8080"
  NEAM_TELEMETRY_ENABLED: "true"
  NEAM_OTEL_ENDPOINT: "http://otel-collector.observability.svc.cluster.local:4318"
HorizontalPodAutoscaler #
# gitops/base/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: neam-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: neam-agent
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
The HPA behavior section is critical for AI agent workloads. Scale-up is aggressive (add up to 2 pods per minute) because LLM calls have latency and you want capacity before requests start queuing. Scale-down is conservative (remove at most 1 pod every 2 minutes) to avoid flapping during intermittent traffic patterns.
PodDisruptionBudget #
# gitops/base/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: neam-agent
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: neam-agent
The PDB ensures at least 1 pod remains available during voluntary disruptions like node upgrades or cluster scaling.
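The same budget can also be expressed as a ceiling on disruption rather than a floor on availability. A sketch using maxUnavailable, which behaves equivalently here and stays sensible as the replica count grows:

```yaml
# Alternative form: allow at most one pod to be voluntarily disrupted at a time
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: neam-agent
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: neam-agent
```

A PodDisruptionBudget may set minAvailable or maxUnavailable, but not both.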
NetworkPolicy #
# gitops/base/networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: neam-agent
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: neam-agent
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - ports:
        - port: 8080
          protocol: TCP
  egress:
    - {}
Ingress is restricted to port 8080 only. Egress is open because Neam agents need to reach external LLM APIs, state backends, and the OTel collector. In a more restrictive environment, you would enumerate the allowed egress targets.
Kustomization #
Kustomize ties all base resources together:
# gitops/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  app.kubernetes.io/name: neam-agent
  app.kubernetes.io/part-of: neam
  app.kubernetes.io/managed-by: kustomize
resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml
  - hpa.yaml
  - pdb.yaml
  - networkpolicy.yaml
  - serviceaccount.yaml
20.4 GitOps with Kustomize Overlays #
Kustomize overlays allow you to customize the base manifests for different environments without duplicating YAML. The Neam project ships with three overlays: dev, staging, and production.
Development Overlay #
# gitops/overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: neam-dev
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: neam-agent
    patch: |
      - op: replace
        path: /spec/replicas
        value: 1
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: 250m
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: 256Mi
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: 500m
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 512Mi
  - target:
      kind: ConfigMap
      name: neam-config
    patch: |
      - op: replace
        path: /data/NEAM_ENV
        value: development
      - op: replace
        path: /data/NEAM_LOG_LEVEL
        value: debug
commonLabels:
  app.kubernetes.io/instance: dev
Staging Overlay #
# gitops/overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: neam-staging
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: neam-agent
    patch: |
      - op: replace
        path: /spec/replicas
        value: 2
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: 500m
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "1"
  - target:
      kind: ConfigMap
      name: neam-config
    patch: |
      - op: replace
        path: /data/NEAM_ENV
        value: staging
      - op: replace
        path: /data/NEAM_LOG_LEVEL
        value: info
  - target:
      kind: HorizontalPodAutoscaler
      name: neam-agent
    patch: |
      - op: replace
        path: /spec/minReplicas
        value: 2
      - op: replace
        path: /spec/maxReplicas
        value: 10
commonLabels:
  app.kubernetes.io/instance: staging
Production Overlay #
# gitops/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: neam-production
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: neam-agent
    patch: |
      - op: replace
        path: /spec/replicas
        value: 3
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "1"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: 512Mi
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "2"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 1Gi
  - target:
      kind: ConfigMap
      name: neam-config
    patch: |
      - op: replace
        path: /data/NEAM_ENV
        value: production
      - op: replace
        path: /data/NEAM_LOG_LEVEL
        value: warn
  - target:
      kind: HorizontalPodAutoscaler
      name: neam-agent
    patch: |
      - op: replace
        path: /spec/minReplicas
        value: 3
      - op: replace
        path: /spec/maxReplicas
        value: 20
  - target:
      kind: PodDisruptionBudget
      name: neam-agent
    patch: |
      - op: replace
        path: /spec/minAvailable
        value: 2
commonLabels:
  app.kubernetes.io/instance: production
Building and Applying Overlays #
# Preview the dev overlay
kubectl kustomize gitops/overlays/dev
# Apply the dev overlay
kubectl apply -k gitops/overlays/dev
# Preview the production overlay
kubectl kustomize gitops/overlays/production
# Apply the production overlay
kubectl apply -k gitops/overlays/production
# Diff against the running cluster
kubectl diff -k gitops/overlays/production
20.5 ArgoCD Setup #
ArgoCD provides automated GitOps deployment. When you push changes to the Git repository, ArgoCD detects the change, renders the Kustomize overlay, and applies the resulting manifests to the cluster.
ArgoCD Application #
# gitops/argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: neam-agent
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: neam
  source:
    repoURL: https://github.com/neam-lang/Neam.git
    targetRevision: main
    path: gitops/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: neam-production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  revisionHistoryLimit: 10
Key ArgoCD concepts:
- automated.prune: Deletes resources that are no longer in Git
- automated.selfHeal: Reverts manual changes made directly to the cluster
- CreateNamespace: Creates the target namespace if it does not exist
- retry: Retries failed syncs with exponential backoff
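The Application above references project: neam, which the setup steps below apply from gitops/argocd/appproject.yaml. As a sketch of what such an AppProject typically contains (the field values here are illustrative, not the repository's actual file), it scopes which repositories, clusters, and namespaces the project's applications may touch:

```yaml
# Hypothetical AppProject; destinations and whitelists are assumptions
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: neam
  namespace: argocd
spec:
  description: Neam agent deployments
  sourceRepos:
    - https://github.com/neam-lang/Neam.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: neam-*
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
  namespaceResourceWhitelist:
    - group: "*"
      kind: "*"
```

Scoping the project this way means a compromised or misconfigured Application cannot deploy outside the neam-* namespaces.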
Setting Up ArgoCD #
# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f \
https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Wait for ArgoCD to be ready
kubectl -n argocd rollout status deploy/argocd-server
# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
# Apply the ArgoCD project and application
kubectl apply -f gitops/argocd/appproject.yaml
kubectl apply -f gitops/argocd/application.yaml
# Port-forward to access the ArgoCD UI
kubectl port-forward svc/argocd-server -n argocd 8443:443
# Open https://localhost:8443 in your browser
20.6 FluxCD Setup #
FluxCD is an alternative GitOps controller. Where ArgoCD centers on its web UI and Application abstraction, FluxCD is more Kubernetes-native: everything is expressed as CRDs (Custom Resource Definitions) and managed with kubectl.
FluxCD Kustomization #
# gitops/fluxcd/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: neam-agent
  namespace: flux-system
spec:
  interval: 5m
  retryInterval: 2m
  timeout: 3m
  sourceRef:
    kind: GitRepository
    name: neam
  path: ./gitops/overlays/production
  prune: true
  wait: true
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: neam-agent
      namespace: neam-production
  patches:
    - patch: |
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: neam-agent
          annotations:
            fluxcd.io/automated: "true"
      target:
        kind: Deployment
        name: neam-agent
FluxCD GitRepository #
# gitops/fluxcd/gitrepository.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: neam
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/neam-lang/Neam.git
  ref:
    branch: main
  secretRef:
    name: neam-git-credentials
Setting Up FluxCD #
# Install FluxCD
flux install
# Create the GitRepository source
kubectl apply -f gitops/fluxcd/gitrepository.yaml
# Create the Kustomization
kubectl apply -f gitops/fluxcd/kustomization.yaml
# Check the reconciliation status
flux get kustomizations
flux get sources git
20.7 Health Probes #
Kubernetes uses three types of probes to manage container lifecycle. The Neam API server implements all three:
| Probe | Endpoint | Purpose | Failure Action |
|---|---|---|---|
| Liveness | `GET /health` | Is the process alive? | Kill and restart the pod |
| Readiness | `GET /ready` | Can the pod serve traffic? | Remove from Service endpoints |
| Startup | `GET /startup` | Has initialization completed? | Wait, then fall through to liveness |
Liveness Probe: /health #
Returns HTTP 200 if the Neam process is alive and responsive. This is a simple heartbeat -- it does not check external dependencies.
$ curl -i http://localhost:8080/health
HTTP/1.1 200 OK
Content-Type: application/json
{"status": "ok", "version": "0.6.0"}
Readiness Probe: /ready #
Returns HTTP 200 only when the agent can serve requests. This means:
- The state backend is connected and responsive
- At least one LLM provider circuit is closed (not all providers failed)
- If telemetry is enabled, the OTLP export queue is not full
$ curl -i http://localhost:8080/ready
HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ready",
  "state_backend": "postgres",
  "state_backend_status": "connected",
  "llm_providers": {
    "openai": "healthy",
    "anthropic": "healthy"
  },
  "telemetry": "ok"
}
If the state backend is down:
$ curl -i http://localhost:8080/ready
HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
  "status": "not_ready",
  "state_backend": "postgres",
  "state_backend_status": "connection_refused",
  "llm_providers": {
    "openai": "healthy"
  }
}
Startup Probe: /startup #
Returns HTTP 200 once the VM has completed initialization: loaded the bytecode, connected to the state backend, registered agents, and started the autonomous executor (if configured).
# Adding a startup probe to the deployment
startupProbe:
  httpGet:
    path: /startup
    port: http
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 30
With the startup probe configured, Kubernetes waits up to 150 seconds (30 x 5s) for initialization before starting liveness and readiness checks. This is important for agents that need to ingest knowledge bases at startup.
20.8 Scaling Strategies #
HPA: CPU-Based Scaling #
The default HPA scales based on CPU utilization. This works well for most Neam workloads: even though LLM calls spend most of their time waiting on external APIs, the agent still consumes CPU for request marshaling, response parsing, and RAG retrieval.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: neam-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: neam-agent
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
KEDA: Event-Driven Scaling #
For event-driven workloads (message queues, scheduled autonomous agents), KEDA provides more sophisticated scaling triggers:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: neam-agent
  namespace: neam-production
spec:
  scaleTargetRef:
    name: neam-agent
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.observability:9090
        metricName: neam_llm_requests_queued
        threshold: "10"
        query: |
          sum(neam_llm_requests_queued{service="neam-agent"})
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5
        end: 0 18 * * 1-5
        desiredReplicas: "5"
This KEDA configuration scales based on two triggers:
- Prometheus metric: When the LLM request queue exceeds 10, add more pods
- Cron schedule: During business hours (8 AM - 6 PM ET, Monday-Friday), maintain at least 5 replicas
20.9 Security Best Practices #
Non-Root Execution #
The Dockerfile creates a dedicated neam user, and the deployment enforces it:
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
Read-Only Root Filesystem #
The container's root filesystem is mounted read-only. The only writable paths are explicitly mounted volumes:
volumeMounts:
  - name: tmp
    mountPath: /tmp        # For temporary files during LLM calls
  - name: data
    mountPath: /app/data   # For local SQLite state (if used)
volumes:
  - name: tmp
    emptyDir: {}
  - name: data
    emptyDir: {}
Network Policies #
The NetworkPolicy restricts which pods can communicate with the Neam agent:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: neam-agent
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: neam-agent
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - ports:
        - port: 8080
          protocol: TCP
  egress:
    - {}  # Open egress for LLM APIs
For stricter environments, restrict egress to known endpoints:
egress:
  # Allow DNS
  - to:
      - namespaceSelector: {}
    ports:
      - port: 53
        protocol: UDP
  # Allow LLM API calls
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0
    ports:
      - port: 443
        protocol: TCP
  # Allow state backend
  - to:
      - namespaceSelector:
          matchLabels:
            name: databases
    ports:
      - port: 5432
        protocol: TCP
  # Allow OTel Collector
  - to:
      - namespaceSelector:
          matchLabels:
            name: observability
    ports:
      - port: 4318
        protocol: TCP
Secrets Management #
Never put secrets in ConfigMaps. Use Kubernetes Secrets with encryption at rest:
# Create a secret
kubectl create secret generic neam-secrets \
-n neam-production \
--from-literal=OPENAI_API_KEY="sk-..." \
--from-literal=ANTHROPIC_API_KEY="sk-ant-..."
# Or use a cloud secrets provider (AWS, GCP, Azure) with External Secrets Operator
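With the External Secrets Operator mentioned above, the Kubernetes Secret is synchronized from the cloud provider rather than created by hand. A sketch, assuming the operator is installed and a ClusterSecretStore named aws-secrets-manager exists; the store name and remote key paths are illustrative assumptions:

```yaml
# Hypothetical ExternalSecret -- store name and remoteRef keys are assumptions
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: neam-secrets
  namespace: neam-production
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  target:
    name: neam-secrets   # the Secret the Deployment's envFrom reads
  data:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: prod/neam/openai-api-key
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: prod/neam/anthropic-api-key
```

The operator keeps the Secret in sync on the refresh interval, so rotating a key in the cloud provider propagates to the cluster without a manual kubectl step.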
20.10 The neamc deploy Command #
The neamc deploy command generates deployment artifacts from your neam.toml:
# Generate Kubernetes manifests
neamc deploy --target kubernetes --output ./deploy/
# Dry run (preview without writing files)
neamc deploy --target kubernetes --dry-run
# Generate Helm chart
neamc deploy --target helm --output ./chart/
# Generate for Docker only
neamc deploy --target docker --output ./deploy/
The command reads your neam.toml configuration and generates manifests tailored to
your settings. For example, if you have telemetry enabled, the Kubernetes manifest will
include an OTel Collector sidecar. If you have an HPA section, the manifest will
include a HorizontalPodAutoscaler.
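The exact neam.toml schema is defined by the Neam toolchain; purely as an illustration of the kind of configuration the generator reads, a deploy-relevant fragment might look like the sketch below. All table and key names here are assumptions, not the documented schema:

```toml
# Hypothetical neam.toml fragment -- key names are illustrative assumptions
[project]
name = "customer-service-agent"
version = "2.1.0"

[state]
backend = "postgres"

[telemetry]
enabled = true
otel_endpoint = "http://otel-collector:4318"

[deploy]
target = "kubernetes"
namespace = "neam-production"
replicas = 3

[deploy.hpa]
min_replicas = 3
max_replicas = 20
cpu_target = 70
```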
$ neamc deploy --target kubernetes --dry-run
=== Generating Kubernetes manifests ===
Reading neam.toml...
Project: customer-service-agent v2.1.0
State backend: postgres
Telemetry: enabled (endpoint: http://otel-collector:4318)
Deploy target: kubernetes (namespace: neam-production)
Generated files:
deployment.yaml (3 replicas, resource limits, health probes)
service.yaml (ClusterIP, port 80 -> 8080)
configmap.yaml (NEAM_ENV, NEAM_LOG_LEVEL, ...)
hpa.yaml (min: 3, max: 20, CPU target: 70%)
pdb.yaml (minAvailable: 2)
networkpolicy.yaml (ingress: 8080, egress: all)
serviceaccount.yaml (neam-agent service account)
kustomization.yaml (ties all resources together)
Dry run complete. No files written.
20.11 CI/CD Pipelines with GitHub Actions #
Continuous integration and continuous deployment (CI/CD) pipelines automate the build-test-deploy cycle. The Neam project ships three GitHub Actions workflows: one for continuous integration on every push, one for deploying to Kubernetes, and one for security scanning.
Continuous Integration Workflow #
The CI workflow runs on every push and pull request. It builds the Neam toolchain across multiple platforms, runs the test suite, and publishes container images for tagged releases.
name: CI

on:
  push:
    branches: [main]
    # Tag pushes must be listed here, or the container job's tag check
    # below can never fire
    tags: ["v*"]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        build_type: [Release, Debug]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies (Ubuntu)
        if: runner.os == 'Linux'
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential cmake libcurl4-openssl-dev \
            libssl-dev libpq-dev
      - name: Install dependencies (macOS)
        if: runner.os == 'macOS'
        run: brew install cmake openssl postgresql libpq
      - name: Configure
        run: cmake -B build -DCMAKE_BUILD_TYPE=${{ matrix.build_type }}
      - name: Build
        run: cmake --build build -j$(nproc 2>/dev/null || sysctl -n hw.ncpu)
      - name: Test
        run: ctest --test-dir build --output-on-failure

  container:
    needs: build
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
The matrix strategy builds across Ubuntu and macOS with both Release and Debug
configurations, catching platform-specific issues early. The container job only
runs on version tags (v1.0.0, v0.6.0), pushing multi-tagged images to the
GitHub Container Registry.
Kubernetes Deployment Workflow #
The deployment workflow uses environment-based dispatch. Pushing to main deploys
to staging automatically; production deployments require manual approval.
name: Deploy to Kubernetes

on:
  workflow_dispatch:
    inputs:
      environment:
        description: "Target environment"
        required: true
        type: choice
        options:
          - staging
          - production
  push:
    branches: [main]

# Workflows do not share env blocks, so REGISTRY and IMAGE_NAME must be
# defined here as well as in the CI workflow
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment || 'staging' }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
      - name: Configure kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG }}" | base64 -d > ~/.kube/config
      - name: Set image tag
        run: |
          cd gitops/overlays/${{ github.event.inputs.environment || 'staging' }}
          kustomize edit set image \
            neam-agent=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
      - name: Apply manifests
        run: |
          kubectl apply -k \
            gitops/overlays/${{ github.event.inputs.environment || 'staging' }}
      - name: Wait for rollout
        run: |
          kubectl -n neam-${{ github.event.inputs.environment || 'staging' }} \
            rollout status deployment/neam-agent --timeout=300s
      - name: Verify health
        run: |
          kubectl -n neam-${{ github.event.inputs.environment || 'staging' }} \
            exec deploy/neam-agent -- curl -sf http://localhost:8080/ready
The workflow uses Kubernetes environment protection rules. For production, you can
configure required reviewers and deployment branch restrictions in your repository
settings.
20.12 Security Scanning #
Automated security scanning catches vulnerabilities before they reach production. A dedicated workflow runs three types of scans: container image scanning, static code analysis, and dependency review.
Security Scan Workflow #
name: Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 6 * * 1"  # Weekly Monday at 6 AM UTC

permissions:
  contents: read
  security-events: write

jobs:
  trivy-container:
    name: Trivy Container Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t neam-agent:scan .
      - name: Run Trivy container scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: neam-agent:scan
          format: sarif
          output: trivy-container.sarif
          severity: CRITICAL,HIGH
      - name: Upload results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-container.sarif

  trivy-filesystem:
    name: Trivy Filesystem Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Trivy filesystem scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          scan-ref: .
          format: sarif
          output: trivy-fs.sarif
          severity: CRITICAL,HIGH,MEDIUM
      - name: Upload results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-fs.sarif

  codeql:
    name: CodeQL Analysis
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential cmake libcurl4-openssl-dev \
            libssl-dev libpq-dev
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: cpp
      - name: Build
        run: |
          cmake -B build -DCMAKE_BUILD_TYPE=Release
          cmake --build build -j$(nproc)
      - name: Perform CodeQL analysis
        uses: github/codeql-action/analyze@v3

  dependency-review:
    name: Dependency Review
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Dependency review
        uses: actions/dependency-review-action@v4
        with:
          fail-on-severity: high
Each scan serves a different purpose:
| Scan | Tool | What It Finds |
|---|---|---|
| Container scan | Trivy | OS package vulnerabilities in the Docker image |
| Filesystem scan | Trivy | Vulnerabilities in source dependencies and config files |
| Static analysis | CodeQL | Code-level security issues (buffer overflows, injection, etc.) |
| Dependency review | GitHub | New vulnerable dependencies introduced in pull requests |
The weekly scheduled scan (cron: "0 6 * * 1") catches newly disclosed CVEs in
dependencies that were safe when first added. Results are uploaded as SARIF
(Static Analysis Results Interchange Format) to GitHub's Security tab, where they
appear alongside code scanning alerts.
Interpreting Scan Results #
When Trivy reports a vulnerability, evaluate whether it is reachable from your code.
A HIGH severity CVE in libssl matters if your agent makes HTTPS calls (it does).
A HIGH severity CVE in libx11 does not matter because the container has no
graphical interface.
To suppress known false positives, create a .trivyignore file:
# Not exploitable: Neam does not use the affected API
CVE-2024-XXXXX
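Recent versions of the Trivy action also accept a trivyignores input pointing at that file; verify the input name against the action's documentation for the version you pin:

```yaml
- name: Run Trivy container scan
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: neam-agent:scan
    format: sarif
    output: trivy-container.sarif
    severity: CRITICAL,HIGH
    trivyignores: .trivyignore   # suppress the CVEs listed in this file
```

Keep the justification comments in .trivyignore up to date; a suppression without a reason is indistinguishable from a mistake six months later.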
20.13 Helm Chart Generation #
Helm is the package manager for Kubernetes. While Kustomize works well for teams
that manage their own clusters, Helm charts are the standard distribution format
for applications that others will deploy. The neamc deploy command can generate
a complete Helm chart from your neam.toml.
Generating a Helm Chart #
# Generate a Helm chart
neamc deploy --target helm --output ./chart/
# Preview the generated chart structure
tree chart/
The generated chart follows the standard Helm layout:
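A typical generated layout looks like the following; the exact set of template files depends on your neamc version:

```
chart/
├── Chart.yaml            # chart metadata
├── values.yaml           # configurable parameters
└── templates/
    ├── _helpers.tpl      # named template helpers
    ├── deployment.yaml
    ├── service.yaml
    ├── hpa.yaml
    └── NOTES.txt         # printed after helm install
```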
Chart.yaml #
The Chart.yaml metadata is populated from your neam.toml project section:
apiVersion: v2
name: neam-agent
description: A Neam AI agent deployed to Kubernetes
type: application
version: 0.1.0
appVersion: "0.6.0"
keywords:
- neam
- ai-agent
- llm
values.yaml #
The values.yaml file exposes all configurable parameters with sensible defaults:
replicaCount: 3
image:
  repository: ghcr.io/neam-lang/neam-agent
  tag: "0.6.0"
  pullPolicy: IfNotPresent
env:
  NEAM_ENV: production
  NEAM_LOG_LEVEL: info
  NEAM_PORT: "8080"
  NEAM_TELEMETRY_ENABLED: "true"
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilization: 70
probes:
  liveness:
    path: /health
    initialDelaySeconds: 15
    periodSeconds: 20
  readiness:
    path: /ready
    initialDelaySeconds: 5
    periodSeconds: 10
  startup:
    path: /startup
    periodSeconds: 5
    failureThreshold: 30
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  dropCapabilities:
    - ALL
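To see how these values reach the cluster, here is a hypothetical excerpt of a templates/deployment.yaml; the actual template generated by neamc may differ:

```yaml
# Hypothetical excerpt: values.yaml fields are injected via the .Values object.
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: neam-agent
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          env:
            - name: NEAM_ENV
              value: {{ .Values.env.NEAM_ENV | quote }}
          livenessProbe:
            httpGet:
              path: {{ .Values.probes.liveness.path }}
              port: 8080
```

Anything a user overrides with --set or a custom values file flows through these expressions at render time, which is why every tunable in values.yaml must have a corresponding reference in the templates.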
Installing the Chart #
# Install from the generated chart directory
helm install my-agent ./chart/ \
--namespace neam-production \
--create-namespace \
--set env.OPENAI_API_KEY="sk-..."
# Install with custom values
helm install my-agent ./chart/ \
--namespace neam-production \
-f custom-values.yaml
# Upgrade an existing release
helm upgrade my-agent ./chart/ \
--namespace neam-production \
--set image.tag="v1.1.0"
# Preview what will be applied (dry run)
helm install my-agent ./chart/ --dry-run --debug
# Package the chart for distribution
helm package ./chart/
# Creates neam-agent-0.1.0.tgz
Helm provides rollback capabilities that Kustomize does not:
# List release history
helm history my-agent -n neam-production
# Roll back to the previous release
helm rollback my-agent -n neam-production
# Roll back to a specific revision
helm rollback my-agent 3 -n neam-production
20.14 Complete Deployment Walkthrough #
Let us walk through a complete deployment from source to production:
# 1. Write your agent
cat > src/main.neam << 'EOF'
agent Assistant {
  provider: "openai"
  model: "gpt-4o-mini"
  system: "You are a helpful assistant."
  memory: "assistant_memory"
}
{
  let query = input();
  emit Assistant.ask(query);
}
EOF
# 2. Compile the agent
neamc src/main.neam -o src/main.neamb
# 3. Build the Docker image
docker build -t my-registry.com/neam-agent:v1.0.0 .
# 4. Push to container registry
docker push my-registry.com/neam-agent:v1.0.0
# 5. Update the image tag in the overlay
cd gitops/overlays/production
# Add an image transformer to kustomization.yaml:
cat >> kustomization.yaml << 'EOF'
images:
  - name: neam-agent
    newName: my-registry.com/neam-agent
    newTag: v1.0.0
EOF
# 6. Commit and push
git add -A && git commit -m "Deploy v1.0.0"
git push origin main
# 7. ArgoCD/FluxCD detects the change and deploys automatically
# 8. Verify the deployment
kubectl -n neam-production get pods
kubectl -n neam-production rollout status deployment/neam-agent
# 9. Test the endpoint
kubectl -n neam-production port-forward svc/neam-agent 8080:80
curl -X POST http://localhost:8080/api/v1/agent/ask \
-H "Content-Type: application/json" \
-d '{"agent_id": "Assistant", "query": "Hello!"}'
Summary #
In this chapter, you learned:
- How to build Neam agents as multi-stage Docker images
- How to run a complete development stack with Docker Compose
- The structure of Kubernetes base manifests: Deployment, Service, ConfigMap, HPA, PDB, NetworkPolicy
- GitOps with Kustomize overlays for dev, staging, and production environments
- ArgoCD and FluxCD setup for automated continuous deployment
- Health probe semantics: liveness, readiness, and startup
- Autoscaling with HPA (CPU-based) and KEDA (event-driven)
- Security best practices: non-root, read-only filesystem, dropped capabilities, network policies
- The neamc deploy command for generating platform-specific manifests
- CI/CD pipelines with GitHub Actions: matrix builds, environment-based deployment, rollout verification
- Security scanning with Trivy (container and filesystem), CodeQL (static analysis), and dependency review
- Helm chart generation from neam.toml: chart structure, installation, upgrades, and rollbacks
In the next chapter, we will extend these deployment patterns across multiple cloud providers: AWS, GCP, and Azure.
Exercises #
Exercise 20.1: Docker Build Optimization #
Modify the Neam Dockerfile to support building with additional backends. Add a
CMAKE_FLAGS build argument so that the image can be built with:
docker build --build-arg CMAKE_FLAGS="-DNEAM_BACKEND_POSTGRES=ON -DNEAM_BACKEND_AWS=ON" .
Hint: Use ARG CMAKE_FLAGS="-DNEAM_BACKEND_POSTGRES=ON" in the builder stage.
Exercise 20.2: Docker Compose Extension #
Add a Grafana service to the Docker Compose stack that:
- Runs on port 3000
- Auto-provisions Prometheus as a data source
- Auto-provisions Jaeger as a data source
- Depends on both Prometheus and Jaeger
Exercise 20.3: Kustomize Overlay #
Create a new Kustomize overlay for a canary environment that:
- Runs in the neam-canary namespace
- Uses 1 replica (no HPA)
- Sets NEAM_LOG_LEVEL to debug
- Sets NEAM_TELEMETRY_SAMPLING_RATE to 1.0 (trace everything)
- Uses 256Mi memory request and 512Mi limit
Exercise 20.4: Health Probe Tuning #
A Neam agent takes 45 seconds to ingest its knowledge base at startup. The current
deployment has no startup probe, and the liveness probe has initialDelaySeconds: 15.
- What happens when this agent is deployed? (Hint: the liveness probe fires before the knowledge base is loaded.)
- Write a startup probe configuration that handles this case.
- What values would you use for periodSeconds and failureThreshold?
Exercise 20.5: KEDA Scaling #
Design a KEDA ScaledObject for a Neam agent that processes messages from an AWS SQS queue. The scaling should:
- Scale to 0 when the queue is empty
- Add 1 pod per 5 messages in the queue
- Maximum 15 replicas
- Cool down for 5 minutes before scaling to 0
Exercise 20.6: Network Policy #
Write a NetworkPolicy that restricts a Neam agent to:
- Accept ingress only from pods in the api-gateway namespace on port 8080
- Allow egress only to:
  - DNS (port 53 UDP)
  - HTTPS (port 443 TCP) for LLM API calls
  - PostgreSQL (port 5432 TCP) to pods labeled app=postgres in the databases namespace
  - OTel Collector (port 4318 TCP) in the observability namespace
Exercise 20.7: End-to-End Deployment #
Starting from scratch, deploy a Neam agent to a local Kubernetes cluster (minikube or kind). The steps are:
- Create a kind cluster
- Build the Docker image and load it into the cluster
- Create the namespace and secrets
- Apply the base manifests
- Verify the pods are running and healthy
- Send a test query to the agent
Document each command and its expected output.