Programming Neam

Case Study: Multi-Channel Customer Support #


Building a customer support system that operates across multiple communication channels is a natural fit for claw agents. Unlike the stateless multi-agent orchestration covered in the previous case study, this system centers on a single persistent agent that maintains conversation context, searches a knowledge base semantically, routes messages through CLI and HTTP channels, prioritizes requests with lanes, and monitors its own health through trait implementations. Every feature of the claw agent construct -- sessions, compaction, channels, lanes, semantic memory, and traits -- comes together in one production-grade system.

By the end of this case study, you will have a working multi-channel support bot that greets users, answers FAQ questions from a semantic knowledge base, creates support tickets, persists conversation sessions across restarts, handles concurrent requests through prioritized lanes, runs scheduled health checks, detects anomalies in its own response latency, and deploys to Kubernetes with proper volume mounts and health probes.


1 Requirements #

Before writing any code, let us define what the system must do. A multi-channel customer support agent has six core requirements:

  1. Multi-channel access. The agent must accept messages through a CLI channel for local development and an HTTP channel for production deployment. Each channel has its own messaging policy: the CLI channel uses direct message (DM) policy for one-on-one interactions, while the HTTP channel supports group policy for team-based support queues.

  2. Session persistence. Conversations must survive across multiple interactions. A customer who asks about an order and returns an hour later should not have to repeat themselves. Sessions are stored in JSONL format on disk and support idle reset, daily reset, and automatic compaction.

  3. Semantic memory. The agent must retrieve relevant information from a knowledge base of markdown files using hybrid search (vector similarity and keyword matching). This allows the agent to answer common questions without an LLM round-trip when the answer is already in the documentation.

  4. Concurrent request handling. Direct messages from individual users require fast response times, while group channel messages can tolerate slightly longer queues. Lanes partition these workloads with independent concurrency limits and priority levels.

  5. Scheduled health checks. The agent must periodically check for pending support tickets that have not been resolved and notify operators when backlogs grow. This is a background heartbeat that runs independently of user interactions.

  6. Anomaly detection. The agent must monitor its own response latency and tool call frequency. If either metric exceeds expected baselines, the system must alert operators or, in extreme cases, disable the agent to prevent cascading failures.

💠 Why This Matters

Most AI agent tutorials demonstrate single-channel, stateless interactions. Production support systems are neither. Real customers reach out through web chat, CLI tools, and API integrations simultaneously. They expect the agent to remember context, respond within acceptable latency, and not crash under load. This case study bridges the gap between tutorial and production.


2 Architecture Design #

The architecture places a single claw agent at the center, with two channels feeding messages into it. A workspace directory holds the knowledge base and session archives. Lanes partition concurrent requests by priority. Two trait implementations -- Schedulable and Monitorable -- add cross-cutting operational behavior.

                   ┌──────────────────────────────────────────┐
  CLI Channel      │         SupportBot (claw agent)          │      HTTP Channel
  policy: DM  ───▶ │  Skills: [GreetSkill, FAQSkill,          │ ◀─── port: 8090
                   │           TicketSkill]                   │      policy: group
                   │  Tools:  [lookup_order, check_status,    │
                   │           create_ticket]                 │
                   │  Guards: [PIIFilter, QualityCheck]       │
                   └──────────────────────────────────────────┘
                     impl Schedulable (heartbeat 30s, check tickets)
                     impl Monitorable (latency/tool-call baselines)

Key architectural decisions: a single agent with multiple channels ensures consistent behavior and shared session storage. The workspace directory is the single source of truth for knowledge base files, sessions, indexes, and tickets. Lane-based prioritization routes DM messages to a high-priority lane and group messages to a standard lane.


3 Agent Declaration #

Let us build the complete SupportBot declaration. This is the central piece of the system. We start with the channel and guard declarations, then define the claw agent itself.

neam
// ================================================================
// support_bot.neam -- Multi-Channel Customer Support Agent
// ================================================================

// ── Channel Declarations ────────────────────────────────────────────

channel SupportCLI {
  type: "cli"
  prompt: "customer> "
  greeting: "Welcome to Acme Support. How can I help you today?"
}

channel SupportHTTP {
  type: "http"
  port: 8090
  path: "/v1/support"
  cors: true
  policy: "group"
}

// ── Guard Declarations ──────────────────────────────────────────────

guard pii_filter {
  description: "Redacts PII from inbound messages before they reach the LLM"
  on_observation(input) {
    // Redact credit card numbers (16 digits with optional separators)
    let sanitized = input.replace_regex(
      "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b",
      "[REDACTED-CC]"
    );
    // Redact SSN patterns
    sanitized = sanitized.replace_regex(
      "\\b\\d{3}-\\d{2}-\\d{4}\\b",
      "[REDACTED-SSN]"
    );
    // Redact email addresses
    sanitized = sanitized.replace_regex(
      "\\b[\\w.+-]+@[\\w-]+\\.[\\w.]+\\b",
      "[REDACTED-EMAIL]"
    );
    return sanitized;
  }
}

guard quality_check {
  description: "Validates that agent responses meet quality standards"
  on_response(output) {
    // Block responses that expose internal system details
    if (output.contains("INTERNAL ERROR") | output.contains("stack trace")) {
      return "block";
    }
    // Block responses shorter than 10 characters (likely errors)
    if (len(output) < 10) {
      return "block";
    }
    return output;
  }
}

guardchain PIIFilter = [pii_filter];
guardchain QualityCheck = [quality_check];

// ── Budget Declaration ──────────────────────────────────────────────

budget SupportBudget {
  api_calls: 1000
  tokens: 2000000
  cost_usd: 50.0
  reset: "daily"
}

// ── Claw Agent Declaration ──────────────────────────────────────────

claw agent SupportBot {
  provider: "anthropic"
  model: "claude-sonnet-4"
  temperature: 0.4
  system: "You are a customer support agent for Acme Corp. Your role is to:
           1. Greet customers warmly and identify their needs.
           2. Answer frequently asked questions using your knowledge base.
           3. Look up order details and check ticket statuses.
           4. Create support tickets for issues that require follow-up.
           5. Be empathetic, concise, and professional at all times.

           If a customer provides personal information like credit card numbers,
           acknowledge that it has been redacted for their protection and ask
           them to use the secure portal instead.

           Always reference specific order numbers and ticket IDs when available."

  channels: [SupportCLI, SupportHTTP]
  skills: [GreetSkill, FAQSkill, TicketSkill]
  guards: [PIIFilter, QualityCheck]
  budget: SupportBudget

  workspace: "./.neam/support_workspace"

  session: {
    idle_reset_minutes: 1440
    daily_reset_hour: 4
    max_history_turns: 200
    compaction: "auto"
  }

  lanes: {
    high_priority: { concurrency: 5, priority: "high" }
    standard: { concurrency: 20, priority: "normal" }
  }

  semantic_memory: {
    backend: "sqlite"
    embedding_model: "text-embedding-3-small"
    search: "hybrid"
    top_k: 5
  }
}

Key decisions: idle_reset_minutes: 1440 gives customers a full business day to return without losing context. max_history_turns: 200 provides room for lengthy debugging conversations. compaction: "auto" prevents context overflow. search: "hybrid" combines vector similarity with BM25 keyword matching for the best results with support knowledge bases that mix natural language and product codes.
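With the declaration in place, a single turn exercises the full pipeline: routing, guards, session persistence, and budget tracking all apply to one call. This is a sketch using the same .ask() and emit constructs that appear in the tests later in this chapter:

```neam
// Sketch: smoke-test the declaration with one turn.
let reply = SupportBot.ask("Hello! What are your shipping options?");
// The LLM will typically route this through FAQSkill and answer
// from the kb/ knowledge base.
emit reply;
```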


4 Skills and Tools #

Skills are the actions the agent can perform. Each has a description the LLM uses for routing, typed parameters, and an implementation. Let us define the three skills and two supporting tools.

GreetSkill #

The greeting skill personalizes the welcome message based on whether the customer is new or returning:

neam
skill GreetSkill {
  description: "Greet the customer. Use for initial hellos and welcome messages."
  params: { customer_name: string }

  impl(customer_name) {
    // Check if we have prior interaction history
    let history = session_history("greeting", 1);

    if (history != nil & len(history) > 0) {
      return f"Welcome back, {customer_name}! I have your previous conversation "
           + "on file. How can I help you today?";
    }

    return f"Hello, {customer_name}! Welcome to Acme Support. I am here to help "
         + "with any questions about your orders, account, or our products. "
         + "What can I assist you with?";
  }
}

FAQSkill #

The FAQ skill searches the semantic memory for answers to common questions before generating a response:

neam
skill FAQSkill {
  description: "Answer frequently asked questions using the knowledge base. Use this
                when the customer asks about policies, product information, shipping,
                returns, or general company information."
  params: { question: string }

  impl(question) {
    // Search the knowledge base for relevant content
    let results = memory_search(question, 3);

    if (results == nil | len(results) == 0) {
      return "I could not find a specific answer in our knowledge base. "
           + "Let me connect you with a specialist who can help.";
    }

    // Build context from the top results
    let context = "";
    for (result in results) {
      context = context + result["content"] + "\n\n";
    }

    return f"Based on our knowledge base:\n\n{context}\n"
         + "Is there anything else you would like to know?";
  }
}

TicketSkill #

The ticket skill creates support tickets for issues that require follow-up:

neam
skill TicketSkill {
  description: "Create a support ticket for issues needing investigation or escalation."
  params: { subject: string, description: string, priority: string }

  impl(subject, description, priority) {
    let ticket_id = "TKT-" + str(random_int(10000, 99999));
    let ticket = {
      "id": ticket_id, "subject": subject, "description": description,
      "priority": priority, "status": "open", "created_at": now_iso()
    };
    workspace_append("tickets/pending.jsonl", to_json(ticket) + "\n");

    return f"Created ticket {ticket_id} (priority: {priority}). "
         + "Our team will follow up within 24 hours.";
  }
}

Tools #

Tools provide direct access to backend systems for single lookups or mutations:

neam
skill lookup_order {
  description: "Look up an order by its order ID to retrieve status and details"
  params: { order_id: string }

  impl(order_id) {
    // In production, this would query a database or REST API
    let data = workspace_read("orders/" + order_id + ".json");
    if (data == nil) {
      return f"No order found with ID: {order_id}";
    }
    return data;
  }
}

skill check_status {
  description: "Check the current status of a support ticket by ticket ID"
  params: { ticket_id: string }

  impl(ticket_id) {
    let tickets = workspace_read("tickets/pending.jsonl");
    if (tickets == nil) {
      return f"No ticket found with ID: {ticket_id}";
    }

    let lines = tickets.split("\n");
    for (line in lines) {
      if (line.contains(ticket_id)) {
        return line;
      }
    }
    return f"Ticket {ticket_id} not found in the active queue.";
  }
}

Skill Routing #

The LLM decides which skill to invoke based on the customer's message. The routing happens automatically in the tool-calling loop (Step 4 of the .ask() pipeline). For example, when a customer asks "What is your return policy?", the LLM reads all skill descriptions, selects FAQSkill(question: "return policy"), which calls memory_search("return policy", 3), and the LLM formulates a final response from the knowledge base results.

💡 Tip

Write skill descriptions as if you are explaining the skill to a new employee. The LLM uses these descriptions to decide when to invoke each skill. Vague descriptions lead to incorrect routing. Be specific about what the skill does and when it should be used.
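As a concrete illustration, compare a vague description with a specific one (both descriptions here are invented for contrast; they are not part of SupportBot):

```neam
// Too vague -- the LLM cannot tell this apart from FAQSkill:
//   description: "Helps customers."
//
// Specific -- states what the skill does and when to invoke it:
//   description: "Create a support ticket for issues needing investigation
//                 or escalation. Use when the problem cannot be resolved
//                 within the current conversation."
```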


5 Channel Configuration #

Channels define how messages enter and leave the agent. SupportBot uses two channels.

CLI Channel: Direct Messages #

The CLI channel handles local development and one-on-one interactions with DM policy:

neam
channel SupportCLI {
  type: "cli"
  prompt: "customer> "
  greeting: "Welcome to Acme Support. How can I help you today?"
}

The CLI channel reads from stdin and writes to stdout in a loop, displaying the greeting on startup and the custom prompt before each input. Each CLI session is isolated to the terminal user with session key SupportBot/cli_<user_id>.

HTTP Channel: Group Support #

The HTTP channel serves production traffic through a REST endpoint with group policy, meaning all users in the support queue share a single conversation history:

neam
channel SupportHTTP {
  type: "http"
  port: 8090
  path: "/v1/support"
  cors: true
  policy: "group"
}

The HTTP channel accepts POST requests with JSON bodies and returns JSON responses:

json
// Request
{ "content": "Where is my order #12345?", "user_id": "customer_alice",
  "metadata": { "lane": "high_priority", "source": "web_widget" } }

// Response
{ "content": "Your order #12345 was shipped on Feb 17...",
  "session_key": "SupportBot/http_group",
  "metadata": { "tokens_used": 284, "latency_ms": 1230 } }

DM vs Group Behavior #

The two policies produce fundamentally different session behaviors:

| Aspect            | DM Policy (CLI)          | Group Policy (HTTP)     |
|-------------------|--------------------------|-------------------------|
| Session key       | SupportBot/cli_<user_id> | SupportBot/http_group   |
| History isolation | Per user                 | Shared across all users |
| Compaction scope  | Individual history       | Combined group history  |
| Use case          | Personal support         | Team support queues     |
| Lane routing      | High-priority (DM)       | Standard (group)        |

Channel Security #

Both channels pass through the full .ask() pipeline, including guard chains:

Inbound message → PIIFilter Guard → SupportBot .ask() → QualityCheck Guard → Response

Common Mistake

Do not assume that the CLI channel is "safe" and skip guardrails during development. Developers testing the system locally often paste real customer data into the CLI for debugging. The PII filter catches this regardless of channel. Always apply guardrails uniformly across all channels.


6 Session Management #

Sessions give the claw agent its conversation memory. Every message exchange is recorded, persisted to disk, and loaded when the customer returns.

JSONL Session Storage #

Each session is stored as a JSONL file where every line is a JSON object:

jsonl
{"role":"user","content":"Hi, I need help with my order.","ts":"2026-02-18T09:00:01Z"}
{"role":"assistant","content":"Hello! I would be happy to help. Could you provide your order number?","ts":"2026-02-18T09:00:03Z"}
{"role":"user","content":"It is ORD-78432.","ts":"2026-02-18T09:00:15Z"}
{"role":"assistant","content":"Let me look that up for you.","ts":"2026-02-18T09:00:16Z"}
{"role":"tool","content":"{\"order_id\":\"ORD-78432\",\"status\":\"shipped\",\"eta\":\"2026-02-20\"}","ts":"2026-02-18T09:00:17Z"}
{"role":"assistant","content":"Your order ORD-78432 has been shipped and is expected to arrive by February 20.","ts":"2026-02-18T09:00:19Z"}

The JSONL format is append-only, which makes writes fast and crash-safe. Each line includes a timestamp (ts) for audit trails and debugging.
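Because each line is independent JSON, archived sessions are easy to replay for debugging. Here is a sketch using the workspace and parsing built-ins shown elsewhere in this chapter; the archive path is illustrative:

```neam
// Sketch: print an archived session turn by turn.
let raw = workspace_read("sessions/archive/cli_customer_alice-2026-02-18.jsonl");
for (line in raw.split("\n")) {
  if (len(line) == 0) { continue; }   // skip trailing blank line
  let msg = parse_json(line);
  emit f"{msg[\"ts\"]} [{msg[\"role\"]}] {msg[\"content\"]}";
}
```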

Session Lifecycle #

┌──────────────────────────────────────────────────────────┐
│              SupportBot Session Lifecycle                  │
├──────────────────────────────────────────────────────────┤
│                                                            │
│  1. CREATE (first message for this session key)           │
│     → Generate key: SupportBot/cli_customer_alice         │
│     → Create current.jsonl, initialize empty history      │
│              │                                             │
│              ▼                                             │
│  2. ACTIVE (each .ask() call)                             │
│     → Append user message + assistant response            │
│     → Flush to disk after each turn                       │
│     → Trigger compaction if needed                        │
│              │                                             │
│       ┌──────┼──────┐                                      │
│       ▼      ▼      ▼                                      │
│   Idle 24h  4 AM  .reset()                                │
│       │      │      │                                      │
│       └──────┼──────┘                                      │
│              ▼                                             │
│  3. RESET                                                  │
│     → Archive current.jsonl to archive/ directory         │
│     → Clear history, create new session ID                │
│                                                            │
└──────────────────────────────────────────────────────────┘

Idle Reset and Daily Reset #

The idle_reset_minutes: 1440 setting means the session resets after 24 hours of inactivity. The runtime checks idle time on every incoming message; if the last activity was more than 1440 minutes ago, the session is archived and a new one is created.

At 4:00 AM each day (daily_reset_hour: 4), all active sessions are archived and reset regardless of activity. This ensures stale sessions do not accumulate across business days.

Accessing Session History #

Within skills, use session_history(key, limit) to query the current conversation history. For example, session_history("all", 50) returns the last 50 messages. The GreetSkill uses session_history("greeting", 1) to detect returning customers.

💡 Tip

Use session_history() sparingly in skills. The full history is already included in the LLM context during the .ask() pipeline. Query it programmatically only when you need to count messages, search for patterns, or extract structured data.
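For example, a skill that escalates repeat contacts might count prior mentions programmatically. This is a sketch, assuming each entry exposes the same role and content fields as the session JSONL records; the escalation threshold is illustrative:

```neam
// Sketch: count how many recent messages reference an order ID.
let recent = session_history("all", 50);
let mentions = 0;
for (msg in recent) {
  if (msg["content"].contains("ORD-")) { mentions = mentions + 1; }
}
if (mentions > 3) {
  // The customer has asked about orders repeatedly -- consider escalating.
}
```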


7 Auto-Compaction #

Long-running conversations can exceed the model's context window. Auto-compaction prevents this by summarizing old turns while preserving recent context.

Compaction triggers when assembled context exceeds 80% of the model's maximum token limit. When it fires, the runtime performs three steps:

  1. Pre-compaction flush. All pending session writes are flushed to disk. This ensures no messages are lost if the compaction process fails partway through. The flush also writes the current session state to the workspace for recovery:
neam
// The runtime performs this automatically before compaction
workspace_write(
  "compaction/pre-flush-" + str(now_epoch()) + ".jsonl",
  session_to_jsonl(current_session)
);
  2. Select and summarize. The oldest two-thirds of conversation turns are selected for summarization. The most recent turns (default: 20) are always preserved verbatim. The selected turns are sent to the LLM with a compaction prompt.

  3. Replace. The selected turns are replaced with a single system message containing the summary. The recent turns remain unchanged.

Compaction Flow Diagram #

┌──────────────────────────────────────────────────────────┐
│                  Auto-Compaction Flow                      │
├──────────────────────────────────────────────────────────┤
│                                                            │
│  Context token count: system + history + memory           │
│       │                                                    │
│       ├── Under 80% → Proceed normally                    │
│       │                                                    │
│       └── Over 80% → Trigger compaction                   │
│            │                                               │
│            ▼                                               │
│  Step 1: Pre-Compaction Flush                             │
│    → Flush pending writes, snapshot to workspace          │
│            │                                               │
│            ▼                                               │
│  Step 2: Select and Summarize                             │
│    History: [msg1, msg2, ..., msg180]                     │
│    Oldest 2/3 (msg1-msg120) → Send to LLM for summary   │
│    Recent 1/3 (msg121-msg180) → Keep verbatim             │
│            │                                               │
│            ▼                                               │
│  Step 3: Replace                                          │
│    New history: [summary_msg, msg121, ..., msg180]        │
│    summary_msg = { role: "system",                        │
│      content: "[Session Summary] Customer asked about     │
│      order ORD-78432, shipping status..." }               │
│                                                            │
└──────────────────────────────────────────────────────────┘
💠 Why This Matters

Without compaction, a support bot serving hundreds of customers daily would hit context limits within the first few long conversations. Compaction is not a performance optimization -- it is a correctness requirement. An agent that silently drops old context produces inconsistent responses. An agent that summarizes old context preserves the essential facts while staying within token budgets.


8 Semantic Memory #

Semantic memory provides a searchable knowledge base. Instead of hardcoding FAQ answers in the system prompt, we store them in markdown files and let the agent search by meaning.

Workspace Knowledge Files #

The knowledge base consists of markdown files in kb/:

📁 .neam/support_workspace/
├── 📁 kb/
│   ├── 📄 faq.md
│   ├── 📄 shipping-policy.md
│   ├── 📄 return-policy.md
│   ├── 📄 products.md
│   └── 📄 troubleshooting.md
├── 📁 sessions/
├── 📁 index/
│   ├── 📄 vectors.idx
│   └── 📄 bm25.idx
└── 📁 tickets/

Here is an example knowledge base file (kb/faq.md):

markdown
# Frequently Asked Questions

## What is your return policy?
Items can be returned within 30 days of delivery for a full refund. Items
returned between 31 and 60 days receive a 50% refund or full store credit.
Damaged items qualify for a full refund regardless of timeframe.

## How long does shipping take?
Standard: 5-7 business days. Express: 2-3 business days.
Overnight: next business day (orders placed before 2 PM ET).

Configuring Semantic Memory #

The semantic_memory block in the agent declaration configures how knowledge files are indexed and searched:

neam
semantic_memory: {
  backend: "sqlite"
  embedding_model: "text-embedding-3-small"
  search: "hybrid"
  top_k: 5
}

| Field           | Description                                                                   |
|-----------------|-------------------------------------------------------------------------------|
| backend         | Storage backend for the index. "sqlite" is the default for local development. |
| embedding_model | The model used to generate vector embeddings from text chunks.                |
| search          | Search strategy: "vector", "keyword", or "hybrid".                            |
| top_k           | Number of results to return from each search query.                           |

Hybrid Search Scoring #

When search is "hybrid", the runtime combines two strategies:

text
final_score = (0.70 * vector_score) + (0.30 * bm25_score)

The vector score (70%) uses cosine similarity for semantic meaning -- "What is your refund policy?" matches "Items can be returned within 30 days." The BM25 score (30%) uses term frequency for exact matches -- searching "ORD-78432" finds that exact string. This 70/30 split works well for support knowledge bases mixing natural language and product codes.
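A worked example with illustrative scores shows how the blend behaves:

```neam
// Hypothetical scores for the query "refund policy for ORD-78432":
let vector_score = 0.82;   // strong semantic match on the return-policy chunk
let bm25_score   = 0.40;   // partial keyword overlap ("refund", "ORD-78432")
let final_score  = (0.70 * vector_score) + (0.30 * bm25_score);
// final_score = 0.574 + 0.120 = 0.694
```

A chunk that scores well on either dimension still surfaces, but a chunk that matches both ranks highest.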

Within skills, use memory_search(query, top_k) to search the indexed knowledge base. The FAQSkill in Section 4 demonstrates this pattern: it calls memory_search(question, 3) and builds a response from the returned results. Each result contains content, score, and source fields that identify the matching chunk, its relevance score, and the originating file.

Workspace Read, Write, and Append #

Beyond semantic search, the agent can directly read and write workspace files. Use workspace_read("kb/return-policy.md") for direct file access, workspace_write("kb/new-product.md", content) to create or overwrite files, and workspace_append("logs/interactions.jsonl", entry) to add log entries. These functions are demonstrated throughout the skills and tools in Sections 4 and 10.


9 Lane Configuration #

Lanes partition incoming requests into separate queues, preventing low-priority group messages from starving high-priority direct messages.

Lane Declarations #

neam
lanes: {
  high_priority: { concurrency: 5, priority: "high" }
  standard: { concurrency: 20, priority: "normal" }
}

Lane Routing #

Requests are assigned to lanes based on the lane field in the message metadata. For the HTTP channel, the client sets this field explicitly. For the CLI channel, the runtime defaults to the high_priority lane for DM messages:

json
{
  "content": "I need help urgently!",
  "user_id": "customer_alice",
  "metadata": {
    "lane": "high_priority"
  }
}

If no lane field is present, the request goes to the first lane with priority: "normal" (the standard lane in our configuration).
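The routing rule described above can be summarized as follows. This is an illustrative sketch, not the runtime's actual implementation; the message shape and channel field are assumptions for the example:

```neam
// Sketch of lane selection for an incoming message.
fn select_lane(message) {
  let meta = message["metadata"];
  if (meta != nil & meta["lane"] != nil) {
    return meta["lane"];        // explicit lane wins
  }
  if (message["channel"] == "cli") {
    return "high_priority";     // DM messages default to the fast lane
  }
  return "standard";            // first lane with priority: "normal"
}
```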

Lane Routing Diagram #

┌──────────────────────────┐    ┌───────────────────────────┐
│    High Priority Lane    │    │       Standard Lane       │
│ Workers: [W][W][W][W][W] │    │ Workers: [W][W] ... [W]   │
│ Max: 5, Priority: high   │    │ Max: 20, Priority: normal │
└──────────────────────────┘    └───────────────────────────┘

Queue Semantics #

Each lane maintains an independent queue: requests within a lane are processed in arrival order, only as many run concurrently as the lane's concurrency limit allows, and when both lanes have pending work the runtime serves the high-priority lane first, so a backed-up standard queue cannot starve high-priority DMs.

💡 Tip

Set the concurrency for high-priority lanes conservatively. A high concurrency limit on the high-priority lane defeats the purpose of prioritization -- you want fewer, faster workers for urgent requests and more workers for bulk processing. A good ratio is 1:4 (high-priority to standard).


10 Trait Implementations #

SupportBot implements two traits: Schedulable for periodic health checks and Monitorable for anomaly detection.

Implementing Schedulable #

SupportBot uses its heartbeat to check for pending tickets that have not been addressed:

neam
impl Schedulable for SupportBot {
  fn on_heartbeat(self) {
    // Read the pending tickets file
    let pending = workspace_read("tickets/pending.jsonl");

    if (pending == nil) {
      return HeartbeatAction.Silent;
    }

    // Count open tickets and find the oldest
    let lines = pending.split("\n");
    let open_count = 0;
    let oldest_ticket = nil;
    for (line in lines) {
      if (len(line) == 0) { continue; }
      let ticket = parse_json(line);
      if (ticket["status"] == "open") {
        open_count = open_count + 1;
        if (oldest_ticket == nil) { oldest_ticket = ticket; }
      }
    }

    if (open_count > 5) {
      return HeartbeatAction.Notify(
        f"ALERT: {open_count} open tickets. Oldest: {oldest_ticket[\"id\"]}"
      );
    }
    if (oldest_ticket != nil & hours_since(parse_iso(oldest_ticket["created_at"])) > 4) {
      return HeartbeatAction.Notify(
        f"WARNING: Ticket {oldest_ticket[\"id\"]} open for 4+ hours."
      );
    }
    return HeartbeatAction.Silent;
  }

  fn heartbeat_interval(self) {
    return 30000;   // Check every 30 seconds
  }

  fn on_cron(self, cron_id) {
    if (cron_id == "daily_report") {
      let pending = workspace_read("tickets/pending.jsonl");
      let report = self.ask(
        f"Generate a daily support report from ticket data:\n{pending}"
      );
      workspace_write("reports/daily-" + today_iso() + ".md", report);
      return HeartbeatAction.Notify("Daily report generated.");
    }
    return nil;
  }
}

The on_heartbeat method returns a HeartbeatAction sealed type: HeartbeatAction.Notify(message) delivers a proactive notification, and HeartbeatAction.Silent takes no action.

Implementing Monitorable #

SupportBot detects when response latency or tool call frequency exceeds baselines:

neam
impl Monitorable for SupportBot {
  fn baseline(self) {
    return {
      "avg_tool_calls": 2,
      "avg_response_time_ms": 1200,
      "max_tool_calls": 8,
      "max_response_time_ms": 5000,
      "max_cost_per_turn": 0.10,
      "max_consecutive_errors": 3
    };
  }

  fn on_anomaly(self, event) {
    let metric = event["metric"];
    let actual = event["actual"];
    let threshold = event["threshold"];

    // Log every anomaly to workspace for audit
    workspace_append("logs/anomalies.jsonl",
      to_json({ "metric": metric, "actual": actual,
                "threshold": threshold, "timestamp": now_iso() }) + "\n"
    );

    // Critical: too many tool calls or consecutive errors → disable
    if ((metric == "max_tool_calls" & actual > 15) |
        (metric == "max_consecutive_errors")) {
      emit f"CRITICAL: {metric} = {actual}. Disabling agent.";
      return AnomalyAction.Disable;
    }

    // All other anomalies: alert but continue serving
    return AnomalyAction.Alert(
      f"Anomaly: {metric} = {actual} (threshold: {threshold})"
    );
  }
}

The on_anomaly method returns an AnomalyAction sealed type with three variants: AnomalyAction.Alert(message) logs and continues, AnomalyAction.Silence suppresses the notification (useful during known high-traffic periods), and AnomalyAction.Disable activates the kill switch immediately.
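For example, a deployment that expects slow responses during a nightly batch window might extend on_anomaly with a Silence branch. This is a sketch; the is_batch_window() helper is hypothetical:

```neam
// Sketch: suppress latency alerts during a known slow window.
if (metric == "max_response_time_ms" & is_batch_window()) {
  return AnomalyAction.Silence;   // notification suppressed, agent keeps serving
}
```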

💠 Why This Matters

A support bot without monitoring is a liability. An infinite tool-calling loop can burn through your entire API budget in minutes. A latency spike that goes undetected means customers are waiting 30 seconds for responses and leaving in frustration. Monitorable provides the guardrails that keep the agent within operational bounds, and Schedulable provides the proactive checks that catch issues before customers do.


11 Guard Rails #

SupportBot uses two guard chains declared in Section 3. The PII filter redacts credit card numbers, SSNs, and email addresses from inbound messages using regex patterns before they reach the LLM. The quality check blocks outbound responses that expose internal system details or are too short to be useful. When a guard returns "block", the customer receives a generic fallback message.

Common Mistake

Do not rely solely on the system prompt to prevent PII leakage. System prompts are suggestions, not enforcement. An LLM might still echo back a credit card number if it appears in the conversation history. Guard chains provide programmatic enforcement that works regardless of the LLM's behavior.


12 Testing #

Test at three levels: individual skills, end-to-end conversation, and session persistence.

Unit Testing Skills #

Test each skill in isolation to verify its behavior:

neam
test "GreetSkill returns welcome for new customers" {
  let result = GreetSkill.impl("Alice");
  assert(result.contains("Hello, Alice"));
  assert(result.contains("Acme Support"));
}

test "FAQSkill returns knowledge base results" {
  workspace_write("kb/faq.md", "## Return Policy\nItems can be returned within 30 days.");
  let result = FAQSkill.impl("What is the return policy?");
  assert(result.contains("30 days"));
}

test "TicketSkill creates a ticket with correct format" {
  let result = TicketSkill.impl("Broken widget", "Widget arrived damaged", "high");
  assert(result.contains("TKT-"));
  // Verify the ticket was persisted
  let pending = workspace_read("tickets/pending.jsonl");
  assert(pending != nil);
  assert(pending.contains("Broken widget"));
}

test "lookup_order handles missing orders" {
  let result = lookup_order.impl("ORD-99999");
  assert(result.contains("No order found"));
}

Integration Testing: Full Conversation Flow #

neam
test "full conversation flow with order lookup" {
  SupportBot.reset();
  workspace_write("orders/ORD-78432.json", to_json({
    "order_id": "ORD-78432", "status": "shipped", "eta": "2026-02-20"
  }));

  let r1 = SupportBot.ask("Hi, I need help with my order.");
  assert(r1.contains("order") | r1.contains("help"));

  let r2 = SupportBot.ask("My order number is ORD-78432.");
  assert(r2.contains("ORD-78432") | r2.contains("shipped"));

  let r3 = SupportBot.ask("When will it arrive?");
  assert(r3.contains("Feb") | r3.contains("20"));
  assert(len(SupportBot.history()) >= 6);  // 3 user turns + 3 assistant replies
}

Testing Session Persistence #

Verify that sessions survive and that idle timeout resets work:

neam
test "session persists across interactions" {
  SupportBot.reset();
  SupportBot.ask("Remember that my name is Alice.");
  let r2 = SupportBot.ask("What is my name?");
  assert(r2.contains("Alice"));
}

test "session resets after idle timeout" {
  SupportBot.reset();
  SupportBot.ask("Remember the code word: pineapple.");
  simulate_idle(SupportBot, 1500);   // 1500 minutes > 1440 idle_reset
  let r2 = SupportBot.ask("What is the code word?");
  assert(!r2.contains("pineapple"));
}
💡 Tip

Always test guard chains as part of integration tests. Send a message containing a fake credit card number and verify that the response does not echo it back.
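
Following the test conventions used above, such a check might look like this sketch. The exact redaction marker the guard emits is not specified in this section, so assert only that the raw number never comes back:

neam
test "PII guard never echoes a credit card number" {
  SupportBot.reset();
  let r = SupportBot.ask("My card is 4111 1111 1111 1111 -- can you check my order?");
  // The redaction format depends on the guard chain declared in Section 3,
  // so the safe assertion is that the raw digits never appear in the reply.
  assert(!r.contains("4111 1111 1111 1111"));
  assert(!r.contains("4111111111111111"));
}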


13 Kubernetes Deployment #

Deploying SupportBot to Kubernetes requires a Deployment, Service, PVCs for sessions and workspace, a ConfigMap for neam.toml, and a HorizontalPodAutoscaler.

Docker Image #

dockerfile
FROM neam/runtime:latest
WORKDIR /app
COPY support_bot.neam .
COPY kb/ .neam/support_workspace/kb/
EXPOSE 8090
# NOTE: this assumes curl is available in the neam/runtime base image
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:8090/health || exit 1
CMD ["neam-api", "--port", "8090", "--agent", "support_bot.neam"]

Kubernetes Deployment #

The Deployment mounts workspace and session PVCs, loads API keys from Secrets, and configures readiness/liveness probes:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: support-bot
spec:
  replicas: 3
  selector:
    matchLabels: { app: support-bot }
  template:
    metadata:
      labels: { app: support-bot }
    spec:
      containers:
        - name: support-bot
          image: acme/support-bot:latest
          ports: [{ containerPort: 8090 }]
          volumeMounts:
            - { name: workspace, mountPath: /app/.neam/support_workspace }
            - { name: sessions, mountPath: /app/.neam/sessions }
            - { name: config, mountPath: /app/neam.toml, subPath: neam.toml }
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef: { name: llm-secrets, key: anthropic-api-key }
          resources:
            requests: { memory: "256Mi", cpu: "250m" }
            limits: { memory: "512Mi", cpu: "500m" }
          readinessProbe:
            httpGet: { path: /health, port: 8090 }
            initialDelaySeconds: 10
            periodSeconds: 10
          livenessProbe:
            httpGet: { path: /health, port: 8090 }
            initialDelaySeconds: 30
            periodSeconds: 30
      volumes:
        - { name: workspace, persistentVolumeClaim: { claimName: support-workspace-pvc } }
        - { name: sessions, persistentVolumeClaim: { claimName: support-sessions-pvc } }
        - { name: config, configMap: { name: support-bot-config } }

PVCs, Service, and ConfigMap #

yaml
# PersistentVolumeClaims
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: support-workspace-pvc }
spec:
  accessModes: [ReadWriteMany]
  resources: { requests: { storage: 10Gi } }
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: support-sessions-pvc }
spec:
  accessModes: [ReadWriteMany]
  resources: { requests: { storage: 5Gi } }
---
# Service
apiVersion: v1
kind: Service
metadata: { name: support-bot-svc }
spec:
  selector: { app: support-bot }
  ports: [{ protocol: TCP, port: 80, targetPort: 8090 }]
  type: ClusterIP
---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata: { name: support-bot-config }
data:
  neam.toml: |
    [project]
    name = "support-bot"
    [state]
    backend = "jsonl"
    path = "/app/.neam/sessions"
    [telemetry]
    enabled = true
    exporter = "otlp"
    endpoint = "http://otel-collector:4317"
    [logging]
    level = "info"
    format = "json"

HorizontalPodAutoscaler #

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: support-bot-hpa }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: support-bot
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
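
Scaling an agent down is not free: a terminated replica abandons its in-flight conversations until another pod picks up the session files. A common refinement (an addition to the spec above, not part of the case study manifests) is to slow scale-down with the autoscaling/v2 `behavior` field, added under `spec:` alongside `metrics:`:

yaml
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of low load before scaling down
      policies:
        - { type: Pods, value: 1, periodSeconds: 60 }   # drop at most one pod per minute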

Docker Compose for Local Development #

For local development, use Docker Compose to replicate the production environment:

yaml
version: "3.8"
services:
  support-bot:
    build: .
    ports: ["8090:8090"]
    volumes:
      - ./workspace:/app/.neam/support_workspace
      - ./sessions:/app/.neam/sessions
      - ./neam.toml:/app/neam.toml
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    restart: unless-stopped
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    depends_on: [prometheus]
💡 Tip

Use ReadWriteMany access mode for PVCs when running multiple replicas. For production deployments with high write throughput, consider a network filesystem (NFS, EFS) or switching the session backend to PostgreSQL.
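
If you do outgrow JSONL, the change is confined to the [state] table in neam.toml. The PostgreSQL keys shown here are an assumption (check the state-backend reference for your runtime version), but the shape would be:

toml
[state]
backend = "postgres"
# Hypothetical connection setting -- consult the neam state-backend docs
dsn = "postgres://support:****@postgres:5432/neam_sessions"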


14 Monitoring and Observability #

With Monitorable implemented and telemetry enabled, SupportBot exports metrics via a /metrics endpoint:

text
# Agent metrics
neam_agent_requests_total{agent="SupportBot", channel="http"} 3847
neam_agent_response_time_ms{agent="SupportBot", quantile="0.95"} 2340
neam_agent_tool_calls_total{agent="SupportBot", skill="FAQSkill"} 1203

# Session metrics
neam_session_active_count{agent="SupportBot"} 23
neam_session_compaction_total{agent="SupportBot"} 12
neam_session_reset_total{agent="SupportBot", reason="idle"} 89

# Lane metrics
neam_lane_queue_depth{agent="SupportBot", lane="high_priority"} 2
neam_lane_queue_depth{agent="SupportBot", lane="standard"} 15

# Anomaly and budget metrics
neam_anomaly_total{agent="SupportBot", metric="max_response_time_ms"} 7
neam_budget_cost_usd{agent="SupportBot"} 18.42

Alert Rules #

yaml
groups:
  - name: support-bot-alerts
    rules:
      - alert: HighResponseLatency
        expr: neam_agent_response_time_ms{quantile="0.95"} > 5000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "SupportBot p95 latency exceeds 5 seconds"

      - alert: AnomalyRateHigh
        expr: rate(neam_anomaly_total[5m]) > 0.1
        for: 2m
        labels: { severity: critical }
        annotations: { summary: "SupportBot anomaly rate exceeds threshold" }

      - alert: BudgetNearExhaustion
        expr: neam_budget_cost_usd / 50.0 > 0.9
        labels: { severity: warning }
        annotations: { summary: "Daily budget over 90% consumed" }

      - alert: LaneQueueBacklog
        expr: neam_lane_queue_depth{lane="high_priority"} > 10
        for: 3m
        labels: { severity: warning }
        annotations: { summary: "High-priority lane queue depth exceeds 10" }

Grafana Dashboard Recommendations #

Create a Grafana dashboard with six essential panels:

  1. Request Volume (time series) -- rate(neam_agent_requests_total[5m]) by channel.
  2. Response Latency (heatmap) -- neam_agent_response_time_ms p50/p95/p99.
  3. Lane Queue Depth (bar chart) -- neam_lane_queue_depth to spot backlogs.
  4. Budget Consumption (gauge) -- neam_budget_cost_usd / 50.0 * 100 as percentage.
  5. Anomaly Events (log panel) -- Filtered entries from the anomaly log.
  6. Session Activity (gauge) -- neam_session_active_count and compaction rate.
💠 Why This Matters

AI agent failures are often silent -- the agent gives a plausible but incorrect answer, enters a tool-calling loop, or degrades as the session grows. Metrics and alerts are the only way to detect these failure modes before customers notice.


Lessons Learned #

After building and operating the multi-channel support bot, we distilled these lessons:

1. Design channels before skills. Channel configuration determines how messages arrive, how sessions are keyed, and how lanes route requests. Start by diagramming message flow, then build skills to handle it.

2. Use hybrid search for support knowledge bases. The 70/30 hybrid split captures both semantic paraphrases and exact matches on order numbers and product codes. We observed a 23% improvement in answer accuracy over vector-only search.

3. Set idle reset to match your business cycle. A 24-hour idle reset works for business-hours support. For around-the-clock operations, consider 4-8 hours.

4. Compaction is a correctness requirement. Disabling compaction causes silent context loss. Always use compaction: "auto" for agents with long conversations.

5. Lane priorities prevent starvation under load. With a 5/20 concurrency split, DM customers received responses within 2 seconds during peak load, while group messages were served within 8 seconds.

6. Trait composition is additive. Implementing both Schedulable and Monitorable on the same agent works seamlessly. Each trait operates in its own callback cycle without interference. Think of traits as independent operational layers.

7. Guard chains must be channel-agnostic. Apply the same guardrails to every channel. Developers frequently paste real customer data during CLI testing. Uniform guardrails prevent accidental PII exposure in development logs.


Exercises #

Exercise 1: Add a Slack channel. Extend SupportBot with a third channel that receives messages from a Slack webhook. Implement the Channelable trait to customize response format for Slack. Route Slack messages to standard by default, but escalate messages from #urgent-support to high_priority.

Exercise 2: Implement workspace-based analytics. Create an AnalyticsSkill that reads tickets/pending.jsonl, calculates average resolution time, and writes a weekly report to reports/weekly-analytics.md. Schedule it with the on_cron method.

Exercise 3: Add a feedback loop. After each conversation, prompt the customer for a satisfaction rating (1-5). Store ratings in logs/feedback.jsonl. Add an avg_satisfaction_score to the Monitorable baseline and alert when it drops below 3.5.

Exercise 4: Implement session sharing across channels. Modify session key derivation so that a customer with the same user_id shares one session across CLI and HTTP. Test that conversations started on one channel can be continued on the other.

Exercise 5: Build a canary deployment. Create a second claw agent called SupportBotCanary with a different model. Route 10% of HTTP traffic to the canary using Kubernetes Ingress annotations. Implement Monitorable on both agents and compare their baseline metrics to determine which model performs better in production.

Exercise 6: Add multi-language support. Create knowledge base files in Spanish and French (kb/faq-es.md, kb/faq-fr.md). Modify the FAQSkill to detect the customer's language and search the appropriate knowledge base. Verify that hybrid search returns results in the correct language with at least three test queries each.
