Chapter 13: Multi-Agent Orchestration #
"The whole is greater than the sum of its parts." -- Aristotle
In the previous chapters, you learned to create individual agents and equip them with tools. A single agent with the right tools can handle many tasks. But real-world problems often require specialization -- one agent that triages requests, another that handles billing, another that processes refunds. No single agent can be an expert at everything, just as no single employee can fill every role in a company.
Multi-agent orchestration is about coordinating multiple specialized agents to handle complex workflows. In this chapter, you will learn three core mechanisms: handoffs (transferring control between agents), runners (managing the agent execution loop), and orchestration patterns (proven architectures for multi-agent systems).
Why This Matters #
Think of a hospital emergency room. When a patient arrives, a triage nurse assesses the situation and routes the patient to the right specialist. A broken arm goes to the orthopedic surgeon. Chest pain goes to the cardiologist. A skin rash goes to the dermatologist. The triage nurse does not perform surgery, and the cardiologist does not set bones. Each specialist excels because they focus on a narrow domain.
Now imagine a hospital that replaced every specialist with one "general doctor" who handles everything -- bones, hearts, skin, emergencies, mental health. That doctor would be mediocre at all of them. They would constantly context-switch, forget critical details, and make mistakes from overload.
AI agents work the same way. A single agent with the prompt "handle everything" is like that overloaded general doctor. But when you split responsibilities -- a triage agent that classifies, a refund agent that processes returns, a billing agent that handles invoices -- each agent becomes dramatically better at its job. The triage agent can use a tiny, fast model because classification is simple. The billing agent can carry detailed policy documents in its context because that is all it needs to think about.
Multi-agent orchestration is the discipline of building these teams of specialists and giving them clear protocols for working together. By the end of this chapter, you will know 14 proven patterns for doing exactly that.
Why Multi-Agent? #
Consider a customer service system. A single agent with the prompt "You handle all customer inquiries" will produce mediocre results across all categories. But if you split the workload:
- A triage agent that classifies requests (low temperature, simple prompt)
- A refund specialist with detailed refund policies in its system prompt
- A billing specialist trained on billing procedures
- A technical support agent with access to documentation tools
Each agent excels at its narrow task. The triage agent's only job is routing -- it does not need to know refund policies. The refund specialist's only job is processing refunds -- it does not need to handle technical issues.
This is the principle of separation of concerns applied to AI agents.
Benefits of Multi-Agent Architecture #
| Benefit | Description |
|---|---|
| Specialization | Each agent has a focused system prompt optimized for its task |
| Different models | Route simple tasks to cheap models, complex tasks to powerful ones |
| Maintainability | Change one agent's behavior without affecting others |
| Testability | Test each agent independently |
| Cost optimization | Use expensive models only where they add value |
| Scalability | Add new specialist agents without modifying existing ones |
Handoffs: Transferring Control Between Agents #
A handoff is the mechanism by which one agent transfers control to another. In Neam,
handoffs are declared in the agent's handoffs field:
agent Triage {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "Route requests. Reply ROUTE:REFUND, ROUTE:BILLING, or ROUTE:GENERAL."
handoffs: [RefundAgent, BillingAgent]
}
When an agent has handoffs, the Neam VM automatically registers the handoff targets
as callable tools. The agent (through the LLM) decides when to trigger a handoff based
on the conversation context.
How Handoffs Work #
1. The VM sends the prompt to the triage agent along with handoff tools (e.g., `transfer_to_RefundAgent`, `transfer_to_BillingAgent`).
2. The LLM analyzes the user's request and decides which agent should handle it.
3. The LLM responds with a handoff signal (e.g., `HANDOFF: transfer_to_RefundAgent`).
4. The VM detects the handoff, switches the active agent to `RefundAgent`, and passes the user's original message to it.
5. `RefundAgent` processes the request and returns a response.
Simple Handoff Example #
This is the minimal example to understand handoffs:
// First agent - routes to the second
agent RouterAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.1
system: "You are a router. Your ONLY job is to hand off to the Responder agent.
Always respond with EXACTLY: HANDOFF: transfer_to_ResponderAgent"
handoffs: [ResponderAgent]
}
// Second agent - handles the actual response
agent ResponderAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.7
system: "You are a helpful assistant. Answer questions concisely."
}
// Runner manages the execution loop
runner SimpleRunner {
entry_agent: RouterAgent
max_turns: 3
}
{
emit "=== Simple Handoff Test ===";
emit "";
let result = SimpleRunner.run("Hello, how are you?");
emit "Final agent: " + result["final_agent"];
emit "Response: " + result["final_output"];
}
Notice three important elements:
- `handoffs: [ResponderAgent]` -- Declares that `RouterAgent` can hand off to `ResponderAgent`.
- The system prompt instructs the agent to trigger a handoff.
- A `runner` is required to manage the multi-agent execution loop.
Runners: Orchestration Loops #
A runner is the orchestration engine that manages multi-agent execution. When you have handoffs, something needs to:
- Start with the entry agent
- Detect when a handoff occurs
- Switch to the target agent
- Continue until an agent produces a final response (no handoff)
- Enforce maximum turn limits
The runner declaration handles all of this:
runner CustomerService {
entry_agent: TriageAgent
max_turns: 5
tracing: enabled
}
Runner Configuration Fields #
| Field | Type | Required | Description |
|---|---|---|---|
| `entry_agent` | agent ref | Yes | The first agent that receives the user's input |
| `max_turns` | int | Yes | Maximum number of agent interactions before forced termination |
| `tracing` | identifier | No | Enable execution tracing: `enabled` or `disabled` |
| `input_guardrails` | list | No | Guardrail chains to run on input (Chapter 14) |
| `output_guardrails` | list | No | Guardrail chains to run on output (Chapter 14) |
Running a Runner #
Runners are invoked with the .run() method, which takes the user's input and returns
a result map:
{
let result = CustomerService.run("I was charged twice on my last invoice");
emit "Final Agent: " + result["final_agent"];
emit "Response: " + result["final_output"];
emit "Turns: " + str(result["total_turns"]);
}
Runner Result Fields #
The .run() method returns a map with the following fields:
| Field | Type | Description |
|---|---|---|
| `final_agent` | string | Name of the agent that produced the final response |
| `final_output` | string | The final response text |
| `total_turns` | int | Number of agent turns executed |
| `completed` | bool | Whether the runner completed normally (not `max_turns` exceeded) |
| `total_duration_ms` | int | Total execution time in milliseconds |
| `error_message` | string | Error description if the runner failed |
| `trace_summary` | string | Human-readable summary (when tracing is enabled) |
| `trace` | list | Detailed trace entries (when tracing is enabled) |
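Because a runner can exhaust its `max_turns` limit or fail outright, production code should check `completed` before trusting the output. A minimal sketch using the fields from the table above:

```
{
    let result = CustomerService.run("I was charged twice on my last invoice");
    if (result["completed"]) {
        emit "Handled by " + result["final_agent"] + " in " + str(result["total_turns"]) + " turns";
        emit result["final_output"];
    } else {
        // Either max_turns was exceeded or an error occurred mid-run
        emit "Runner did not complete: " + result["error_message"];
    }
}
```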
Customer Service Example #
This is a complete, production-style customer service system with three specialist agents:
agent TriageAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "You are a customer service triage agent. Analyze the customer's request
and route to the appropriate specialist.
For billing questions, say HANDOFF: transfer_to_BillingAgent.
For technical issues, say HANDOFF: transfer_to_TechAgent.
For refund requests, say HANDOFF: transfer_to_RefundAgent."
handoffs: [BillingAgent, TechAgent, RefundAgent]
}
agent BillingAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a billing specialist. Help customers with billing inquiries,
payment issues, and invoice questions. Be professional and concise."
}
agent TechAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a technical support specialist. Help customers with technical
problems, bugs, and setup issues. Be patient and thorough."
}
agent RefundAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a refund specialist. Process refund requests professionally.
Ask for order numbers if not provided. Be empathetic."
}
runner CustomerService {
entry_agent: TriageAgent
max_turns: 5
tracing: enabled
}
{
emit "=== Customer Service System ===";
emit "";
// Test 1: Billing question
emit "--- Test 1: Billing ---";
let r1 = CustomerService.run("I was charged twice on my last invoice");
emit "Routed to: " + r1["final_agent"];
emit "Response: " + r1["final_output"];
emit "Turns: " + str(r1["total_turns"]);
emit "";
// Test 2: Technical issue
emit "--- Test 2: Technical ---";
let r2 = CustomerService.run("My API endpoint is returning 503 errors");
emit "Routed to: " + r2["final_agent"];
emit "Response: " + r2["final_output"];
emit "Turns: " + str(r2["total_turns"]);
emit "";
// Test 3: Refund request
emit "--- Test 3: Refund ---";
let r3 = CustomerService.run("I want a refund for order #12345, the product was damaged");
emit "Routed to: " + r3["final_agent"];
emit "Response: " + r3["final_output"];
emit "Turns: " + str(r3["total_turns"]);
}
Advanced Handoff Configuration #
The simple handoffs: [Agent1, Agent2] syntax works for basic routing. For more
complex scenarios, Neam supports advanced handoff configuration with handoff_to():
// Input filter function
fun sanitize_input(input) {
// Remove PII, normalize text, etc.
return input;
}
// Condition function
fun is_business_hours() {
// Check if current time is within business hours
return true;
}
// Callback for logging
fun log_handoff(context) {
emit "[Handoff Log] Transferring with context: " + context;
}
agent PrimaryAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "You are a customer service agent. Route requests appropriately.
For urgent issues, respond with: HANDOFF: urgent_support
For billing issues, respond with: HANDOFF: billing_team
For general questions, answer directly."
handoffs: [
// Simple handoff (just the agent reference)
GeneralAgent,
// Advanced handoff with configuration
handoff_to(UrgentAgent) {
tool_name: "urgent_support"
description: "Escalate urgent issues to immediate support"
input_filter: sanitize_input
on_handoff: log_handoff
},
// Conditional handoff
handoff_to(BillingAgent) {
tool_name: "billing_team"
description: "Transfer billing inquiries to billing specialists"
is_enabled: is_business_hours()
}
]
}
Advanced Handoff Configuration Fields #
| Field | Type | Description |
|---|---|---|
| `tool_name` | string | Custom name for the handoff tool (default: `transfer_to_<AgentName>`) |
| `description` | string | Custom description shown to the LLM |
| `input_filter` | function ref | Function to transform/sanitize input before handoff |
| `on_handoff` | function ref | Callback function invoked when handoff occurs |
| `is_enabled` | bool/function | Condition for whether this handoff is available |
Custom Tool Names #
By default, handoff tools are named transfer_to_<AgentName>. Custom tool_name values
let you use more natural names:
handoff_to(UrgentAgent) {
tool_name: "urgent_support"
description: "Escalate to urgent support team"
}
The LLM will see a tool called urgent_support instead of transfer_to_UrgentAgent,
which may lead to more natural handoff decisions.
Input Filters #
Input filters transform the user's message before it reaches the target agent. This is useful for removing sensitive information, adding context, or normalizing the input:
fun add_priority_context(input) {
return "[PRIORITY: HIGH] " + input;
}
handoff_to(UrgentAgent) {
tool_name: "escalate"
input_filter: add_priority_context
}
Conditional Handoffs #
The is_enabled field allows handoffs to be conditionally available:
fun check_billing_hours() {
// Only enable billing handoff during business hours
let hour = 14; // In practice, derive from time_now()
return (hour >= 9) & (hour <= 17);
}
handoff_to(BillingAgent) {
tool_name: "billing"
is_enabled: check_billing_hours()
}
When is_enabled returns false, the handoff tool is not presented to the LLM, and the
agent cannot hand off to BillingAgent.
Structured Input with Handoffs #
For handoffs that need specific data (not just the user's message), use the input_type
field. This tells the LLM to provide structured JSON data along with the handoff:
agent TriageAgent {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.3
system: "Analyze customer requests and route to the right specialist.
When routing to refund, include the order_id and reason."
handoffs: [
handoff_to(RefundAgent) {
tool_name: "process_refund"
description: "Process a refund request with order details"
input_type: RefundRequest
on_handoff: fun(input) {
emit "[LOG] Refund request: order=" + input.order_id;
}
}
]
}
When the LLM triggers this handoff, it provides structured data matching the
input_type. The on_handoff callback receives this data, allowing you to log,
validate, or transform it before the target agent processes the request.
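For example, the callback can sanity-check the structured data before the target agent sees it. A hedged sketch, assuming `order_id` is a string field of the `RefundRequest` type shown above:

```
fun validate_refund(input) {
    // Log every refund handoff for auditing
    emit "[LOG] Refund request: order=" + input.order_id;
    // Flag obviously malformed order IDs (assumed format: "#" followed by digits)
    if (len(input.order_id) < 2) {
        emit "[WARN] Suspicious order_id: " + input.order_id;
    }
}
```

Passing `validate_refund` as the `on_handoff` callback keeps validation logic out of both agents' prompts.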
Context Sharing Between Agents #
When an agent hands off to another, the conversation context needs to flow appropriately. The runner automatically passes the conversation history to the target agent, but sometimes you need more control.
Input Filters for Context Transformation #
The input_filter field on a handoff lets you transform the context before it reaches
the target agent:
// Remove sensitive data before handoff
fun remove_sensitive(context) {
// Strip PII, internal notes, etc.
return context;
}
// Summarize long conversations to save tokens
fun summarize_for_handoff(context) {
return "Summary of previous conversation: " + context;
}
// Add metadata for tracking
fun add_handoff_metadata(context) {
return "[Handed off from TriageAgent]\n" + context;
}
Composing Multiple Filters #
You can chain filters together using a composition function:
fun compose_filters(filters) {
return fun(context) {
let result = context;
for (filter_fn in filters) {
result = filter_fn(result);
}
return result;
};
}
// Apply multiple transformations
let combined = compose_filters([
remove_sensitive,
summarize_for_handoff,
add_handoff_metadata
]);
handoff_to(SpecialistAgent) {
tool_name: "specialist"
input_filter: combined
}
Context filters are important for:
- Token optimization -- Summarize long histories before passing to expensive models.
- Privacy -- Remove PII before handing off to external agents.
- Relevance -- Strip tool call details that the target agent does not need.
- Auditability -- Add metadata about the handoff chain for tracing.
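The summarization filter above is a stub; in practice you can back it with a cheap agent so that long histories are compressed before an expensive specialist sees them. A sketch, assuming a small local model is available (the length threshold is illustrative):

```
agent Summarizer {
    provider: "ollama"
    model: "llama3.2:3b"
    temperature: 0.2
    system: "Summarize the conversation in at most 3 sentences. Preserve order numbers and names."
}

fun summarize_via_agent(context) {
    // Only pay the summarization cost for long contexts
    if (len(context) > 2000) {
        return "Summary of previous conversation: " + Summarizer.ask(context);
    }
    return context;
}

handoff_to(SpecialistAgent) {
    tool_name: "specialist"
    input_filter: summarize_via_agent
}
```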
Tracing: Understanding Multi-Agent Execution #
When debugging multi-agent systems, you need visibility into which agent ran, when
handoffs occurred, and how long each turn took. The tracing: enabled option on runners
provides this:
agent RouterAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "Route requests. For greetings, HANDOFF: greet. For questions, HANDOFF: help."
handoffs: [
handoff_to(GreeterAgent) {
tool_name: "greet"
description: "Handle greetings"
},
handoff_to(HelperAgent) {
tool_name: "help"
description: "Handle questions"
}
]
}
agent GreeterAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a friendly greeter. Greet the user warmly."
}
agent HelperAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a helpful assistant. Answer questions clearly."
}
runner TracedRunner {
entry_agent: RouterAgent
max_turns: 5
tracing: enabled
}
{
let result = TracedRunner.run("Hello there!");
emit "=== Result ===";
emit "Final agent: " + result["final_agent"];
emit "Total turns: " + str(result["total_turns"]);
emit "Duration: " + str(result["total_duration_ms"]) + "ms";
emit "";
emit "=== Trace Summary ===";
emit result["trace_summary"];
emit "";
emit "=== Detailed Trace ===";
let trace = result["trace"];
for (entry in trace) {
emit "Turn " + str(entry["turn"]) + ":";
emit " Agent: " + entry["agent_name"];
emit " Action: " + entry["action"];
emit " Duration: " + str(entry["duration_ms"]) + "ms";
if (entry["was_handoff"]) {
emit " Handoff to: " + entry["handoff_to"];
}
}
}
Orchestration Patterns #
Beyond simple triage routing, there are several proven patterns for multi-agent orchestration. Let us examine each one.
Pattern 1: Triage Routing #
The most common pattern. A classifier agent routes to specialists:
agent Triage {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.1
system: "Classify the query. Reply with exactly one: MATH, CODE, or GENERAL."
}
agent MathExpert {
provider: "ollama"
model: "llama3.2:3b"
system: "You are a math expert. Solve problems step by step."
}
agent CodeExpert {
provider: "ollama"
model: "llama3.2:3b"
system: "You are a coding expert. Provide code solutions."
}
fun route_query(query) {
let category = Triage.ask(query);
if (category.contains("MATH")) {
return MathExpert.ask(query);
}
if (category.contains("CODE")) {
return CodeExpert.ask(query);
}
return "General: " + query;
}
{
emit route_query("What is 15 * 23?");
emit route_query("Write hello world in Python");
}
Pattern 2: Sequential Pipeline #
Output from one agent feeds into the next:
agent Researcher {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a researcher. Provide factual information and key points."
}
agent Writer {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a writer. Take notes and create polished prose."
}
agent Editor {
provider: "openai"
model: "gpt-4o-mini"
system: "You are an editor. Improve text for clarity. Output only improved text."
}
{
let topic = "artificial intelligence";
let research = Researcher.ask("Key facts about " + topic);
let draft = Writer.ask("Write a paragraph based on: " + research);
let final_text = Editor.ask("Edit and improve: " + draft);
emit final_text;
}
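The three-step chain can be generalized into a helper that threads output through any list of agents. A hedged sketch, assuming agent references can be stored in lists and indexed the same way function references are:

```
fun run_pipeline(agents, prompts, input) {
    // agents[i] receives prompts[i] followed by the previous stage's output
    let text = input;
    for (i, agent in enumerate(agents)) {
        text = agent.ask(prompts[i] + text);
    }
    return text;
}

{
    let result = run_pipeline(
        [Researcher, Writer, Editor],
        ["Key facts about ", "Write a paragraph based on: ", "Edit and improve: "],
        "artificial intelligence"
    );
    emit result;
}
```

Adding a stage is now a one-line change to the two lists rather than a new block of glue code.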
Pattern 3: Supervisor/Worker #
A supervisor evaluates and validates worker output:
agent Worker {
provider: "openai"
model: "gpt-4o-mini"
system: "Complete assigned tasks thoroughly but concisely."
}
agent Supervisor {
provider: "openai"
model: "gpt-4o-mini"
system: "Evaluate work quality. Reply APPROVED or NEEDS_REVISION: <reason>."
}
fun supervised_task(task) {
let result = Worker.ask(task);
let review = Supervisor.ask("Evaluate this response to '" + task + "': " + result);
if (review.contains("NEEDS_REVISION")) {
// Worker revises based on feedback
let revised = Worker.ask(task + "\n\nPrevious feedback: " + review);
return revised;
}
return result;
}
{
let output = supervised_task("List 5 benefits of exercise with explanations");
emit output;
}
Pattern 4: Debate/Adversarial #
Multiple agents present different perspectives, then a judge synthesizes:
agent Advocate {
provider: "openai"
model: "gpt-4o-mini"
system: "Argue IN FAVOR of the topic. Present 2-3 strong points."
}
agent Critic {
provider: "openai"
model: "gpt-4o-mini"
system: "Argue AGAINST the topic. Present 2-3 counterpoints."
}
agent Judge {
provider: "openai"
model: "gpt-4o-mini"
system: "You are an impartial judge. Given pro and con arguments,
provide a balanced conclusion in 2 sentences."
}
{
let topic = "Remote work should be the default for knowledge workers";
let pro = Advocate.ask(topic);
let con = Critic.ask(topic);
let verdict = Judge.ask("PRO: " + pro + " --- CON: " + con);
emit "=== Debate: " + topic + " ===";
emit "";
emit "FOR: " + pro;
emit "";
emit "AGAINST: " + con;
emit "";
emit "VERDICT: " + verdict;
}
Pattern 5: Pipeline Chaining with Runners #
For complex multi-step workflows, combine handoffs with runners:
agent Intake {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "You receive customer requests. Extract the key issue and hand off.
For complaints: HANDOFF: transfer_to_ComplaintHandler
For questions: HANDOFF: transfer_to_QuestionHandler"
handoffs: [ComplaintHandler, QuestionHandler]
}
agent ComplaintHandler {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You handle complaints. Be empathetic. Offer a resolution."
}
agent QuestionHandler {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You answer customer questions clearly and helpfully."
}
runner SupportPipeline {
entry_agent: Intake
max_turns: 5
tracing: enabled
}
{
// Process multiple customer requests through the same pipeline
let requests = [
"I'm furious! My order arrived broken!",
"What are your return policies?",
"This is unacceptable service!"
];
for (i, request in enumerate(requests)) {
emit "--- Request " + str(i + 1) + " ---";
emit "Input: " + request;
let result = SupportPipeline.run(request);
emit "Handled by: " + result["final_agent"];
emit "Response: " + result["final_output"];
emit "";
}
}
Pattern 6: Planning Agent #
A planning agent decomposes a complex goal into steps, then coordinates other agents to execute each step:
agent Planner {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You decompose goals into numbered steps. Output ONLY a numbered list.
Each step should be actionable by a single specialist agent."
}
agent Executor {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.5
system: "Execute the given task step. Be thorough but concise."
}
agent Monitor {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.1
system: "Given a plan and completed steps, respond with one of:
ON_TRACK, NEEDS_ADJUSTMENT: <reason>, or COMPLETE."
}
fun execute_plan(goal) {
// Step 1: Decompose the goal
let plan = Planner.ask("Decompose this goal: " + goal);
emit "Plan: " + plan;
// Step 2: Execute each step
let steps = split(plan, "\n");
let results = [];
for (step in steps) {
if (len(step) > 2) {
let result = Executor.ask("Execute: " + step);
push(results, {"step": step, "result": result});
emit "Completed: " + step;
}
}
// Step 3: Monitor overall progress
let status = Monitor.ask("Plan: " + plan + "\nResults: " + str(results));
emit "Status: " + status;
return {"plan": plan, "results": results, "status": status};
}
{
let output = execute_plan("Research and write a summary about quantum computing");
emit "Final status: " + output.status;
}
This pattern separates the concerns of planning, execution, and monitoring into different agents, each with the appropriate temperature and model for its role.
Pattern 7: Deep Search Agent #
A deep search agent breaks complex research questions into sub-queries, searches for each independently, and synthesizes the findings into a comprehensive answer. This is ideal for questions that span multiple domains or require aggregating diverse sources:
agent DeepSearcher {
provider: "openai"
model: "gpt-4o"
temperature: 0.2
system: "You are a deep research agent. Break complex questions into sub-queries, search for each, and synthesize findings into a comprehensive answer."
skills: [WebSearch]
}
{
let answer = DeepSearcher.ask("What are the latest advances in quantum error correction?");
emit answer;
}
The low temperature (0.2) keeps the search strategy focused and methodical. The
WebSearch skill gives the agent access to real-time information, making it suitable
for questions about current events, recent research, or rapidly evolving topics.
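If you want the decomposition to be explicit and inspectable rather than left to a single prompt, you can orchestrate it manually with a splitter and a synthesizer. A sketch reusing constructs from earlier in the chapter; the `WebSearch` skill is assumed available as above:

```
agent QuerySplitter {
    provider: "openai"
    model: "gpt-4o-mini"
    temperature: 0.1
    system: "Break the research question into 3 focused sub-queries. Output ONLY the sub-queries, one per line."
}

agent SubSearcher {
    provider: "openai"
    model: "gpt-4o-mini"
    system: "Answer the query using search results. Be factual."
    skills: [WebSearch]
}

agent Synthesizer {
    provider: "openai"
    model: "gpt-4o"
    temperature: 0.3
    system: "Synthesize the findings into one comprehensive, well-organized answer."
}

fun deep_search(question) {
    let sub_queries = split(QuerySplitter.ask(question), "\n");
    let findings = [];
    for (q in sub_queries) {
        if (len(q) > 2) {
            push(findings, SubSearcher.ask(q));
        }
    }
    return Synthesizer.ask("Question: " + question + "\nFindings: " + str(findings));
}
```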
Pattern 8: Chain-of-Thought Agent #
A chain-of-thought agent is instructed to show its reasoning process before arriving at a conclusion. This makes the agent's logic transparent and auditable, which is especially valuable for math, logic, and analytical tasks:
agent Reasoner {
provider: "openai"
model: "gpt-4o"
temperature: 0.1
system: "Think step by step. For every question, show your reasoning process before giving a final answer. Format: REASONING: <steps>\nANSWER: <conclusion>"
}
{
let response = Reasoner.ask("If a train travels 120km in 1.5 hours, and then 80km in 1 hour, what is its average speed for the entire journey?");
emit response;
}
The very low temperature (0.1) minimizes creative drift and keeps the reasoning precise. The structured output format (REASONING / ANSWER) makes it easy to parse the reasoning chain programmatically if needed.
Exercise: Modify the Reasoner agent to solve a multi-step logic puzzle,
such as: "There are three boxes labeled A, B, and C. Box A contains apples. Box B
contains bananas. Box C contains cherries. You swap A and B, then swap B and C. What
does each box contain?" Compare the result with and without the chain-of-thought
system prompt. Notice how explicit reasoning reduces errors.
Pattern 9: ReAct Agent (Reason + Act) #
The ReAct pattern combines reasoning and acting in an interleaved loop. The agent thinks about what to do (THOUGHT), takes an action using a tool (ACTION), observes the result (OBSERVATION), and repeats until it reaches a final answer:
agent ReActAgent {
provider: "openai"
model: "gpt-4o"
system: "Follow the ReAct pattern: THOUGHT (reason about what to do), ACTION (call a tool), OBSERVATION (analyze results), repeat until you have a final answer."
skills: [WebSearch, Calculator]
}
{
let result = ReActAgent.ask("What is the population of France divided by the area of France in square kilometers?");
emit result;
}
ReAct agents are particularly effective when a task requires both factual lookup (via tools) and mathematical or logical reasoning. The agent naturally alternates between gathering information and processing it, producing more accurate results than either a pure reasoning agent or a pure tool-using agent.
Pattern 10: Self-Reflection Agent #
Self-reflection pairs a Writer agent with a Critic agent in an iterative improvement loop. The Writer produces a draft, the Critic evaluates it and provides feedback, and the Writer revises based on that feedback. This continues for a fixed number of rounds or until the Critic approves:
agent Writer {
provider: "openai"
model: "gpt-4o-mini"
system: "Write content as requested."
}
agent Critic {
provider: "openai"
model: "gpt-4o"
system: "Critique the given text. Identify weaknesses and suggest specific improvements. If the text needs no changes, reply APPROVED."
}
fun reflect_and_improve(prompt, max_rounds) {
let draft = Writer.ask(prompt);
let round = 0;
while (round < max_rounds) {
let critique = Critic.ask(f"Critique this: {draft}");
if (critique.contains("APPROVED")) { return draft; }
draft = Writer.ask(f"{prompt}\n\nPrevious draft: {draft}\nFeedback: {critique}");
round = round + 1;
}
return draft;
}
{
let final_essay = reflect_and_improve("Write a 200-word essay on climate change", 3);
emit final_essay;
}
Notice that the Writer uses a cheaper model (gpt-4o-mini) while the Critic uses a
more capable model (gpt-4o). This is intentional: writing is a generative task that
benefits from iteration, while critique requires stronger analytical reasoning. The
cost-per-round stays low because only the Critic needs the expensive model.
Pattern 11: Socratic Agent #
A Socratic agent never gives direct answers. Instead, it asks probing questions that guide the student toward discovering the answer themselves. This is powerful for educational applications and tutoring systems:
agent Socratic {
provider: "openai"
model: "gpt-4o"
system: "You are a Socratic tutor. Never give direct answers. Instead, ask probing questions that guide the student to discover the answer themselves. Ask at most 2 questions per response."
}
{
let q1 = Socratic.ask("What causes the seasons on Earth?");
emit "Tutor: " + q1;
let q2 = Socratic.ask("I think it's because the Earth is closer to the sun in summer?");
emit "Tutor: " + q2;
let q3 = Socratic.ask("Hmm, so maybe it's about the tilt of the Earth's axis?");
emit "Tutor: " + q3;
}
The Socratic agent is a single-agent pattern, but it pairs well with multi-agent architectures. For example, you could route students to different Socratic tutors depending on the subject (math, history, science), each with domain-specific probing strategies in their system prompts.
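That routing idea can be sketched with the Triage Routing pattern from earlier in the chapter. The subjects and prompts here are illustrative:

```
agent SubjectRouter {
    provider: "openai"
    model: "gpt-4o-mini"
    temperature: 0.1
    system: "Classify the student's question as MATH or SCIENCE. Reply with exactly one word."
}

agent MathSocratic {
    provider: "openai"
    model: "gpt-4o"
    system: "You are a Socratic math tutor. Never give direct answers. Ask questions that expose the structure of the problem. At most 2 questions per response."
}

agent ScienceSocratic {
    provider: "openai"
    model: "gpt-4o"
    system: "You are a Socratic science tutor. Never give direct answers. Ask questions that probe the student's mental model. At most 2 questions per response."
}

fun tutor(question) {
    let subject = SubjectRouter.ask(question);
    if (subject.contains("MATH")) {
        return MathSocratic.ask(question);
    }
    return ScienceSocratic.ask(question);
}
```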
Pattern 12: Red Team / Blue Team #
The Red Team / Blue Team pattern uses adversarial collaboration for security analysis, risk assessment, and robustness testing. The Red Team agent attacks -- finding vulnerabilities, edge cases, and potential failures. The Blue Team agent defends -- proposing mitigations and countermeasures for each finding:
agent RedTeam {
provider: "openai"
model: "gpt-4o"
system: "You are a red team agent. Find vulnerabilities, edge cases, and potential failures in the given system/plan."
}
agent BlueTeam {
provider: "openai"
model: "gpt-4o"
system: "You are a blue team agent. Given red team findings, propose mitigations and defensive measures."
}
{
let system_desc = "A web API that accepts user-uploaded images, resizes them, and stores them in S3.";
let vulnerabilities = RedTeam.ask("Analyze this system for vulnerabilities: " + system_desc);
emit "=== Red Team Findings ===";
emit vulnerabilities;
emit "";
let defenses = BlueTeam.ask("Propose mitigations for these findings:\n" + vulnerabilities);
emit "=== Blue Team Mitigations ===";
emit defenses;
}
This pattern is similar to Debate/Adversarial (Pattern 4) but is specifically designed for security and robustness contexts. You can extend it with multiple rounds -- the Red Team tries to bypass the Blue Team's mitigations, and the Blue Team strengthens its defenses iteratively.
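The multi-round extension mentioned above can be sketched as a simple loop in which the Red Team attacks the latest mitigations each round:

```
fun red_blue_rounds(system_desc, rounds) {
    let findings = RedTeam.ask("Analyze this system for vulnerabilities: " + system_desc);
    let defenses = "";
    let round = 0;
    while (round < rounds) {
        defenses = BlueTeam.ask("Propose mitigations for these findings:\n" + findings);
        // Red Team now tries to bypass the proposed defenses
        findings = RedTeam.ask("System: " + system_desc
            + "\nDefenses: " + defenses
            + "\nFind remaining weaknesses or bypasses.");
        round = round + 1;
    }
    return defenses;
}
```

A fixed round count keeps cost bounded; you could also stop early when the Red Team reports no remaining findings.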
Pattern 13: Memory-Enhanced Agent #
A memory-enhanced agent retains information across conversations using persistent
storage. The memory field connects the agent to a conversation store, allowing it to
recall previous interactions and build context over time:
agent MemoryBot {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a helpful assistant with persistent memory."
memory: "conversation_store"
}
{
// First interaction
let r1 = MemoryBot.ask("My name is Alice and I'm working on a Neam project.");
emit r1;
// Later interaction -- the agent remembers
let r2 = MemoryBot.ask("What project am I working on?");
emit r2;
}
The memory field specifies the name of a persistent store. Across separate runs of
the program, the agent can recall facts from earlier conversations. This is essential
for personal assistants, long-running tutoring sessions, and any application where
continuity matters.
Exercise: Combine the Memory-Enhanced Agent with the Triage Routing
pattern. Create a system where a triage agent routes users to specialists, but each
specialist has its own memory store (e.g., "billing_memory", "tech_memory"). This
way, the billing specialist remembers past billing conversations and the tech
specialist remembers past technical issues -- even across separate sessions.
Pattern 14: Expert Retrieval (Multi-Knowledge) #
Expert retrieval connects specialized agents to domain-specific knowledge bases using
Neam's knowledge system. Each agent retrieves from its own curated set of documents,
producing answers grounded in authoritative sources:
knowledge LegalDocs { vector_store: "usearch", sources: [{ type: "file", path: "./legal/*.txt" }] }
knowledge TechDocs { vector_store: "usearch", sources: [{ type: "file", path: "./tech/*.md" }] }
agent LegalExpert {
provider: "openai"
model: "gpt-4o"
system: "You are a legal expert."
connected_knowledge: [LegalDocs]
}
agent TechExpert {
provider: "openai"
model: "gpt-4o"
system: "You are a technical expert."
connected_knowledge: [TechDocs]
}
agent Dispatcher {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.1
system: "Classify the user's question as LEGAL or TECHNICAL. Respond with exactly one word."
}
fun expert_answer(question) {
let category = Dispatcher.ask(question);
if (category.contains("LEGAL")) {
return LegalExpert.ask(question);
}
return TechExpert.ask(question);
}
{
emit expert_answer("What are the GDPR requirements for data retention?");
emit expert_answer("How do I configure a reverse proxy with nginx?");
}
The connected_knowledge field gives each agent access to a vector store built from
domain-specific files. When the agent receives a question, it retrieves relevant chunks
from its knowledge base and includes them in the LLM context. This is retrieval-augmented
generation (RAG) at the agent level -- each expert only sees documents from its own domain,
which reduces noise and hallucination.
Pattern Selection Guide #
With 14 patterns available, choosing the right one can feel overwhelming. Use this table as a quick reference:
| Pattern | Best For | Complexity | Example Use Case |
|---|---|---|---|
| 1. Triage Routing | Classification + dispatch | Low | Customer service routing |
| 2. Sequential Pipeline | Content creation workflow | Low | Research, Write, Edit |
| 3. Supervisor/Worker | Quality assurance | Medium | Code review, content approval |
| 4. Debate/Adversarial | Balanced analysis | Medium | Policy evaluation |
| 5. Pipeline Chaining | Multi-step with handoffs | Medium | Support ticket processing |
| 6. Planning Agent | Complex multi-step tasks | High | Project management |
| 7. Deep Search | Research questions | Medium | Market analysis, literature review |
| 8. Chain-of-Thought | Reasoning tasks | Low | Math, logic puzzles |
| 9. ReAct | Tool-using reasoning | Medium | Data analysis with live search |
| 10. Self-Reflection | Iterative improvement | Medium | Writing, code generation |
| 11. Socratic | Education / tutoring | Low | Student learning |
| 12. Red/Blue Team | Security analysis | Medium | Threat modeling |
| 13. Memory-Enhanced | Ongoing conversations | Medium | Personal assistant |
| 14. Expert Retrieval | Domain Q&A with sources | Medium | Legal/medical consultation |
How to read this table:
- Low complexity patterns require 1--2 agents and minimal control flow. Start here.
- Medium complexity patterns require 2--4 agents with loops or conditional logic.
- High complexity patterns require 3+ agents with planning, monitoring, and dynamic dispatch.
When in doubt, start with Pattern 1 (Triage Routing) and evolve toward more sophisticated patterns only when the simpler approach falls short.
Manual Routing vs. Runner-Based Handoffs #
You have seen two approaches to multi-agent coordination:
- Manual routing (Patterns 1-4): Your code explicitly decides which agent to call based on classifier output. You control the flow with if statements.
- Runner-based handoffs (Pattern 5): The runner manages the flow. The LLM decides when to hand off based on the conversation and available handoff tools.
| Aspect | Manual Routing | Runner-Based Handoffs |
|---|---|---|
| Control | You decide the routing logic | LLM decides when to hand off |
| Flexibility | Full programmatic control | More adaptive to novel inputs |
| Predictability | Highly predictable | LLM may route unexpectedly |
| Complexity | More code to write | Less code, more configuration |
| Debugging | Easy to trace (explicit flow) | Use tracing for visibility |
| Best for | Deterministic workflows | Conversational, open-ended flows |
In practice, production systems often combine both approaches -- using runners for the outer orchestration loop and manual routing for specific decision points within agents.
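As a sketch of that hybrid shape, the function below uses manual routing for the deterministic decision point and a runner for the open-ended path. The names Classifier, FaqAgent, and EscalationRunner are illustrative assumptions; the runner itself would be defined as in Pattern 5:

```
fun handle(query) {
    // Deterministic decision point: manual routing with an if statement.
    let kind = Classifier.ask(query);
    if (kind.contains("FAQ")) {
        // Cheap, predictable path -- no runner needed.
        return FaqAgent.ask(query);
    }
    // Open-ended conversation: let the runner and its handoffs take over.
    let result = EscalationRunner.run(query);
    return result["final_output"];
}
```

The if branch stays fully predictable, while anything the classifier cannot confidently label falls through to the adaptive runner loop.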
Designing Multi-Agent Systems #
When designing a multi-agent system, follow these guidelines:
- Start with a single agent. Only split into multiple agents when a single agent's system prompt becomes unwieldy or performance degrades.
- Give each agent one clear responsibility. The system prompt should be describable in one sentence.
- Use the cheapest model that works. Triage agents that just classify can use small, fast models. Only use expensive models for tasks that require strong reasoning.
- Set conservative max_turns. For simple routing, max_turns: 3 is enough. Only increase for complex multi-hop workflows.
- Always enable tracing during development. Turn it off in production if the performance overhead is a concern.
- Test each agent independently before composing them into a multi-agent system.
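Several of these guidelines show up directly in a minimal triage definition. The sketch below is illustrative (it mirrors the Dispatcher agent from Pattern 14) rather than a required shape:

```
agent Triage {
    provider: "openai"
    model: "gpt-4o-mini"   // cheapest model that works: classification is simple
    temperature: 0.1       // low temperature keeps routing predictable
    // One clear responsibility, describable in one sentence:
    system: "Classify the request as BILLING or TECHNICAL. Respond with exactly one word."
}
```

If this agent's prompt ever grows beyond a sentence or two, that is usually the signal to split the work across additional agents rather than to keep expanding it.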
The spawn Keyword #
The spawn keyword lets you invoke another agent by name from within your code. Unlike
handoffs, which transfer control through a runner, spawn performs a direct, single-turn
LLM call to the named agent and returns the result:
// Keyword form — static agent name
let result = spawn researcher("Find recent papers on RAG");
// Native function form — dynamic agent name
let agent_name = "researcher";
let result = spawn(agent_name, "Find recent papers on RAG");
When you call spawn, the runtime resolves the agent name by checking claw agents first,
then forge agents, then stateless agents. If no agent is found, it returns an error
string.
spawn performs a simplified single-turn LLM call. It does NOT invoke the
full .ask() session flow for claw agents or the .run() loop for forge agents. If
you need session-aware conversation or full forge iteration, call agent.ask() or
agent.run() directly.
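The distinction matters in practice. The contrast below is a sketch, assuming researcher is a claw agent whose .ask() carries session state across calls:

```
// spawn: stateless, single-turn -- the second call has no memory of the first.
let a1 = spawn researcher("My budget is $500. Suggest a GPU.");
let a2 = spawn researcher("What was my budget?");   // no access to the $500 figure

// .ask(): session-aware -- conversation state carries over between calls.
let b1 = researcher.ask("My budget is $500. Suggest a GPU.");
let b2 = researcher.ask("What was my budget?");     // can refer back to the $500
```

Use spawn for fire-and-forget delegation and .ask() when the sub-agent needs to remember earlier turns.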
Spawn with Orchestrable Callbacks #
When the spawning agent implements the Orchestrable trait (Chapter 27), callbacks fire automatically before and after the sub-agent call.
This is useful for logging, metrics collection, and access control on sub-agent invocations.
Spawn Example #
agent researcher {
provider: "openai"
model: "gpt-4o"
system: "You are a research assistant. Provide concise summaries of topics."
}
agent writer {
provider: "openai"
model: "gpt-4o"
system: "You are a technical writer. Create clear, structured reports."
}
{
// Gather research
let findings = spawn researcher("Summarize recent trends in AI agent frameworks");
// Write report based on research
let report = spawn writer("Write a 200-word summary based on: " + findings);
emit report;
}
DAG Execution with dag_execute() #
For workflows where multiple agents need to execute in a specific dependency order,
Neam provides the dag_execute() native function. It takes a list of DAG (Directed
Acyclic Graph) nodes, topologically sorts them, and executes each agent in order:
let results = dag_execute([
{
"id": "research",
"agent": "researcher",
"task": "Gather data on market trends",
"depends_on": []
},
{
"id": "analysis",
"agent": "analyst",
"task": "Analyze the research findings",
"depends_on": ["research"]
},
{
"id": "report",
"agent": "writer",
"task": "Write executive summary",
"depends_on": ["research", "analysis"]
}
]);
// results is a Map: { "research": "...", "analysis": "...", "report": "..." }
emit results["report"];
DAG Node Fields #
| Field | Type | Required | Description |
|---|---|---|---|
| id | String | Yes | Unique node identifier |
| agent | String | Yes | Name of the agent to invoke via spawn |
| task | String | No | Task description passed to the agent |
| depends_on | List of strings | No | IDs of nodes that must complete first |
How DAG Execution Works #
The runtime uses Kahn's topological sort to determine the correct execution order.
Nodes with no unresolved dependencies are eligible to execute. Each node invokes
the named agent via spawn.
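For instance, in the diamond-shaped DAG below, Kahn's sort guarantees that fetch runs first and merge runs last; clean and enrich both become eligible as soon as fetch completes. The agent names are illustrative:

```
let results = dag_execute([
    { "id": "fetch",  "agent": "collector", "task": "Fetch raw records", "depends_on": [] },
    { "id": "clean",  "agent": "cleaner",   "task": "Normalize the records", "depends_on": ["fetch"] },
    { "id": "enrich", "agent": "enricher",  "task": "Attach metadata", "depends_on": ["fetch"] },
    { "id": "merge",  "agent": "merger",    "task": "Merge cleaned and enriched data",
      "depends_on": ["clean", "enrich"] }
]);
emit results["merge"];
```

Because the graph must be acyclic, a node can never (directly or transitively) depend on itself; a cycle would leave Kahn's algorithm with no eligible node to schedule.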
DAG Patterns #
Two common DAG patterns emerge in practice:
Fan-Out / Fan-In: Multiple independent agents execute in parallel, then a synthesis agent combines their results:
let results = dag_execute([
{ "id": "finance", "agent": "finance_expert", "task": query, "depends_on": [] },
{ "id": "legal", "agent": "legal_expert", "task": query, "depends_on": [] },
{ "id": "tech", "agent": "tech_expert", "task": query, "depends_on": [] },
{
"id": "synthesis",
"agent": "synthesizer",
"task": "Combine expert opinions into a unified recommendation",
"depends_on": ["finance", "legal", "tech"]
}
]);
Sequential Chain: Each agent depends on the previous, forming a linear pipeline:
let results = dag_execute([
{ "id": "step1", "agent": "extractor", "task": "Extract key data", "depends_on": [] },
{ "id": "step2", "agent": "transformer", "task": "Transform to schema", "depends_on": ["step1"] },
{ "id": "step3", "agent": "validator", "task": "Validate output format", "depends_on": ["step2"] }
]);
Build a DAG with four agents: a researcher that gathers data,
a fact_checker that verifies the research (depends on researcher), a writer that
drafts a report (depends on researcher), and an editor that polishes the final
output (depends on both fact_checker and writer). Run it with dag_execute() and
emit the editor's output.
Summary #
In this chapter, you learned:
- Multi-agent orchestration enables specialization, cost optimization, and maintainability through agent separation of concerns.
- Handoffs transfer control between agents using the handoffs field and are implemented as tools that the LLM can invoke.
- Advanced handoff configuration supports custom tool names, descriptions, input filters, callbacks, conditional enablement, and structured input types.
- Context sharing between agents is automatic through the runner, with input filters for transforming, summarizing, and securing context during handoffs.
- Runners manage the multi-agent execution loop with configurable max turns, tracing, and guardrails.
- Runner .run() returns a result map with final_agent, final_output, total_turns, and trace data.
- Fourteen orchestration patterns: triage routing, sequential pipelines, supervisor/worker, debate/adversarial, pipeline chaining, planning agents, deep search, chain-of-thought, ReAct (reason + act), self-reflection, Socratic tutoring, red team / blue team, memory-enhanced agents, and expert retrieval with multi-knowledge bases.
- Manual routing gives you explicit control; runner-based handoffs let the LLM decide.
- The Pattern Selection Guide helps you choose the right pattern based on your use case, ranging from low-complexity single-dispatch patterns to high-complexity planning architectures.
- The spawn keyword invokes another agent by name for a single-turn LLM call, with automatic callback hooks when the spawning agent implements Orchestrable.
- The dag_execute() function executes a directed acyclic graph of agent tasks with dependency ordering using Kahn's topological sort.
- Two common DAG patterns are fan-out/fan-in (parallel experts with synthesis) and sequential chain (linear pipeline).
In the next chapter, we will add safety layers to these multi-agent systems with guardrails.
Exercises #
Exercise 13.1: Three-Agent Router
Create a triage system with three specialist agents: MathExpert, HistoryExpert,
and ScienceExpert. Write a triage agent that classifies questions into one of the
three categories and routes accordingly. Test with at least one question per category.
Exercise 13.2: Review Pipeline
Build a three-stage pipeline: Drafter writes an initial response, Reviewer provides
critique, and Reviser incorporates the feedback. Run it for the prompt "Explain the
water cycle" and emit the output at each stage.
Exercise 13.3: Runner with Tracing Create a runner-based system with a triage agent and two specialists. Enable tracing. After running a query, iterate through the trace entries and emit a formatted log showing which agent ran at each turn and whether a handoff occurred.
Exercise 13.4: Conditional Handoffs Implement an agent with three handoff targets. Make one handoff always enabled, one conditionally enabled based on a function, and one always disabled. Test by observing which agents the triage can and cannot hand off to.
Exercise 13.5: Debate System
Build a debate system with Proponent, Opponent, and Moderator agents. The
Moderator should receive arguments from both sides and produce a balanced summary. Run
a debate on the topic "AI should be regulated by governments" and emit all three outputs.
Exercise 13.6: Multi-Provider Multi-Agent Create a multi-agent system where the triage agent uses Ollama (free, fast classification), the billing specialist uses GPT-4o-mini (cost-effective), and the technical specialist uses GPT-4o (maximum capability). Include comments explaining your provider choices.
Exercise 13.7: Pipeline with Error Handling
Build a runner-based pipeline where one of the specialist agents might fail (simulate by
using a nonexistent model). Wrap the Runner.run() call in try/catch and emit
appropriate error messages. Also check the completed field in the result map.
Exercise 13.8: Spawn Chain
Create three agents (translator, summarizer, formatter) using the spawn keyword.
The main program should spawn the translator with a paragraph of text, spawn the
summarizer with the translation result, and spawn the formatter with the summary. Emit
the result at each stage.
Exercise 13.9: DAG Orchestration
Build a four-agent DAG using dag_execute(): a data_collector (no dependencies), a
validator (depends on data_collector), an analyzer (depends on data_collector), and a
report_writer (depends on both validator and analyzer). Run the DAG and emit each
agent's result. Verify that the execution order respects the dependency graph.
Exercise 13.10: Self-Reflection Loop
Implement the Self-Reflection pattern (Pattern 10): a Writer agent produces a first
draft on any topic of your choice, a Critic agent evaluates it and provides specific
feedback, and the Writer revises based on the feedback. Limit the loop to 3 rounds
maximum. Use emit to output the draft at each stage so you can observe how the text
improves with each iteration. Bonus: add a round counter to each emitted draft (e.g.,
"=== Round 1 Draft ===" / "=== Round 2 Draft ===" / "=== Final Draft ===").