Chapter 13: Multi-Agent Orchestration #
"The whole is greater than the sum of its parts." -- Aristotle
In the previous chapters, you learned to create individual agents and equip them with tools. A single agent with the right tools can handle many tasks. But real-world problems often require specialization -- one agent that triages requests, another that handles billing, another that processes refunds. No single agent can be an expert at everything, just as no single employee can fill every role in a company.
Multi-agent orchestration is about coordinating multiple specialized agents to handle complex workflows. In this chapter, you will learn three core mechanisms: handoffs (transferring control between agents), runners (managing the agent execution loop), and orchestration patterns (proven architectures for multi-agent systems).
Why This Matters #
Think of a hospital emergency room. When a patient arrives, a triage nurse assesses the situation and routes the patient to the right specialist. A broken arm goes to the orthopedic surgeon. Chest pain goes to the cardiologist. A skin rash goes to the dermatologist. The triage nurse does not perform surgery, and the cardiologist does not set bones. Each specialist excels because they focus on a narrow domain.
Now imagine a hospital that replaced every specialist with one "general doctor" who handles everything -- bones, hearts, skin, emergencies, mental health. That doctor would be mediocre at all of them. They would constantly context-switch, forget critical details, and make mistakes from overload.
AI agents work the same way. A single agent with the prompt "handle everything" is like that overloaded general doctor. But when you split responsibilities -- a triage agent that classifies, a refund agent that processes returns, a billing agent that handles invoices -- each agent becomes dramatically better at its job. The triage agent can use a tiny, fast model because classification is simple. The billing agent can carry detailed policy documents in its context because that is all it needs to think about.
Multi-agent orchestration is the discipline of building these teams of specialists and giving them clear protocols for working together. By the end of this chapter, you will know 14 proven patterns for doing exactly that.
Why Multi-Agent? #
Consider a customer service system. A single agent with the prompt "You handle all customer inquiries" will produce mediocre results across all categories. But if you split the workload:
- A triage agent that classifies requests (low temperature, simple prompt)
- A refund specialist with detailed refund policies in its system prompt
- A billing specialist trained on billing procedures
- A technical support agent with access to documentation tools
Each agent excels at its narrow task. The triage agent's only job is routing -- it does not need to know refund policies. The refund specialist's only job is processing refunds -- it does not need to handle technical issues.
This is the principle of separation of concerns applied to AI agents.
Benefits of Multi-Agent Architecture #
| Benefit | Description |
|---|---|
| Specialization | Each agent has a focused system prompt optimized for its task |
| Different models | Route simple tasks to cheap models, complex tasks to powerful ones |
| Maintainability | Change one agent's behavior without affecting others |
| Testability | Test each agent independently |
| Cost optimization | Use expensive models only where they add value |
| Scalability | Add new specialist agents without modifying existing ones |
Handoffs: Transferring Control Between Agents #
A handoff is the mechanism by which one agent transfers control to another. In Neam,
handoffs are declared in the agent's handoffs field:
agent Triage {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "Route requests. Reply ROUTE:REFUND, ROUTE:BILLING, or ROUTE:GENERAL."
handoffs: [RefundAgent, BillingAgent]
}
When an agent has handoffs, the Neam VM automatically registers the handoff targets
as callable tools. The agent (through the LLM) decides when to trigger a handoff based
on the conversation context.
How Handoffs Work #
1. The VM sends the prompt to the triage agent along with handoff tools (e.g., `transfer_to_RefundAgent`, `transfer_to_BillingAgent`).
2. The LLM analyzes the user's request and decides which agent should handle it.
3. The LLM responds with a handoff signal (e.g., `HANDOFF: transfer_to_RefundAgent`).
4. The VM detects the handoff, switches the active agent to `RefundAgent`, and passes the user's original message to it.
5. `RefundAgent` processes the request and returns a response.
Simple Handoff Example #
This is the minimal example to understand handoffs:
// First agent - routes to the second
agent RouterAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.1
system: "You are a router. Your ONLY job is to hand off to the Responder agent.
Always respond with EXACTLY: HANDOFF: transfer_to_ResponderAgent"
handoffs: [ResponderAgent]
}
// Second agent - handles the actual response
agent ResponderAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.7
system: "You are a helpful assistant. Answer questions concisely."
}
// Runner manages the execution loop
runner SimpleRunner {
entry_agent: RouterAgent
max_turns: 3
}
{
emit "=== Simple Handoff Test ===";
emit "";
let result = SimpleRunner.run("Hello, how are you?");
emit "Final agent: " + result["final_agent"];
emit "Response: " + result["final_output"];
}
Notice three important elements:
- `handoffs: [ResponderAgent]` -- Declares that `RouterAgent` can hand off to `ResponderAgent`.
- The system prompt instructs the agent to trigger a handoff.
- A `runner` is required to manage the multi-agent execution loop.
Runners: Orchestration Loops #
A runner is the orchestration engine that manages multi-agent execution. When you have handoffs, something needs to:
- Start with the entry agent
- Detect when a handoff occurs
- Switch to the target agent
- Continue until an agent produces a final response (no handoff)
- Enforce maximum turn limits
The runner declaration handles all of this:
runner CustomerService {
entry_agent: TriageAgent
max_turns: 5
tracing: enabled
}
Runner Configuration Fields #
| Field | Type | Required | Description |
|---|---|---|---|
| `entry_agent` | agent ref | Yes | The first agent that receives the user's input |
| `max_turns` | int | Yes | Maximum number of agent interactions before forced termination |
| `tracing` | identifier | No | Enable execution tracing: `enabled` or `disabled` |
| `input_guardrails` | list | No | Guardrail chains to run on input (Chapter 14) |
| `output_guardrails` | list | No | Guardrail chains to run on output (Chapter 14) |
Running a Runner #
Runners are invoked with the .run() method, which takes the user's input and returns
a result map:
{
let result = CustomerService.run("I was charged twice on my last invoice");
emit "Final Agent: " + result["final_agent"];
emit "Response: " + result["final_output"];
emit "Turns: " + str(result["total_turns"]);
}
Runner Result Fields #
The .run() method returns a map with the following fields:
| Field | Type | Description |
|---|---|---|
| `final_agent` | string | Name of the agent that produced the final response |
| `final_output` | string | The final response text |
| `total_turns` | int | Number of agent turns executed |
| `completed` | bool | Whether the runner completed normally (not `max_turns` exceeded) |
| `total_duration_ms` | int | Total execution time in milliseconds |
| `error_message` | string | Error description if the runner failed |
| `trace_summary` | string | Human-readable summary (when tracing is enabled) |
| `trace` | list | Detailed trace entries (when tracing is enabled) |
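Because a runner can exhaust its `max_turns` limit or fail outright, production code should check `completed` before trusting the output. A minimal sketch using the fields from the table above:

```
{
    let result = CustomerService.run("I was charged twice on my last invoice");
    if (result["completed"]) {
        emit "Handled by " + result["final_agent"] + " in " + str(result["total_turns"]) + " turns";
        emit result["final_output"];
    } else {
        // Either max_turns was exceeded or an error occurred mid-run
        emit "Runner did not complete: " + result["error_message"];
    }
}
```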
Customer Service Example #
This is a complete, production-style customer service system with three specialist agents:
agent TriageAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "You are a customer service triage agent. Analyze the customer's request
and route to the appropriate specialist.
For billing questions, say HANDOFF: transfer_to_BillingAgent.
For technical issues, say HANDOFF: transfer_to_TechAgent.
For refund requests, say HANDOFF: transfer_to_RefundAgent."
handoffs: [BillingAgent, TechAgent, RefundAgent]
}
agent BillingAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a billing specialist. Help customers with billing inquiries,
payment issues, and invoice questions. Be professional and concise."
}
agent TechAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a technical support specialist. Help customers with technical
problems, bugs, and setup issues. Be patient and thorough."
}
agent RefundAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a refund specialist. Process refund requests professionally.
Ask for order numbers if not provided. Be empathetic."
}
runner CustomerService {
entry_agent: TriageAgent
max_turns: 5
tracing: enabled
}
{
emit "=== Customer Service System ===";
emit "";
// Test 1: Billing question
emit "--- Test 1: Billing ---";
let r1 = CustomerService.run("I was charged twice on my last invoice");
emit "Routed to: " + r1["final_agent"];
emit "Response: " + r1["final_output"];
emit "Turns: " + str(r1["total_turns"]);
emit "";
// Test 2: Technical issue
emit "--- Test 2: Technical ---";
let r2 = CustomerService.run("My API endpoint is returning 503 errors");
emit "Routed to: " + r2["final_agent"];
emit "Response: " + r2["final_output"];
emit "Turns: " + str(r2["total_turns"]);
emit "";
// Test 3: Refund request
emit "--- Test 3: Refund ---";
let r3 = CustomerService.run("I want a refund for order #12345, the product was damaged");
emit "Routed to: " + r3["final_agent"];
emit "Response: " + r3["final_output"];
emit "Turns: " + str(r3["total_turns"]);
}
Advanced Handoff Configuration #
The simple handoffs: [Agent1, Agent2] syntax works for basic routing. For more
complex scenarios, Neam supports advanced handoff configuration with handoff_to():
// Input filter function
fun sanitize_input(input) {
// Remove PII, normalize text, etc.
return input;
}
// Condition function
fun is_business_hours() {
// Check if current time is within business hours
return true;
}
// Callback for logging
fun log_handoff(context) {
emit "[Handoff Log] Transferring with context: " + context;
}
agent PrimaryAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "You are a customer service agent. Route requests appropriately.
For urgent issues, respond with: HANDOFF: urgent_support
For billing issues, respond with: HANDOFF: billing_team
For general questions, answer directly."
handoffs: [
// Simple handoff (just the agent reference)
GeneralAgent,
// Advanced handoff with configuration
handoff_to(UrgentAgent) {
tool_name: "urgent_support"
description: "Escalate urgent issues to immediate support"
input_filter: sanitize_input
on_handoff: log_handoff
},
// Conditional handoff
handoff_to(BillingAgent) {
tool_name: "billing_team"
description: "Transfer billing inquiries to billing specialists"
is_enabled: is_business_hours()
}
]
}
Advanced Handoff Configuration Fields #
| Field | Type | Description |
|---|---|---|
| `tool_name` | string | Custom name for the handoff tool (default: `transfer_to_<AgentName>`) |
| `description` | string | Custom description shown to the LLM |
| `input_filter` | function ref | Function to transform/sanitize input before handoff |
| `on_handoff` | function ref | Callback function invoked when handoff occurs |
| `is_enabled` | bool/function | Condition for whether this handoff is available |
Custom Tool Names #
By default, handoff tools are named transfer_to_<AgentName>. Custom tool_name values
let you use more natural names:
handoff_to(UrgentAgent) {
tool_name: "urgent_support"
description: "Escalate to urgent support team"
}
The LLM will see a tool called urgent_support instead of transfer_to_UrgentAgent,
which may lead to more natural handoff decisions.
Input Filters #
Input filters transform the user's message before it reaches the target agent. This is useful for removing sensitive information, adding context, or normalizing the input:
fun add_priority_context(input) {
return "[PRIORITY: HIGH] " + input;
}
handoff_to(UrgentAgent) {
tool_name: "escalate"
input_filter: add_priority_context
}
Conditional Handoffs #
The is_enabled field allows handoffs to be conditionally available:
fun check_billing_hours() {
// Only enable billing handoff during business hours
let hour = 14; // In practice, derive from time_now()
return (hour >= 9) & (hour <= 17);
}
handoff_to(BillingAgent) {
tool_name: "billing"
is_enabled: check_billing_hours()
}
When is_enabled returns false, the handoff tool is not presented to the LLM, and the
agent cannot hand off to BillingAgent.
Structured Input with Handoffs #
For handoffs that need specific data (not just the user's message), use the input_type
field. This tells the LLM to provide structured JSON data along with the handoff:
agent TriageAgent {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.3
system: "Analyze customer requests and route to the right specialist.
When routing to refund, include the order_id and reason."
handoffs: [
handoff_to(RefundAgent) {
tool_name: "process_refund"
description: "Process a refund request with order details"
input_type: RefundRequest
on_handoff: fun(input) {
emit "[LOG] Refund request: order=" + input.order_id;
}
}
]
}
When the LLM triggers this handoff, it provides structured data matching the
input_type. The on_handoff callback receives this data, allowing you to log,
validate, or transform it before the target agent processes the request.
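For example, the callback can sanity-check the structured data before the target agent sees it. A hedged sketch, assuming `order_id` is a string field of the `RefundRequest` type shown above:

```
fun validate_refund(input) {
    // Log every refund handoff for auditing
    emit "[LOG] Refund request: order=" + input.order_id;
    // Flag obviously malformed order IDs (assumed format: "#" followed by digits)
    if (len(input.order_id) < 2) {
        emit "[WARN] Suspicious order_id: " + input.order_id;
    }
}
```

Passing `validate_refund` as the `on_handoff` callback keeps validation logic out of both agents' prompts.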
Context Sharing Between Agents #
When an agent hands off to another, the conversation context needs to flow appropriately. The runner automatically passes the conversation history to the target agent, but sometimes you need more control.
Input Filters for Context Transformation #
The input_filter field on a handoff lets you transform the context before it reaches
the target agent:
// Remove sensitive data before handoff
fun remove_sensitive(context) {
// Strip PII, internal notes, etc.
return context;
}
// Summarize long conversations to save tokens
fun summarize_for_handoff(context) {
return "Summary of previous conversation: " + context;
}
// Add metadata for tracking
fun add_handoff_metadata(context) {
return "[Handed off from TriageAgent]\n" + context;
}
Composing Multiple Filters #
You can chain filters together using a composition function:
fun compose_filters(filters) {
return fun(context) {
let result = context;
for (filter_fn in filters) {
result = filter_fn(result);
}
return result;
};
}
// Apply multiple transformations
let combined = compose_filters([
remove_sensitive,
summarize_for_handoff,
add_handoff_metadata
]);
handoff_to(SpecialistAgent) {
tool_name: "specialist"
input_filter: combined
}
Context filters are important for:
- Token optimization -- Summarize long histories before passing to expensive models.
- Privacy -- Remove PII before handing off to external agents.
- Relevance -- Strip tool call details that the target agent does not need.
- Auditability -- Add metadata about the handoff chain for tracing.
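The summarization filter above is a stub; in practice you can back it with a cheap agent so that long histories are compressed before an expensive specialist sees them. A sketch, assuming a small local model is available (the length threshold is illustrative):

```
agent Summarizer {
    provider: "ollama"
    model: "llama3.2:3b"
    temperature: 0.2
    system: "Summarize the conversation in at most 3 sentences. Preserve order numbers and names."
}

fun summarize_via_agent(context) {
    // Only pay the summarization cost for long contexts
    if (len(context) > 2000) {
        return "Summary of previous conversation: " + Summarizer.ask(context);
    }
    return context;
}

handoff_to(SpecialistAgent) {
    tool_name: "specialist"
    input_filter: summarize_via_agent
}
```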
Tracing: Understanding Multi-Agent Execution #
When debugging multi-agent systems, you need visibility into which agent ran, when
handoffs occurred, and how long each turn took. The tracing: enabled option on runners
provides this:
agent RouterAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "Route requests. For greetings, HANDOFF: greet. For questions, HANDOFF: help."
handoffs: [
handoff_to(GreeterAgent) {
tool_name: "greet"
description: "Handle greetings"
},
handoff_to(HelperAgent) {
tool_name: "help"
description: "Handle questions"
}
]
}
agent GreeterAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a friendly greeter. Greet the user warmly."
}
agent HelperAgent {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You are a helpful assistant. Answer questions clearly."
}
runner TracedRunner {
entry_agent: RouterAgent
max_turns: 5
tracing: enabled
}
{
let result = TracedRunner.run("Hello there!");
emit "=== Result ===";
emit "Final agent: " + result["final_agent"];
emit "Total turns: " + str(result["total_turns"]);
emit "Duration: " + str(result["total_duration_ms"]) + "ms";
emit "";
emit "=== Trace Summary ===";
emit result["trace_summary"];
emit "";
emit "=== Detailed Trace ===";
let trace = result["trace"];
for (entry in trace) {
emit "Turn " + str(entry["turn"]) + ":";
emit " Agent: " + entry["agent_name"];
emit " Action: " + entry["action"];
emit " Duration: " + str(entry["duration_ms"]) + "ms";
if (entry["was_handoff"]) {
emit " Handoff to: " + entry["handoff_to"];
}
}
}
Orchestration Patterns #
Beyond simple triage routing, there are several proven patterns for multi-agent orchestration. Let us examine each one.
Pattern 1: Triage Routing #
The most common pattern. A classifier agent routes to specialists:
agent Triage {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.1
system: "Classify the query. Reply with exactly one: MATH, CODE, or GENERAL."
}
agent MathExpert {
provider: "ollama"
model: "llama3.2:3b"
system: "You are a math expert. Solve problems step by step."
}
agent CodeExpert {
provider: "ollama"
model: "llama3.2:3b"
system: "You are a coding expert. Provide code solutions."
}
fun route_query(query) {
let category = Triage.ask(query);
if (category.contains("MATH")) {
return MathExpert.ask(query);
}
if (category.contains("CODE")) {
return CodeExpert.ask(query);
}
return "General: " + query;
}
{
emit route_query("What is 15 * 23?");
emit route_query("Write hello world in Python");
}
Pattern 2: Sequential Pipeline #
Output from one agent feeds into the next:
agent Researcher {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a researcher. Provide factual information and key points."
}
agent Writer {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a writer. Take notes and create polished prose."
}
agent Editor {
provider: "openai"
model: "gpt-4o-mini"
system: "You are an editor. Improve text for clarity. Output only improved text."
}
{
let topic = "artificial intelligence";
let research = Researcher.ask("Key facts about " + topic);
let draft = Writer.ask("Write a paragraph based on: " + research);
let final_text = Editor.ask("Edit and improve: " + draft);
emit final_text;
}
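The three-step chain can be generalized into a helper that threads output through any list of agents. A hedged sketch, assuming agent references can be stored in lists and indexed the same way function references are:

```
fun run_pipeline(agents, prompts, input) {
    // agents[i] receives prompts[i] followed by the previous stage's output
    let text = input;
    for (i, agent in enumerate(agents)) {
        text = agent.ask(prompts[i] + text);
    }
    return text;
}

{
    let result = run_pipeline(
        [Researcher, Writer, Editor],
        ["Key facts about ", "Write a paragraph based on: ", "Edit and improve: "],
        "artificial intelligence"
    );
    emit result;
}
```

Adding a stage is now a one-line change to the two lists rather than a new block of glue code.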
Pattern 3: Supervisor/Worker #
A supervisor evaluates and validates worker output:
agent Worker {
provider: "openai"
model: "gpt-4o-mini"
system: "Complete assigned tasks thoroughly but concisely."
}
agent Supervisor {
provider: "openai"
model: "gpt-4o-mini"
system: "Evaluate work quality. Reply APPROVED or NEEDS_REVISION: <reason>."
}
fun supervised_task(task) {
let result = Worker.ask(task);
let review = Supervisor.ask("Evaluate this response to '" + task + "': " + result);
if (review.contains("NEEDS_REVISION")) {
// Worker revises based on feedback
let revised = Worker.ask(task + "\n\nPrevious feedback: " + review);
return revised;
}
return result;
}
{
let output = supervised_task("List 5 benefits of exercise with explanations");
emit output;
}
Pattern 4: Debate/Adversarial #
Multiple agents present different perspectives, then a judge synthesizes:
agent Advocate {
provider: "openai"
model: "gpt-4o-mini"
system: "Argue IN FAVOR of the topic. Present 2-3 strong points."
}
agent Critic {
provider: "openai"
model: "gpt-4o-mini"
system: "Argue AGAINST the topic. Present 2-3 counterpoints."
}
agent Judge {
provider: "openai"
model: "gpt-4o-mini"
system: "You are an impartial judge. Given pro and con arguments,
provide a balanced conclusion in 2 sentences."
}
{
let topic = "Remote work should be the default for knowledge workers";
let pro = Advocate.ask(topic);
let con = Critic.ask(topic);
let verdict = Judge.ask("PRO: " + pro + " --- CON: " + con);
emit "=== Debate: " + topic + " ===";
emit "";
emit "FOR: " + pro;
emit "";
emit "AGAINST: " + con;
emit "";
emit "VERDICT: " + verdict;
}
Pattern 5: Pipeline Chaining with Runners #
For complex multi-step workflows, combine handoffs with runners:
agent Intake {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.3
system: "You receive customer requests. Extract the key issue and hand off.
For complaints: HANDOFF: transfer_to_ComplaintHandler
For questions: HANDOFF: transfer_to_QuestionHandler"
handoffs: [ComplaintHandler, QuestionHandler]
}
agent ComplaintHandler {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You handle complaints. Be empathetic. Offer a resolution."
}
agent QuestionHandler {
provider: "ollama"
model: "llama3.2:3b"
temperature: 0.5
system: "You answer customer questions clearly and helpfully."
}
runner SupportPipeline {
entry_agent: Intake
max_turns: 5
tracing: enabled
}
{
// Process multiple customer requests through the same pipeline
let requests = [
"I'm furious! My order arrived broken!",
"What are your return policies?",
"This is unacceptable service!"
];
for (i, request in enumerate(requests)) {
emit "--- Request " + str(i + 1) + " ---";
emit "Input: " + request;
let result = SupportPipeline.run(request);
emit "Handled by: " + result["final_agent"];
emit "Response: " + result["final_output"];
emit "";
}
}
Pattern 6: Planning Agent #
A planning agent decomposes a complex goal into steps, then coordinates other agents to execute each step:
agent Planner {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You decompose goals into numbered steps. Output ONLY a numbered list.
Each step should be actionable by a single specialist agent."
}
agent Executor {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.5
system: "Execute the given task step. Be thorough but concise."
}
agent Monitor {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.1
system: "Given a plan and completed steps, respond with one of:
ON_TRACK, NEEDS_ADJUSTMENT: <reason>, or COMPLETE."
}
fun execute_plan(goal) {
// Step 1: Decompose the goal
let plan = Planner.ask("Decompose this goal: " + goal);
emit "Plan: " + plan;
// Step 2: Execute each step
let steps = split(plan, "\n");
let results = [];
for (step in steps) {
if (len(step) > 2) {
let result = Executor.ask("Execute: " + step);
push(results, {"step": step, "result": result});
emit "Completed: " + step;
}
}
// Step 3: Monitor overall progress
let status = Monitor.ask("Plan: " + plan + "\nResults: " + str(results));
emit "Status: " + status;
return {"plan": plan, "results": results, "status": status};
}
{
let output = execute_plan("Research and write a summary about quantum computing");
emit "Final status: " + output.status;
}
This pattern separates the concerns of planning, execution, and monitoring into different agents, each with the appropriate temperature and model for its role.
Pattern 7: Deep Search Agent #
A deep search agent breaks complex research questions into sub-queries, searches for each independently, and synthesizes the findings into a comprehensive answer. This is ideal for questions that span multiple domains or require aggregating diverse sources:
agent DeepSearcher {
provider: "openai"
model: "gpt-4o"
temperature: 0.2
system: "You are a deep research agent. Break complex questions into sub-queries, search for each, and synthesize findings into a comprehensive answer."
skills: [WebSearch]
}
{
let answer = DeepSearcher.ask("What are the latest advances in quantum error correction?");
emit answer;
}
The low temperature (0.2) keeps the search strategy focused and methodical. The
WebSearch skill gives the agent access to real-time information, making it suitable
for questions about current events, recent research, or rapidly evolving topics.
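If you want the decomposition to be explicit and inspectable rather than left to a single prompt, you can orchestrate it manually with a splitter and a synthesizer. A sketch reusing constructs from earlier in the chapter; the `WebSearch` skill is assumed available as above:

```
agent QuerySplitter {
    provider: "openai"
    model: "gpt-4o-mini"
    temperature: 0.1
    system: "Break the research question into 3 focused sub-queries. Output ONLY the sub-queries, one per line."
}

agent SubSearcher {
    provider: "openai"
    model: "gpt-4o-mini"
    system: "Answer the query using search results. Be factual."
    skills: [WebSearch]
}

agent Synthesizer {
    provider: "openai"
    model: "gpt-4o"
    temperature: 0.3
    system: "Synthesize the findings into one comprehensive, well-organized answer."
}

fun deep_search(question) {
    let sub_queries = split(QuerySplitter.ask(question), "\n");
    let findings = [];
    for (q in sub_queries) {
        if (len(q) > 2) {
            push(findings, SubSearcher.ask(q));
        }
    }
    return Synthesizer.ask("Question: " + question + "\nFindings: " + str(findings));
}
```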
Pattern 8: Chain-of-Thought Agent #
A chain-of-thought agent is instructed to show its reasoning process before arriving at a conclusion. This makes the agent's logic transparent and auditable, which is especially valuable for math, logic, and analytical tasks:
agent Reasoner {
provider: "openai"
model: "gpt-4o"
temperature: 0.1
system: "Think step by step. For every question, show your reasoning process before giving a final answer. Format: REASONING: <steps>\nANSWER: <conclusion>"
}
{
let response = Reasoner.ask("If a train travels 120km in 1.5 hours, and then 80km in 1 hour, what is its average speed for the entire journey?");
emit response;
}
The very low temperature (0.1) minimizes creative drift and keeps the reasoning precise. The structured output format (REASONING / ANSWER) makes it easy to parse the reasoning chain programmatically if needed.
Exercise: Modify the Reasoner agent to solve a multi-step logic puzzle,
such as: "There are three boxes labeled A, B, and C. Box A contains apples. Box B
contains bananas. Box C contains cherries. You swap A and B, then swap B and C. What
does each box contain?" Compare the result with and without the chain-of-thought
system prompt. Notice how explicit reasoning reduces errors.
Pattern 9: ReAct Agent (Reason + Act) #
The ReAct pattern combines reasoning and acting in an interleaved loop. The agent thinks about what to do (THOUGHT), takes an action using a tool (ACTION), observes the result (OBSERVATION), and repeats until it reaches a final answer:
agent ReActAgent {
provider: "openai"
model: "gpt-4o"
system: "Follow the ReAct pattern: THOUGHT (reason about what to do), ACTION (call a tool), OBSERVATION (analyze results), repeat until you have a final answer."
skills: [WebSearch, Calculator]
}
{
let result = ReActAgent.ask("What is the population of France divided by the area of France in square kilometers?");
emit result;
}
ReAct agents are particularly effective when a task requires both factual lookup (via tools) and mathematical or logical reasoning. The agent naturally alternates between gathering information and processing it, producing more accurate results than either a pure reasoning agent or a pure tool-using agent.
Pattern 10: Self-Reflection Agent #
Self-reflection pairs a Writer agent with a Critic agent in an iterative improvement loop. The Writer produces a draft, the Critic evaluates it and provides feedback, and the Writer revises based on that feedback. This continues for a fixed number of rounds or until the Critic approves:
agent Writer {
provider: "openai"
model: "gpt-4o-mini"
system: "Write content as requested."
}
agent Critic {
provider: "openai"
model: "gpt-4o"
system: "Critique the given text. Identify weaknesses and suggest specific improvements. If the text needs no changes, reply APPROVED."
}
fun reflect_and_improve(prompt, max_rounds) {
let draft = Writer.ask(prompt);
let round = 0;
while (round < max_rounds) {
let critique = Critic.ask(f"Critique this: {draft}");
if (critique.contains("APPROVED")) { return draft; }
draft = Writer.ask(f"{prompt}\n\nPrevious draft: {draft}\nFeedback: {critique}");
round = round + 1;
}
return draft;
}
{
let final_essay = reflect_and_improve("Write a 200-word essay on climate change", 3);
emit final_essay;
}
Notice that the Writer uses a cheaper model (gpt-4o-mini) while the Critic uses a
more capable model (gpt-4o). This is intentional: writing is a generative task that
benefits from iteration, while critique requires stronger analytical reasoning. The
cost-per-round stays low because only the Critic needs the expensive model.
Pattern 11: Socratic Agent #
A Socratic agent never gives direct answers. Instead, it asks probing questions that guide the student toward discovering the answer themselves. This is powerful for educational applications and tutoring systems:
agent Socratic {
provider: "openai"
model: "gpt-4o"
system: "You are a Socratic tutor. Never give direct answers. Instead, ask probing questions that guide the student to discover the answer themselves. Ask at most 2 questions per response."
}
{
let q1 = Socratic.ask("What causes the seasons on Earth?");
emit "Tutor: " + q1;
let q2 = Socratic.ask("I think it's because the Earth is closer to the sun in summer?");
emit "Tutor: " + q2;
let q3 = Socratic.ask("Hmm, so maybe it's about the tilt of the Earth's axis?");
emit "Tutor: " + q3;
}
The Socratic agent is a single-agent pattern, but it pairs well with multi-agent architectures. For example, you could route students to different Socratic tutors depending on the subject (math, history, science), each with domain-specific probing strategies in their system prompts.
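That routing idea can be sketched with the Triage Routing pattern from earlier in the chapter. The subjects and prompts here are illustrative:

```
agent SubjectRouter {
    provider: "openai"
    model: "gpt-4o-mini"
    temperature: 0.1
    system: "Classify the student's question as MATH or SCIENCE. Reply with exactly one word."
}

agent MathSocratic {
    provider: "openai"
    model: "gpt-4o"
    system: "You are a Socratic math tutor. Never give direct answers. Ask questions that expose the structure of the problem. At most 2 questions per response."
}

agent ScienceSocratic {
    provider: "openai"
    model: "gpt-4o"
    system: "You are a Socratic science tutor. Never give direct answers. Ask questions that probe the student's mental model. At most 2 questions per response."
}

fun tutor(question) {
    let subject = SubjectRouter.ask(question);
    if (subject.contains("MATH")) {
        return MathSocratic.ask(question);
    }
    return ScienceSocratic.ask(question);
}
```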
Pattern 12: Red Team / Blue Team #
The Red Team / Blue Team pattern uses adversarial collaboration for security analysis, risk assessment, and robustness testing. The Red Team agent attacks -- finding vulnerabilities, edge cases, and potential failures. The Blue Team agent defends -- proposing mitigations and countermeasures for each finding:
agent RedTeam {
provider: "openai"
model: "gpt-4o"
system: "You are a red team agent. Find vulnerabilities, edge cases, and potential failures in the given system/plan."
}
agent BlueTeam {
provider: "openai"
model: "gpt-4o"
system: "You are a blue team agent. Given red team findings, propose mitigations and defensive measures."
}
{
let system_desc = "A web API that accepts user-uploaded images, resizes them, and stores them in S3.";
let vulnerabilities = RedTeam.ask("Analyze this system for vulnerabilities: " + system_desc);
emit "=== Red Team Findings ===";
emit vulnerabilities;
emit "";
let defenses = BlueTeam.ask("Propose mitigations for these findings:\n" + vulnerabilities);
emit "=== Blue Team Mitigations ===";
emit defenses;
}
This pattern is similar to Debate/Adversarial (Pattern 4) but is specifically designed for security and robustness contexts. You can extend it with multiple rounds -- the Red Team tries to bypass the Blue Team's mitigations, and the Blue Team strengthens its defenses iteratively.
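The multi-round extension mentioned above can be sketched as a simple loop in which the Red Team attacks the latest mitigations each round:

```
fun red_blue_rounds(system_desc, rounds) {
    let findings = RedTeam.ask("Analyze this system for vulnerabilities: " + system_desc);
    let defenses = "";
    let round = 0;
    while (round < rounds) {
        defenses = BlueTeam.ask("Propose mitigations for these findings:\n" + findings);
        // Red Team now tries to bypass the proposed defenses
        findings = RedTeam.ask("System: " + system_desc
            + "\nDefenses: " + defenses
            + "\nFind remaining weaknesses or bypasses.");
        round = round + 1;
    }
    return defenses;
}
```

A fixed round count keeps cost bounded; you could also stop early when the Red Team reports no remaining findings.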
Pattern 13: Memory-Enhanced Agent #
A memory-enhanced agent retains information across conversations using persistent
storage. The memory field connects the agent to a conversation store, allowing it to
recall previous interactions and build context over time:
agent MemoryBot {
provider: "openai"
model: "gpt-4o-mini"
system: "You are a helpful assistant with persistent memory."
memory: "conversation_store"
}
{
// First interaction
let r1 = MemoryBot.ask("My name is Alice and I'm working on a Neam project.");
emit r1;
// Later interaction -- the agent remembers
let r2 = MemoryBot.ask("What project am I working on?");
emit r2;
}
The memory field specifies the name of a persistent store. Across separate runs of
the program, the agent can recall facts from earlier conversations. This is essential
for personal assistants, long-running tutoring sessions, and any application where
continuity matters.
Exercise: Combine the Memory-Enhanced Agent with the Triage Routing
pattern. Create a system where a triage agent routes users to specialists, but each
specialist has its own memory store (e.g., "billing_memory", "tech_memory"). This
way, the billing specialist remembers past billing conversations and the tech
specialist remembers past technical issues -- even across separate sessions.
Pattern 14: Expert Retrieval (Multi-Knowledge) #
Expert retrieval connects specialized agents to domain-specific knowledge bases using
Neam's knowledge system. Each agent retrieves from its own curated set of documents,
producing answers grounded in authoritative sources:
knowledge LegalDocs { vector_store: "usearch", sources: [{ type: "file", path: "./legal/*.txt" }] }
knowledge TechDocs { vector_store: "usearch", sources: [{ type: "file", path: "./tech/*.md" }] }
agent LegalExpert {
provider: "openai"
model: "gpt-4o"
system: "You are a legal expert."
connected_knowledge: [LegalDocs]
}
agent TechExpert {
provider: "openai"
model: "gpt-4o"
system: "You are a technical expert."
connected_knowledge: [TechDocs]
}
agent Dispatcher {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.1
system: "Classify the user's question as LEGAL or TECHNICAL. Respond with exactly one word."
}
fun expert_answer(question) {
let category = Dispatcher.ask(question);
if (category.contains("LEGAL")) {
return LegalExpert.ask(question);
}
return TechExpert.ask(question);
}
{
emit expert_answer("What are the GDPR requirements for data retention?");
emit expert_answer("How do I configure a reverse proxy with nginx?");
}
The connected_knowledge field gives each agent access to a vector store built from
domain-specific files. When the agent receives a question, it retrieves relevant chunks
from its knowledge base and includes them in the LLM context. This is retrieval-augmented
generation (RAG) at the agent level -- each expert only sees documents from its own domain,
which reduces noise and hallucination.
Pattern Selection Guide #
With 14 patterns available, choosing the right one can feel overwhelming. Use this table as a quick reference:
| Pattern | Best For | Complexity | Example Use Case |
|---|---|---|---|
| 1. Triage Routing | Classification + dispatch | Low | Customer service routing |
| 2. Sequential Pipeline | Content creation workflow | Low | Research, Write, Edit |
| 3. Supervisor/Worker | Quality assurance | Medium | Code review, content approval |
| 4. Debate/Adversarial | Balanced analysis | Medium | Policy evaluation |
| 5. Pipeline Chaining | Multi-step with handoffs | Medium | Support ticket processing |
| 6. Planning Agent | Complex multi-step tasks | High | Project management |
| 7. Deep Search | Research questions | Medium | Market analysis, literature review |
| 8. Chain-of-Thought | Reasoning tasks | Low | Math, logic puzzles |
| 9. ReAct | Tool-using reasoning | Medium | Data analysis with live search |
| 10. Self-Reflection | Iterative improvement | Medium | Writing, code generation |
| 11. Socratic | Education / tutoring | Low | Student learning |
| 12. Red/Blue Team | Security analysis | Medium | Threat modeling |
| 13. Memory-Enhanced | Ongoing conversations | Medium | Personal assistant |
| 14. Expert Retrieval | Domain Q&A with sources | Medium | Legal/medical consultation |
How to read this table:
- Low complexity patterns require 1--2 agents and minimal control flow. Start here.
- Medium complexity patterns require 2--4 agents with loops or conditional logic.
- High complexity patterns require 3+ agents with planning, monitoring, and dynamic dispatch.
When in doubt, start with Pattern 1 (Triage Routing) and evolve toward more sophisticated patterns only when the simpler approach falls short.
Manual Routing vs. Runner-Based Handoffs #
You have seen two approaches to multi-agent coordination:
- Manual routing (Patterns 1-4): Your code explicitly decides which agent to call based on classifier output. You control the flow with if statements.
- Runner-based handoffs (Pattern 5): The runner manages the flow. The LLM decides when to hand off based on the conversation and available handoff tools.
| Aspect | Manual Routing | Runner-Based Handoffs |
|---|---|---|
| Control | You decide the routing logic | LLM decides when to hand off |
| Flexibility | Full programmatic control | More adaptive to novel inputs |
| Predictability | Highly predictable | LLM may route unexpectedly |
| Complexity | More code to write | Less code, more configuration |
| Debugging | Easy to trace (explicit flow) | Use tracing for visibility |
| Best for | Deterministic workflows | Conversational, open-ended flows |
In practice, production systems often combine both approaches -- using runners for the outer orchestration loop and manual routing for specific decision points within agents.
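As a sketch of that hybrid shape, the function below uses manual routing for the deterministic decision point and a runner for the open-ended path. The names Classifier, FaqAgent, and EscalationRunner are illustrative assumptions; the runner itself would be defined as in Pattern 5:

```
fun handle(query) {
    // Deterministic decision point: manual routing with an if statement.
    let kind = Classifier.ask(query);
    if (kind.contains("FAQ")) {
        // Cheap, predictable path -- no runner needed.
        return FaqAgent.ask(query);
    }
    // Open-ended conversation: let the runner and its handoffs take over.
    let result = EscalationRunner.run(query);
    return result["final_output"];
}
```

The if branch stays fully predictable, while anything the classifier cannot confidently label falls through to the adaptive runner loop.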
Designing Multi-Agent Systems #
When designing a multi-agent system, follow these guidelines:
- Start with a single agent. Only split into multiple agents when a single agent's system prompt becomes unwieldy or performance degrades.
- Give each agent one clear responsibility. The system prompt should be describable in one sentence.
- Use the cheapest model that works. Triage agents that just classify can use small, fast models. Only use expensive models for tasks that require strong reasoning.
- Set conservative max_turns. For simple routing, max_turns: 3 is enough. Only increase for complex multi-hop workflows.
- Always enable tracing during development. Turn it off in production if the performance overhead is a concern.
- Test each agent independently before composing them into a multi-agent system.
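Several of these guidelines show up directly in a minimal triage definition. The sketch below is illustrative (it mirrors the Dispatcher agent from Pattern 14) rather than a required shape:

```
agent Triage {
    provider: "openai"
    model: "gpt-4o-mini"   // cheapest model that works: classification is simple
    temperature: 0.1       // low temperature keeps routing predictable
    // One clear responsibility, describable in one sentence:
    system: "Classify the request as BILLING or TECHNICAL. Respond with exactly one word."
}
```

If this agent's prompt ever grows beyond a sentence or two, that is usually the signal to split the work across additional agents rather than to keep expanding it.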
The spawn Keyword #
The spawn keyword lets you invoke another agent by name from within your code. Unlike
handoffs, which transfer control through a runner, spawn performs a direct, single-turn
LLM call to the named agent and returns the result:
// Keyword form — static agent name
let result = spawn researcher("Find recent papers on RAG");
// Native function form — dynamic agent name
let agent_name = "researcher";
let result = spawn(agent_name, "Find recent papers on RAG");
When you call spawn, the runtime resolves the agent name by checking claw agents first,
then forge agents, then stateless agents. If no agent is found, it returns an error
string.
spawn performs a simplified single-turn LLM call. It does NOT invoke the
full .ask() session flow for claw agents or the .run() loop for forge agents. If
you need session-aware conversation or full forge iteration, call agent.ask() or
agent.run() directly.
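The distinction matters in practice. The contrast below is a sketch, assuming researcher is a claw agent whose .ask() carries session state across calls:

```
// spawn: stateless, single-turn -- the second call has no memory of the first.
let a1 = spawn researcher("My budget is $500. Suggest a GPU.");
let a2 = spawn researcher("What was my budget?");   // no access to the $500 figure

// .ask(): session-aware -- conversation state carries over between calls.
let b1 = researcher.ask("My budget is $500. Suggest a GPU.");
let b2 = researcher.ask("What was my budget?");     // can refer back to the $500
```

Use spawn for fire-and-forget delegation and .ask() when the sub-agent needs to remember earlier turns.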
Spawn with Orchestrable Callbacks #
When the spawning agent implements the Orchestrable trait (Chapter 27), callbacks fire automatically before and after the sub-agent call.
This is useful for logging, metrics collection, and access control on sub-agent invocations.
Spawn Example #
agent researcher {
provider: "openai"
model: "gpt-4o"
system: "You are a research assistant. Provide concise summaries of topics."
}
agent writer {
provider: "openai"
model: "gpt-4o"
system: "You are a technical writer. Create clear, structured reports."
}
{
// Gather research
let findings = spawn researcher("Summarize recent trends in AI agent frameworks");
// Write report based on research
let report = spawn writer("Write a 200-word summary based on: " + findings);
emit report;
}
DAG Execution with dag_execute() #
For workflows where multiple agents need to execute in a specific dependency order,
Neam provides the dag_execute() native function. It takes a list of DAG (Directed
Acyclic Graph) nodes, topologically sorts them, and executes each agent in order:
let results = dag_execute([
{
"id": "research",
"agent": "researcher",
"task": "Gather data on market trends",
"depends_on": []
},
{
"id": "analysis",
"agent": "analyst",
"task": "Analyze the research findings",
"depends_on": ["research"]
},
{
"id": "report",
"agent": "writer",
"task": "Write executive summary",
"depends_on": ["research", "analysis"]
}
]);
// results is a Map: { "research": "...", "analysis": "...", "report": "..." }
emit results["report"];
DAG Node Fields #
| Field | Type | Required | Description |
|---|---|---|---|
| id | String | Yes | Unique node identifier |
| agent | String | Yes | Name of the agent to invoke via spawn |
| task | String | No | Task description passed to the agent |
| depends_on | List of strings | No | IDs of nodes that must complete first |
How DAG Execution Works #
The runtime uses Kahn's topological sort to determine the correct execution order.
Nodes with no unresolved dependencies are eligible to execute. Each node invokes
the named agent via spawn.
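For instance, in the diamond-shaped DAG below, Kahn's sort guarantees that fetch runs first and merge runs last; clean and enrich both become eligible as soon as fetch completes. The agent names are illustrative:

```
let results = dag_execute([
    { "id": "fetch",  "agent": "collector", "task": "Fetch raw records", "depends_on": [] },
    { "id": "clean",  "agent": "cleaner",   "task": "Normalize the records", "depends_on": ["fetch"] },
    { "id": "enrich", "agent": "enricher",  "task": "Attach metadata", "depends_on": ["fetch"] },
    { "id": "merge",  "agent": "merger",    "task": "Merge cleaned and enriched data",
      "depends_on": ["clean", "enrich"] }
]);
emit results["merge"];
```

Because the graph must be acyclic, a node can never (directly or transitively) depend on itself; a cycle would leave Kahn's algorithm with no eligible node to schedule.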
DAG Patterns #
Two common DAG patterns emerge in practice:
Fan-Out / Fan-In: Multiple independent agents execute in parallel, then a synthesis agent combines their results:
let results = dag_execute([
{ "id": "finance", "agent": "finance_expert", "task": query, "depends_on": [] },
{ "id": "legal", "agent": "legal_expert", "task": query, "depends_on": [] },
{ "id": "tech", "agent": "tech_expert", "task": query, "depends_on": [] },
{
"id": "synthesis",
"agent": "synthesizer",
"task": "Combine expert opinions into a unified recommendation",
"depends_on": ["finance", "legal", "tech"]
}
]);
Sequential Chain: Each agent depends on the previous, forming a linear pipeline:
let results = dag_execute([
{ "id": "step1", "agent": "extractor", "task": "Extract key data", "depends_on": [] },
{ "id": "step2", "agent": "transformer", "task": "Transform to schema", "depends_on": ["step1"] },
{ "id": "step3", "agent": "validator", "task": "Validate output format", "depends_on": ["step2"] }
]);
Build a DAG with four agents: a researcher that gathers data,
a fact_checker that verifies the research (depends on researcher), a writer that
drafts a report (depends on researcher), and an editor that polishes the final
output (depends on both fact_checker and writer). Run it with dag_execute() and
emit the editor's output.
Summary #
In this chapter, you learned:
- Multi-agent orchestration enables specialization, cost optimization, and maintainability through agent separation of concerns.
- Handoffs transfer control between agents using the handoffs field and are implemented as tools that the LLM can invoke.
- Advanced handoff configuration supports custom tool names, descriptions, input filters, callbacks, conditional enablement, and structured input types.
- Context sharing between agents is automatic through the runner, with input filters for transforming, summarizing, and securing context during handoffs.
- Runners manage the multi-agent execution loop with configurable max turns, tracing, and guardrails.
- Runner .run() returns a result map with final_agent, final_output, total_turns, and trace data.
- Fourteen orchestration patterns: triage routing, sequential pipelines, supervisor/worker, debate/adversarial, pipeline chaining, planning agents, deep search, chain-of-thought, ReAct (reason + act), self-reflection, Socratic tutoring, red team / blue team, memory-enhanced agents, and expert retrieval with multi-knowledge bases.
- Manual routing gives you explicit control; runner-based handoffs let the LLM decide.
- The Pattern Selection Guide helps you choose the right pattern based on your use case, ranging from low-complexity single-dispatch patterns to high-complexity planning architectures.
- The spawn keyword invokes another agent by name for a single-turn LLM call, with automatic callback hooks when the spawning agent implements Orchestrable.
- The dag_execute() function executes a directed acyclic graph of agent tasks with dependency ordering using Kahn's topological sort.
- Two common DAG patterns are fan-out/fan-in (parallel experts with synthesis) and sequential chain (linear pipeline).
In the next chapter, we will add safety layers to these multi-agent systems with guardrails.
Exercises #
Exercise 13.1: Three-Agent Router
Create a triage system with three specialist agents: MathExpert, HistoryExpert,
and ScienceExpert. Write a triage agent that classifies questions into one of the
three categories and routes accordingly. Test with at least one question per category.
Exercise 13.2: Review Pipeline
Build a three-stage pipeline: Drafter writes an initial response, Reviewer provides
critique, and Reviser incorporates the feedback. Run it for the prompt "Explain the
water cycle" and emit the output at each stage.
Exercise 13.3: Runner with Tracing Create a runner-based system with a triage agent and two specialists. Enable tracing. After running a query, iterate through the trace entries and emit a formatted log showing which agent ran at each turn and whether a handoff occurred.
Exercise 13.4: Conditional Handoffs Implement an agent with three handoff targets. Make one handoff always enabled, one conditionally enabled based on a function, and one always disabled. Test by observing which agents the triage can and cannot hand off to.
Exercise 13.5: Debate System
Build a debate system with Proponent, Opponent, and Moderator agents. The
Moderator should receive arguments from both sides and produce a balanced summary. Run
a debate on the topic "AI should be regulated by governments" and emit all three outputs.
Exercise 13.6: Multi-Provider Multi-Agent Create a multi-agent system where the triage agent uses Ollama (free, fast classification), the billing specialist uses GPT-4o-mini (cost-effective), and the technical specialist uses GPT-4o (maximum capability). Include comments explaining your provider choices.
Exercise 13.7: Pipeline with Error Handling
Build a runner-based pipeline where one of the specialist agents might fail (simulate by
using a nonexistent model). Wrap the Runner.run() call in try/catch and emit
appropriate error messages. Also check the completed field in the result map.
Exercise 13.8: Spawn Chain
Create three agents (translator, summarizer, formatter) using the spawn keyword.
The main program should spawn the translator with a paragraph of text, spawn the
summarizer with the translation result, and spawn the formatter with the summary. Emit
the result at each stage.
Exercise 13.9: DAG Orchestration
Build a four-agent DAG using dag_execute(): a data_collector (no dependencies), a
validator (depends on data_collector), an analyzer (depends on data_collector), and a
report_writer (depends on both validator and analyzer). Run the DAG and emit each
agent's result. Verify that the execution order respects the dependency graph.
Exercise 13.10: Self-Reflection Loop
Implement the Self-Reflection pattern (Pattern 10): a Writer agent produces a first
draft on any topic of your choice, a Critic agent evaluates it and provides specific
feedback, and the Writer revises based on the feedback. Limit the loop to 3 rounds
maximum. Use emit to output the draft at each stage so you can observe how the text
improves with each iteration. Bonus: add a round counter to each emitted draft (e.g.,
"=== Round 1 Draft ===" / "=== Round 2 Draft ===" / "=== Final Draft ===").