Chapter 22 — Three Coordination Modes: Centralized, Swarm, Evolutionary #

"Order is not pressure which is imposed on society from without, but an equilibrium which is set up from within." -- Jose Ortega y Gasset


📖 25 min read | 👤 All personas | 🏷️ Part VI: Orchestration

What you'll learn:


The Problem: One Size Does Not Fit All #

Marcus, the data scientist, has two very different projects on his desk.

Project A is a regulatory churn model for the banking division. Every step must be auditable. Every decision must be traceable to a requirement. The regulator will ask "why did you choose this feature?" and Marcus needs a chain of evidence from business requirement to feature selection to model coefficient.

Project B is an exploratory analysis of a new market segment. Nobody knows what the right questions are yet. The data is messy, the hypothesis is vague, and the goal is to find interesting patterns as fast as possible. Auditability is nice but speed is essential.

Project A needs centralized, deterministic coordination. Project B needs something more fluid. The Neam agent stack supports both -- and a third mode for when you want the system to optimize its own coordination topology.


Mode 1: Centralized RACI #

Centralized RACI is the default coordination mode. The DIO acts as a central dispatcher, assigning tasks to specialist agents according to the RACI matrix.

Architecture Diagram

Characteristics #

| Property | Value |
| --- | --- |
| Determinism | High -- same inputs produce same execution order |
| Auditability | Complete -- every dispatch recorded with RACI |
| Bottleneck | DIO is a single point of coordination |
| Parallelism | Limited -- sequential phase gates |
| Best for | Regulated workflows, compliance-critical projects |

How It Works #

  1. DIO receives the task specification
  2. DIO decomposes the task into phases (requirements, engineering, modeling, testing, deployment, monitoring)
  3. For each phase, DIO selects the R agent and dispatches the task
  4. R agent executes, consulting C agents as needed
  5. DIO validates the output against quality gates
  6. If passed, DIO advances to the next phase
  7. If failed, DIO retries or escalates
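The seven steps above can be sketched as a short dispatcher loop. This is a minimal Python illustration, not the Neam implementation: the `raci` mapping, the `execute` and `gate` callables, the `DispatchRecord` type, the `max_retries` default, and the phase-to-agent assignments in `EXAMPLE_RACI` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Phase names taken from step 2 above.
PHASES = ["requirements", "engineering", "modeling",
          "testing", "deployment", "monitoring"]

# Hypothetical RACI assignments; only the "R" (responsible) role is used here.
EXAMPLE_RACI = {
    "requirements": {"R": "BA"},
    "engineering": {"R": "DS"},
    "modeling": {"R": "DS"},
    "testing": {"R": "Test"},
    "deployment": {"R": "MLOps"},
    "monitoring": {"R": "MLOps"},
}


@dataclass
class DispatchRecord:
    """One audit-log entry per dispatch attempt."""
    phase: str
    responsible: str
    attempt: int
    passed: bool


def run_centralized(raci, execute, gate, max_retries=2):
    """Walk the phases in order; return (audit_log, completed)."""
    audit_log = []
    for phase in PHASES:
        responsible = raci[phase]["R"]                 # step 3: select the R agent
        for attempt in range(1, max_retries + 2):      # first try plus retries
            output = execute(responsible, phase)       # step 4: R agent executes
            passed = gate(phase, output)               # step 5: quality gate
            audit_log.append(
                DispatchRecord(phase, responsible, attempt, passed))
            if passed:
                break                                  # step 6: advance
        else:
            return audit_log, False                    # step 7: retries exhausted, escalate
    return audit_log, True
```

Because every attempt is appended to `audit_log` before the next decision is made, the log doubles as the dispatch trail that this mode's auditability depends on.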

Strengths #

Weaknesses #

💡 When to use centralized RACI: Any project where auditability matters more than speed. Regulatory models, production deployments, anything that a compliance team will audit.


Mode 2: Swarm Stigmergy #

Swarm mode draws inspiration from biological swarm intelligence. Instead of a central dispatcher, agents coordinate through stigmergy -- indirect coordination via shared artifacts in the environment.

DIAGRAM Swarm Stigmergy Mode
flowchart TB
  subgraph SWARM["SWARM STIGMERGY MODE"]
    BA["BA"] --> SPACE
    DS["DS"] --> SPACE
    Test["Test"] --> SPACE
    subgraph SPACE["SHARED ARTIFACT SPACE"]
      direction LR
      BRD["BRD"]
      Features["Features"]
      Model["Model"]
      Tests["Tests"]
      Deploy["Deploy"]
    end
    SPACE --> Causal["Causal"]
    SPACE --> MLOps["MLOps"]
    SPACE --> DIO["DIO (watch)"]
  end
  style SPACE fill:#f9f9f9,stroke:#333

Agents deposit artifacts, consume others' artifacts, and react to changes

How Stigmergy Works #

In biological swarms, ants deposit pheromones that other ants follow. In the Neam swarm mode, agents deposit artifacts (documents, models, test results) into a shared space. Other agents consume these artifacts and produce new ones.

The key insight: no agent tells another agent what to do. Agents react to the state of the shared environment.

Agent Behavior Loop (each agent independently)
  1. SENSE: Check shared space for new/changed artifacts
  2. DECIDE: Can I contribute based on my specialty?
  3. ACT: Produce new artifact, deposit in shared space
  4. SIGNAL: Artifact publication notifies interested agents
  5. REPEAT: Continue until task converges
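The loop above can be sketched with a shared set standing in for the artifact space. This is an illustrative Python model under simplifying assumptions: agents act in synchronous rounds, each hypothetical `Agent` produces one artifact type once its inputs exist, and the wiring in `EXAMPLE_AGENTS` (which agent produces which artifact) is invented for the example rather than taken from Neam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    needs: frozenset   # artifact types that must exist before this agent acts
    produces: str      # artifact type this agent deposits


# Hypothetical wiring over the artifact types shown in the diagram.
EXAMPLE_AGENTS = [
    Agent("BA", frozenset({"Task"}), "BRD"),
    Agent("DS", frozenset({"BRD"}), "Features"),
    Agent("DS", frozenset({"Features"}), "Model"),
    Agent("Test", frozenset({"Model"}), "Tests"),
    Agent("MLOps", frozenset({"Tests"}), "Deploy"),
]


def run_swarm(agents, seed_artifacts, max_iterations=50):
    """Return (final artifact space, number of productive iterations)."""
    space = set(seed_artifacts)
    for iteration in range(1, max_iterations + 1):
        # SENSE + DECIDE: every agent inspects the same snapshot of the space.
        ready = [a for a in agents
                 if a.needs <= space and a.produces not in space]
        if not ready:
            return space, iteration - 1    # no pending work: converged
        # ACT + SIGNAL: deposits become visible to all agents next round.
        space |= {a.produces for a in ready}
    return space, max_iterations           # iteration budget exhausted
```

Note that no agent ever calls another agent. The only coupling is through `space`, which is the point of stigmergy.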

Convergence Detection #

The swarm converges when no agent has pending work:

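| Iteration | Active Agents | New Artifacts | Status |
| --- | --- | --- | --- |
| 1 | 3 | 3 | Exploring |
| 5 | 4 | 2 | Building |
| 10 | 3 | 1 | Refining |
| 15 | 2 | 1 | Converging |
| 20 | 1 | 0 | Validating |
| 23 | 0 | 0 | CONVERGED |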
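A convergence check over such a trace might look like the sketch below. The function name and the tuple layout are assumptions for illustration; the numbers in `TRACE` are the rows of the table above.

```python
# (iteration, active_agents, new_artifacts), copied from the table above.
TRACE = [
    (1, 3, 3),
    (5, 4, 2),
    (10, 3, 1),
    (15, 2, 1),
    (20, 1, 0),
    (23, 0, 0),
]


def first_converged_iteration(trace):
    """Return the first iteration with no active agents and no new artifacts."""
    for iteration, active_agents, new_artifacts in trace:
        if active_agents == 0 and new_artifacts == 0:
            return iteration
    return None  # never converged within the recorded trace
```

Requiring both counters to reach zero matters: at iteration 20 no new artifact appears, but one agent is still validating, so the swarm is not yet done.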

Deadlock Prevention #

Swarms can deadlock when agents wait for artifacts that no agent will produce. The Neam swarm mode includes three deadlock prevention mechanisms:

  1. Timeout watchdog: If no new artifact appears within a configurable window, the DIO (in observer mode) injects a stimulus
  2. Dependency analysis: Before launching the swarm, the DIO verifies that every required artifact type has at least one capable producer
  3. Recovery injection: If deadlock is detected, the DIO can temporarily take over as a centralized dispatcher for the stuck phase
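The first two mechanisms are simple enough to sketch. The Python below is a hedged illustration: `missing_producers`, `Watchdog`, and the `capabilities` mapping are hypothetical names, and recovery injection is not shown because it depends on the dispatcher itself.

```python
import time


def missing_producers(required, capabilities):
    """Dependency analysis: artifact types that no agent can produce.

    `capabilities` maps an agent name to the set of artifact types it can emit.
    """
    producible = set().union(*capabilities.values())
    return set(required) - producible


class Watchdog:
    """Timeout watchdog: reports a stall if no artifact arrives in the window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock                    # injectable so it can be tested
        self.last_artifact_at = clock()

    def artifact_seen(self):
        """Call whenever any agent deposits an artifact."""
        self.last_artifact_at = self.clock()

    def stalled(self):
        """True when the observer should inject a stimulus."""
        return self.clock() - self.last_artifact_at > self.window
```

Running `missing_producers` before launch catches the structural case (an artifact nobody can make), while the watchdog catches stalls that only show up at run time.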

DataSims Evidence: Swarm Performance #

From the DataSims evaluation (evaluation/results/swarm_mode.json):

| Metric | Centralized | Swarm | Delta |
| --- | --- | --- | --- |
| Convergence | 7 phases (serial) | 23 iterations | Different measurement |
| Deadlock rate | 0% (by design) | 2% | Expected in decentralized |
| Recovery rate | N/A | 98% | Near-complete self-healing |
| AUC-ROC | 0.847 | 0.847 | Equivalent quality |
| CES | 0.925 | 0.925 | Equivalent effectiveness |
| Quality Gate | passed | passed | No degradation |

Key findings:

⚠️ The 2% deadlock rate is a design tradeoff, not a defect. Centralized RACI has 0% deadlock because the DIO prevents it by construction. Swarm mode accepts a small deadlock probability in exchange for eliminating the central bottleneck and enabling parallel execution.

Swarm vs. Centralized Tradeoff
| Dimension | Centralized RACI | Swarm Stigmergy |
| --- | --- | --- |
| Deadlock Risk | Low (0%) | Higher (2%) |
| Parallelism | Low (sequential) | High (concurrent) |
| Auditability | High (full RACI) | Moderate (artifact-based) |

💡 When to use swarm mode: Exploratory analysis, research projects, situations where you want agents to discover emergent patterns rather than follow a predetermined plan.


Mode 3: Evolutionary Optimization #

Evolutionary mode uses a genetic algorithm to optimize the agent coordination topology itself. Instead of using a fixed coordination strategy (centralized or swarm), the system evolves the best topology for the specific task.
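The selection and crossover steps of such a genetic algorithm can be sketched as follows, using the Generation 1 fitness values from the diagram in this section. This is an illustration only: the source does not say which crossover operator Neam uses, so the order crossover shown here is a swapped-in, standard choice for permutation genomes, and both function names are hypothetical.

```python
import random


def select_top(population, fitnesses, k=2):
    """Truncation selection: keep the k fittest individuals."""
    ranked = sorted(zip(fitnesses, range(len(population))), reverse=True)
    return [population[i] for _, i in ranked[:k]]


def order_crossover(parent_a, parent_b, rng):
    """Keep a slice of parent_a in place; fill the rest in parent_b's order.

    Both parents must be permutations of the same agents, and so is the child.
    """
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n + 1), 2))   # slice boundaries, i < j
    kept = parent_a[i:j]
    filler = [agent for agent in parent_b if agent not in kept]
    return filler[:i] + kept + filler[i:]


# Generation 1 from the diagram: topology labels and their fitness scores.
GEN1 = ["T1", "T2", "T3", "T4", "T5"]
GEN1_FITNESS = [0.45, 0.62, 0.51, 0.73, 0.38]
```

With these inputs, `select_top(GEN1, GEN1_FITNESS)` returns T4 and T2, matching the selection box in the diagram.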

DIAGRAM Evolutionary GA Mode
flowchart TB
  subgraph GEN1["Generation 1: Random Topologies"]
    direction LR
    T1["T1\n0.45"]
    T2["T2\n0.62"]
    T3["T3\n0.51"]
    T4["T4\n0.73"]
    T5["T5\n0.38"]
  end
  GEN1 --> SEL["Selection: Top 2 by fitness"]
  SEL --> T4S["T4 (0.73)"]
  SEL --> T2S["T2 (0.62)"]
  T4S --> CROSS["Crossover"]
  T2S --> CROSS
  CROSS --> GEN2
  subgraph GEN2["Generation 2: Evolved Topologies"]
    direction LR
    T4E["T4\n0.73"]
    T4P["T4'\n0.78"]
    T2P["T2'\n0.69"]
    T6["T6\n0.71"]
    T7["T7\n0.55"]
  end
  GEN2 --> REPEAT["... repeat for N generations ..."]
  REPEAT --> FINAL
  subgraph FINAL["Generation 67: Converged"]
    BEST["Best Topo\nFitness = 0.91"]
  end

Genome Representation #

Each "topology" is a genome that encodes:

CODE
  Chromosome = [
    agent_order:        [BA, DS, Causal, Test, MLOps]  // execution sequence
    parallelism_flags:  [0, 1, 1, 0, 0]                // which phases run in parallel
    consultation_edges: [(DS, Causal), (BA, Test)]      // C relationships
    gate_thresholds:    [0.9, 0.85, 0.95, 0.90]        // quality gate strictness
    retry_limits:       [3, 2, 3, 1, 2]                 // per-agent retry budgets
  ]
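One way to hold this genome in code is a small dataclass with a consistency check. The sketch below is an assumption-laden illustration: the `Topology` class and its `problems` method are hypothetical, and the rule that there is one gate between each pair of consecutive agents is inferred from the example above (five agents, four thresholds), not stated by the source.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    agent_order: list          # execution sequence
    parallelism_flags: list    # 1 = runs in parallel, one flag per agent
    consultation_edges: set    # (agent, consulted_agent) pairs
    gate_thresholds: list      # quality gate strictness, each in [0, 1]
    retry_limits: list         # per-agent retry budgets

    def problems(self):
        """Return a list of consistency problems; empty means well-formed."""
        issues = []
        n = len(self.agent_order)
        agents = set(self.agent_order)
        if len(agents) != n:
            issues.append("agent_order contains duplicates")
        if len(self.parallelism_flags) != n:
            issues.append("need one parallelism flag per agent")
        if len(self.retry_limits) != n:
            issues.append("need one retry limit per agent")
        if len(self.gate_thresholds) != n - 1:
            # Assumption: one gate between each pair of consecutive agents.
            issues.append("need one gate threshold per hand-off")
        if any(a not in agents or b not in agents
               for a, b in self.consultation_edges):
            issues.append("consultation edge names an unknown agent")
        if any(not 0.0 <= t <= 1.0 for t in self.gate_thresholds):
            issues.append("gate thresholds must lie in [0, 1]")
        return issues


# The chromosome from the listing above.
EXAMPLE = Topology(
    agent_order=["BA", "DS", "Causal", "Test", "MLOps"],
    parallelism_flags=[0, 1, 1, 0, 0],
    consultation_edges={("DS", "Causal"), ("BA", "Test")},
    gate_thresholds=[0.9, 0.85, 0.95, 0.90],
    retry_limits=[3, 2, 3, 1, 2],
)
```

A check like this is useful after crossover and mutation, since both can produce genomes that are syntactically valid but internally inconsistent.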

Fitness Function #

The fitness function evaluates each topology on a composite score:

CODE
  Fitness(topology) =
      0.25 * quality_score          // model AUC, F1, etc.
    + 0.20 * speed_score            // time to completion
    + 0.15 * reliability_score      // error detection, recovery
    + 0.15 * traceability_score     // RACI completeness
    + 0.10 * documentation_score    // BRD, specs generated
    + 0.10 * cost_efficiency_score  // LLM token cost
    + 0.05 * adaptability_score     // response to quality issues

This is the same 7-dimension proficiency scoring used in the DataSims evaluation framework.
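As a worked form of the formula, the weighted sum can be written in a few lines of Python. The weights are the ones listed above; the `fitness` function name and the convention that each sub-score is already normalized to [0, 1] are assumptions made for this sketch.

```python
# Weights copied from the fitness formula above.
WEIGHTS = {
    "quality_score": 0.25,
    "speed_score": 0.20,
    "reliability_score": 0.15,
    "traceability_score": 0.15,
    "documentation_score": 0.10,
    "cost_efficiency_score": 0.10,
    "adaptability_score": 0.05,
}


def fitness(scores):
    """Weighted sum of the seven sub-scores (each assumed to be in [0, 1])."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

The weights sum to 1.0, so a topology that scores 1.0 on every dimension has a fitness of 1.0, and quality and speed together account for 45% of the result.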

Mutation Operators #

Three mutation operators introduce variation:

  1. Swap mutation: Exchange two agents' positions in the execution order
  2. Gate mutation: Adjust a quality gate threshold by +/- 10%
  3. Edge mutation: Add or remove a consultation edge between two agents
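The three operators can be sketched on a plain dictionary genome. This is an illustration under stated assumptions: the function names are hypothetical, "+/- 10%" is read here as a relative change (multiply by 0.9 or 1.1) clamped to 1.0, and each operator returns a mutated copy rather than editing the parent.

```python
import random
from copy import deepcopy


def swap_mutation(topology, rng):
    """Exchange two agents' positions in the execution order."""
    child = deepcopy(topology)
    order = child["agent_order"]
    i, j = rng.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]
    return child


def gate_mutation(topology, rng):
    """Scale one quality gate threshold by -10% or +10%, capped at 1.0."""
    child = deepcopy(topology)
    gates = child["gate_thresholds"]
    k = rng.randrange(len(gates))
    factor = rng.choice([0.9, 1.1])   # assumption: relative, not absolute, change
    gates[k] = min(1.0, round(gates[k] * factor, 4))
    return child


def edge_mutation(topology, rng):
    """Toggle a consultation edge between two distinct agents."""
    child = deepcopy(topology)
    edge = tuple(rng.sample(child["agent_order"], 2))
    edges = child["consultation_edges"]
    if edge in edges:
        edges.remove(edge)            # edge existed: remove it
    else:
        edges.add(edge)               # edge absent: add it
    return child
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing evolved topologies across experiments.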

DataSims Evidence: Evolutionary Performance #

From the DataSims evaluation (evaluation/results/evolutionary_mode.json):

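| Metric | Centralized | Evolutionary | Delta |
| --- | --- | --- | --- |
| Best fitness | 0.925 (CES) | 0.91 | -1.6% |
| Convergence | N/A | Generation 67 | |
| AUC-ROC | 0.847 | 0.847 | Equivalent |
| CES | 0.925 | 0.925 | Equivalent |
| Quality Gate | passed | passed | No degradation |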

Key findings:

DIAGRAM Evolutionary Convergence Curve
xychart-beta
  title "Evolutionary Convergence Curve"
  x-axis "Generation" [0, 10, 20, 30, 40, 50, 60, 67]
  y-axis "Fitness" 0.4 --> 1.0
  line [0.4, 0.62, 0.78, 0.85, 0.88, 0.9, 0.91, 0.91]

🎯 When to use evolutionary mode: When you are unsure of the optimal coordination strategy for a novel task type. The GA explores the topology space and converges on a good strategy. For well-understood tasks (like churn prediction), the default centralized RACI is already hard to beat: in the DataSims run above, the best evolved fitness (0.91) stayed just below the centralized baseline (0.925).


Comparison: When to Use Which #

| Criterion | Centralized | Swarm | Evolutionary |
| --- | --- | --- | --- |
| Determinism | HIGH | LOW | LOW |
| Auditability | FULL | PARTIAL | FULL (best) |
| Speed | MODERATE | FAST | SLOW (setup) |
| Parallelism | LOW | HIGH | VARIES |
| Deadlock risk | NONE | 2% | NONE |
| Setup cost | LOW | LOW | HIGH (GA) |
| Novelty adapt. | LOW | MODERATE | HIGH |
| Best for | Regulated workflows | Exploratory analysis | Novel tasks or topology optimization |

Decision Framework #

Use this decision tree to select the right mode:

DIAGRAM Coordination Mode Decision Tree
flowchart TD
  Q1["Is auditability required by regulation?"]
  Q1 -->|YES| RACI["Centralized RACI"]
  Q1 -->|NO| Q2["Is the task type well-understood?"]
  Q2 -->|YES| RACI2["Centralized RACI\n(proven, lowest overhead)"]
  Q2 -->|NO| Q3["Do you need speed over optimality?"]
  Q3 -->|YES| SWARM["Swarm Stigmergy"]
  Q3 -->|NO| EVO["Evolutionary GA"]

💡 In practice, most production deployments use centralized RACI. Swarm and evolutionary modes are valuable for research, exploration, and topology optimization -- but when a model goes to production, the compliance team wants deterministic, auditable execution.
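The decision tree in this section is small enough to write as a function. The sketch below is illustrative: the function name, parameter names, and returned mode strings are hypothetical labels, not Neam configuration values.

```python
def choose_mode(audit_required, well_understood, speed_over_optimality):
    """Mirror the three questions of the decision tree, in order."""
    if audit_required:
        return "centralized"      # regulation demands full auditability
    if well_understood:
        return "centralized"      # proven path, lowest overhead
    if speed_over_optimality:
        return "swarm"            # explore quickly, accept some deadlock risk
    return "evolutionary"         # spend setup time to optimize the topology


# Marcus's two projects from the start of the chapter.
PROJECT_A = choose_mode(audit_required=True, well_understood=True,
                        speed_over_optimality=False)
PROJECT_B = choose_mode(audit_required=False, well_understood=False,
                        speed_over_optimality=True)
```

Project A resolves to centralized RACI at the first question, while Project B falls through to swarm stigmergy, which matches the framing at the start of the chapter.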


Hybrid Approaches #

The three modes are not mutually exclusive. Common hybrid patterns:

  1. Evolutionary discovery + Centralized execution: Use the GA to find the optimal topology offline, then deploy it as a centralized RACI configuration in production.
  2. Centralized with swarm phases: Use centralized RACI for the overall lifecycle, but allow swarm behavior within specific phases (e.g., feature engineering, where multiple data exploration agents can work in parallel).
  3. Swarm with centralized gates: Let agents coordinate via stigmergy, but require DIO-validated quality gates between major phases.

Key Takeaways #

For Further Exploration #