Chapter 6: The Data Intelligent Orchestrator (DIO) #

"The conductor does not make a sound. He depends, for his power, on his ability to make other people powerful." -- Benjamin Zander


30 min read | David, Priya, Dr. Chen | Part II: The Architecture

What you'll learn:


The Problem #

David is the VP of Data at a growing e-commerce company. He just asked his team a simple question: "Can we predict which customers will churn in the next 90 days and figure out why?"

What followed was a three-week coordination exercise. The business analyst spent four days writing requirements that the data engineer could not parse. The data engineer spent a week building a feature pipeline, only to discover the data scientist needed different features. The data scientist trained a model but had no idea which features contained PII. Nobody wrote tests. Nobody set up monitoring. When the model finally reached production, it served stale predictions because the feature pipeline ran daily but the model scored weekly.

The problem was not that David's team lacked talent. It was that nobody owned the end-to-end coordination. The project manager tracked Jira tickets but did not understand the data dependencies. The architect drew diagrams but did not enforce execution order. The handoffs between roles -- the "white space" on the org chart -- are where 85% of ML projects die.

The Data Intelligent Orchestrator exists to eliminate that white space.


DIO Internal Architecture #

The DIO is not a simple dispatcher that routes tasks to agents. It is a structured processing pipeline with 9 components, each responsible for a specific aspect of orchestration:

Architecture Diagram

Each of these components corresponds to a runtime object in the Neam VM. This is not a prompt-engineering trick -- it is compiled infrastructure.


The 8 Auto-Patterns #

When the DIO classifies a task's intent, it matches against 8 pre-defined patterns. Each pattern encodes a proven sequence of agent activations, quality gates, and artifact flows:

Pattern 1: Churn Prediction

Intent: prediction + causation + production deployment
Gates: after ETL (data quality), after model (performance), before deploy

DIAGRAM Churn Prediction Pattern
flowchart LR
  BA["Data-BA"] -->|"BRD"| ETL["ETL Agent"]
  ETL -->|"features"| DS["DataScientist"]
  DS -->|"model"| CA["Causal Agent"]
  CA --> DT["DataTest"]
  DT -->|"PASS"| ML["MLOps"]
  ML --> PROD["PROD"]

Pattern 2: Anomaly Investigation

Intent: diagnostic + root cause analysis
Gates: after causal (evidence threshold)

DIAGRAM Anomaly Investigation Pattern
flowchart LR
  AN["Analyst"] -->|"data"| DS["DataScientist"]
  DS -->|"patterns"| CA["Causal Agent"]
  CA --> RPT["RCA Report"]

Pattern 3: Compliance Audit

Intent: governance + validation
Gates: after each check (mandatory pass)

DIAGRAM Compliance Audit Pattern
flowchart LR
  GOV["Governance"] -->|"classification"| DT["DataTest"]
  GOV -->|"policies"| BA["Data-BA"]
  DT -->|"results"| RPT["Report"]
  BA --> RPT

Pattern 4: Platform Migration

Intent: migration + validation + cutover
Gates: after migration (reconciliation), after cutover (health)

DIAGRAM Platform Migration Pattern
flowchart LR
  MOD["Modeling"] -->|"schema"| MIG["Migration"]
  MIG -->|"translated"| DA["Data Agent"]
  DA --> DT["DataTest"]
  DT --> DO["DataOps"]

Pattern 5: Feature Engineering

Intent: data transformation + feature creation
Gates: after ETL (quality gate)

DIAGRAM Feature Engineering Pattern
flowchart LR
  BA["Data-BA"] -->|"specs"| DA["Data Agent"]
  DA -->|"sources"| ETL["ETL Agent"]
  ETL -->|"features"| DT["DataTest"]

Pattern 6: Model Retraining

Intent: retraining + validation + swap
Gates: after model (challenger vs champion)

DIAGRAM Model Retraining Pattern
flowchart LR
  DS["DataScientist"] -->|"challenger"| DT["DataTest"]
  DT -->|"PASS"| ML["MLOps"]
  ML --> SWAP["Champion Swap"]

Pattern 7: Data Quality Assessment

Intent: quality + profiling + reporting
Gates: after profiling (threshold check)

DIAGRAM Data Quality Assessment Pattern
flowchart LR
  DO["DataOps"] -->|"metrics"| DT["DataTest"]
  DT -->|"results"| GOV["Governance"]
  GOV --> RPT["Report"]

Pattern 8: Ad-Hoc Analysis

Intent: exploration + insight
Gates: none (advisory, not blocking)

DIAGRAM Ad-Hoc Analysis Pattern
flowchart LR
  AN["Analyst"] -->|"query results"| CA["Causal Agent (optional)"]
  CA --> RPT["Report"]
Insight

- Patterns are not rigid templates. They are starting points that the DIO adapts based on context. If the churn prediction pattern detects that the feature table already exists (from a previous run), it skips the ETL Agent and starts directly with the DataScientist Agent. If the task mentions "no deployment needed," it drops the MLOps Agent from the crew. Patterns encode the common case; the DIO handles the exceptions.
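The adaptation logic described above can be sketched in a few lines. This is an illustrative Python sketch, not the Neam VM's actual API; the pattern list, context keys, and function name are all hypothetical.

```python
# Hypothetical sketch of auto-pattern adaptation: start from the churn
# prediction pattern's agent sequence and drop steps the context makes
# unnecessary. Names and context keys are illustrative, not Neam internals.

CHURN_PATTERN = ["Data-BA", "ETL", "DataScientist", "Causal", "DataTest", "MLOps"]

def adapt_pattern(steps, context):
    """Return a copy of the pattern with context-redundant steps removed."""
    adapted = list(steps)
    if context.get("feature_table_exists"):
        adapted.remove("ETL")       # features already materialized from a prior run
    if context.get("no_deployment"):
        adapted.remove("MLOps")     # task explicitly says "no deployment needed"
    return adapted

print(adapt_pattern(CHURN_PATTERN, {"feature_table_exists": True}))
# → ['Data-BA', 'DataScientist', 'Causal', 'DataTest', 'MLOps']
```

The pattern encodes the common case; the conditionals encode the exceptions.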


Three Operating Modes #

The DIO supports three modes that control how much autonomy it exercises:

Auto Mode #

In Auto mode, the DIO receives a natural language task and handles everything: task understanding, crew formation, pattern selection, delegation, execution, error recovery, and result synthesis. No human approval is required at any stage.

DIAGRAM Auto Mode Flow
flowchart TD
  U1["User: Task Definition"] --> DIO["DIO: Task Understanding\n+ Crew Formation\n+ Pattern Selection\n+ RACI Delegation"]
  DIO --> EX["Execute All Agents"]
  EX --> SYN["Synthesize Results"]
  SYN --> U2["User: Final Report + Artifacts"]

Human involvement: Task definition only. Best for: Well-understood problems, trusted Agent.MD.

Config Mode #

In Config mode, the user defines the crew, the agent assignments, and the execution order. The DIO manages execution, error handling, and result synthesis, but it does not make strategic decisions about which agents to use or in what order.

DIAGRAM Config Mode Flow
flowchart TD
  U1["User: Declares Agents,\nAssigns Tasks, Defines Sequence"] --> DIO["DIO: Validates Config"]
  DIO --> EX["Execute as Specified"]
  EX --> ERR["Handle Errors"]
  ERR --> SYN["Synthesize Results"]
  SYN --> U2["User: Results per Specification"]

Human involvement: Full architectural control. Best for: Regulated environments, custom workflows.

Hybrid Mode #

Hybrid mode combines autonomous planning with human approval checkpoints. The DIO proposes a plan (crew, pattern, RACI), presents it to the user for approval, then executes. Quality gate results are presented for human review before proceeding.

DIAGRAM Hybrid Mode Flow
flowchart TD
  U1["User: Task Description"] --> DIO1["DIO: Plans Crew\n+ Pattern + RACI"]
  DIO1 --> U2["User: Reviews and\nApproves Plan"]
  U2 --> DIO2["DIO: Executes Plan"]
  DIO2 --> GATE["Quality Gate"]
  GATE --> U3["User: Reviews\nGate Result"]
  U3 --> DIO3["DIO: Continues\nor Adjusts"]

Human involvement: Approval at planning + gate points. Best for: High-stakes projects, learning the system.

Anti-Pattern

- Starting with Auto mode on your first project. Hybrid mode lets you build trust incrementally. You see the DIO's crew formation logic, verify the RACI assignments make sense, and approve quality gate results before they flow downstream. Once you have seen 5-10 successful Hybrid runs, switching to Auto mode is a confident decision, not a leap of faith.


Crew Formation Scoring #

When the DIO selects agents for a crew, it does not pick arbitrarily. It scores each candidate agent on four weighted dimensions:

Crew Formation Scoring Algorithm

Score(agent, task) = 0.40 * Capability + 0.20 * CostEfficiency + 0.20 * InfraCompatibility + 0.20 * HistoricalPerformance

  • Capability (40%) — How well do the agent's traits match the task requirements? Score: count(matching_traits) / count(required_traits)
  • CostEfficiency (20%) — How much budget does this agent consume relative to alternatives? Score: 1 - (estimated_cost / budget_remaining)
  • InfraCompatibility (20%) — Can this agent work with the declared infrastructure profile? Score: binary (1.0 if compatible, 0.0 if not)
  • HistoricalPerformance (20%) — How well has this agent performed on similar tasks? Score: weighted_avg(success_rate, gate_pass_rate, speed_score)
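The weighted sum is straightforward to express in code. The following is a minimal Python sketch of the scoring formula as stated above; the function and weight names are illustrative.

```python
# Illustrative implementation of the four-dimension crew formation score.
# Weights match the formula above: 0.40 / 0.20 / 0.20 / 0.20.
WEIGHTS = {"capability": 0.40, "cost": 0.20, "infra": 0.20, "history": 0.20}

def crew_score(capability, cost_efficiency, infra_compat, history):
    """Score an agent for a task; each input is in [0.0, 1.0]."""
    return (WEIGHTS["capability"] * capability
            + WEIGHTS["cost"] * cost_efficiency
            + WEIGHTS["infra"] * infra_compat
            + WEIGHTS["history"] * history)

# The ETL Agent row from the scoring example: 0.90, 0.80, 1.00, 0.85
print(round(crew_score(0.90, 0.80, 1.00, 0.85), 2))  # → 0.89
```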

Scoring Example: Churn Prediction Crew #

Task: "Predict customer churn and identify retention drivers"
Required traits: DataProducer, DataConsumer, CausalReasoner, QualityGatekeeper

| Agent | Capability | Cost | Infra | History | TOTAL | Include? |
|---|---|---|---|---|---|---|
| Data-BA | 0.85 | 0.70 | 1.00 | 0.90 | 0.86 | YES |
| ETL Agent | 0.90 | 0.80 | 1.00 | 0.85 | 0.89 | YES |
| DataScientist | 0.95 | 0.60 | 1.00 | 0.92 | 0.88 | YES |
| Causal | 1.00 | 0.50 | 1.00 | 0.88 | 0.88 | YES |
| DataTest | 0.90 | 0.75 | 1.00 | 0.95 | 0.90 | YES |
| MLOps | 0.85 | 0.65 | 1.00 | 0.80 | 0.83 | YES |
| Governance | 0.70 | 0.90 | 1.00 | 0.95 | 0.85 | YES |
| DataOps | 0.40 | 0.85 | 1.00 | 0.90 | 0.71 | no |
| Analyst | 0.30 | 0.90 | 1.00 | 0.85 | 0.67 | no |
| Modeling | 0.25 | 0.80 | 1.00 | 0.80 | 0.62 | no |
| Migration | 0.10 | 0.70 | 1.00 | 0.75 | 0.53 | no |

Threshold: 0.75 — Crew formed: 7 agents

Insight

- The 40% weight on Capability is deliberate. It means the DIO prioritizes "can this agent do the job?" over "is this agent cheap?" or "has it done this before?" Cost and history matter, but they cannot override capability. A cheap agent that cannot do causal reasoning is worthless for a task that requires it.


Working Neam Example: DIO Declaration #

Here is a complete, working Neam program that declares a DIO and its managed agents for the SimShop churn prediction task:

NEAM
// ═══ BUDGETS ═══
budget DIOBudget { cost: 500.00, tokens: 2000000 }
budget AgentBudget { cost: 50.00, tokens: 500000 }

// ═══ INFRASTRUCTURE PROFILE ═══
infrastructure_profile SimShopInfra {
    data_warehouse: {
        platform: "postgres",
        connection: env("SIMSHOP_PG_URL"),
        schemas: [
            "simshop_oltp",
            "simshop_staging",
            "simshop_dw",
            "ml_features",
            "ml_predictions"
        ]
    },
    data_science: {
        mlflow: { uri: env("MLFLOW_TRACKING_URI") },
        compute: { local: true, gpu: false }
    },
    governance: {
        regulations: ["GDPR"],
        pii_columns: [
            "email", "phone", "date_of_birth",
            "first_name", "last_name"
        ]
    }
}

// ═══ SUB-AGENTS ═══

// Requirements analyst
databa agent ChurnBA {
    provider: "openai", model: "gpt-4o", temperature: 0.3,
    agent_md: "./agents/simshop_ba.agent.md",
    budget: AgentBudget
}

// SQL connection for ETL and Analyst
sql_connection SimShopDB {
    platform: "postgres",
    connection: env("SIMSHOP_PG_URL"),
    database: "simshop"
}

// Analyst for data exploration
analyst agent SimShopAnalyst {
    provider: "openai", model: "gpt-4o-mini",
    connections: [SimShopDB],
    budget: AgentBudget
}

// Data scientist for modeling
datascientist agent ChurnDS {
    provider: "openai", model: "gpt-4o",
    budget: AgentBudget
}

// Causal reasoning
causal agent ChurnCausal {
    provider: "openai", model: "o3-mini",
    budget: AgentBudget
}

// Quality validation
datatest agent ChurnTester {
    provider: "openai", model: "gpt-4o",
    budget: AgentBudget
}

// Production deployment and monitoring
mlops agent ChurnMLOps {
    provider: "openai", model: "gpt-4o",
    budget: AgentBudget
}

// ═══ THE DATA INTELLIGENT ORCHESTRATOR ═══
dio agent SimShopDIO {
    mode: "config",
    task: "Predict which SimShop customers will churn in the
          next 90 days, identify the top causal drivers, and
          build a production-ready prediction system with
          drift monitoring",
    infrastructure: SimShopInfra,
    agent_md: "./agents/simshop_dio.agent.md",
    provider: "openai",
    model: "gpt-4o",
    budget: DIOBudget
}

// ═══ EXECUTE ═══
let status = dio_status(SimShopDIO)
print(status)

// In Auto mode, you would call:
// let result = dio_solve(SimShopDIO, task)
// print(result)

Let us walk through the key decisions in this declaration:

Budget separation. The DIO has a $500 budget. Each sub-agent has a $50 budget. This means the DIO cannot spend more than $500 across all agent activations, and no single agent can exceed $50. Budget enforcement is a first-class runtime concern -- the Neam VM tracks token usage and cost in real-time and halts execution if a budget is exhausted.
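The two-level enforcement described above (a per-agent ceiling nested inside the DIO-wide pool) can be sketched as follows. This is a minimal Python illustration of the accounting model, not the Neam VM's implementation; the `Budget` class and `charge` method are hypothetical.

```python
# Minimal sketch of hierarchical budget enforcement, assuming per-call cost
# accounting. BudgetExhausted and the charge() API are hypothetical.

class BudgetExhausted(Exception):
    pass

class Budget:
    def __init__(self, limit):
        self.limit = limit
        self.spent = 0.0

    def charge(self, amount):
        """Record a cost, halting (raising) if it would exceed the limit."""
        if self.spent + amount > self.limit:
            raise BudgetExhausted(f"{self.spent + amount:.2f} > {self.limit:.2f}")
        self.spent += amount

dio_budget = Budget(500.00)    # DIO-wide ceiling across all activations
agent_budget = Budget(50.00)   # per-agent ceiling

def record_agent_call(cost):
    agent_budget.charge(cost)  # agent-level check first
    dio_budget.charge(cost)    # then the shared DIO pool

record_agent_call(12.50)
print(dio_budget.spent, agent_budget.spent)  # → 12.5 12.5
```

A real implementation would track tokens as well as cost, but the halting behavior is the same: exceed either ceiling and execution stops.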

Infrastructure profile. The SimShopInfra profile declares PostgreSQL as the data warehouse, MLflow as the model registry, and GDPR as the regulatory framework. Every agent receives this profile and adapts its behavior accordingly. The ETL Agent generates PostgreSQL-dialect SQL. The Governance Agent checks for GDPR-relevant PII columns.

Agent.MD reference. Both the DIO and the Data-BA Agent reference Agent.MD files. These files contain structured domain knowledge -- known data issues, delegation rules, methodology preferences. Chapter 7 covers Agent.MD in depth.

Mode: config. This declaration uses Config mode, meaning the user has explicitly declared which agents participate. In Auto mode, the DIO would form the crew dynamically based on the task description.

Try It

- Clone the DataSims repository, set up the Docker environment, and run neam-agents/programs/simshop_churn.neam. Watch the DIO coordinate seven agents through the churn prediction lifecycle. Inspect the dio_status() output to see crew assignments, RACI matrix, and execution progress.


The DIO State Machine #

The DIO's execution follows a formal state machine with well-defined transitions:

DIAGRAM DIO State Machine
stateDiagram-v2
  [*] --> INITIALIZED
  INITIALIZED --> PLANNING: dio_solve() called
  PLANNING --> DELEGATING: Task Understanding\n+ Crew Formation\n+ Pattern Selection
  DELEGATING --> EXECUTING: RACI Assignment\n+ Budget Allocation
  EXECUTING --> GATE: Agent Activation\n+ Progress Monitoring
  GATE --> EXECUTING: PASS (next agent)
  GATE --> RECOVERING: FAIL
  RECOVERING --> EXECUTING: Resolved (resume)
  RECOVERING --> ESCALATED: Unresolved
  ESCALATED --> [*]: Human Intervention
  EXECUTING --> SYNTHESIZING: All agents complete
  SYNTHESIZING --> COMPLETE: Result combination
  COMPLETE --> [*]: Final output delivered
State Machine Properties
  • Checkpointed — Can resume from any state
  • Logged — Full audit trail
  • Budget-tracked — Cost accumulated at every transition

The state machine is not just a diagram -- it is implemented as ObjDIOStateMachine in the Neam VM, with persistence to disk at every checkpoint. If the process crashes at the EXECUTING state (say, after the DataScientist Agent completes but before the Causal Agent starts), it can resume from exactly that checkpoint without re-running the DataScientist Agent.
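The checkpoint-and-resume behavior can be illustrated with a few lines of Python. This is a hedged sketch of the idea, not `ObjDIOStateMachine` itself; the JSON file layout and class name are assumptions.

```python
# Sketch of a checkpointed state machine: persist the current state to disk
# on every transition so a crashed run can resume where it left off.
import json
import os
import tempfile

class CheckpointedStateMachine:
    def __init__(self, path):
        self.path = path
        self.state = self._load() or "INITIALIZED"

    def _load(self):
        """Recover the last persisted state, if a checkpoint exists."""
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)["state"]
        return None

    def transition(self, new_state):
        self.state = new_state
        with open(self.path, "w") as f:
            json.dump({"state": new_state}, f)   # checkpoint every transition

path = os.path.join(tempfile.gettempdir(), "dio_checkpoint.json")
sm = CheckpointedStateMachine(path)
sm.transition("EXECUTING")

resumed = CheckpointedStateMachine(path)   # simulated restart after a crash
print(resumed.state)                       # → EXECUTING
os.remove(path)
```

In the DIO, the checkpoint would also capture which agents have completed, so a resume at EXECUTING skips the finished DataScientist step rather than re-running it.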


RACI in Practice #

For the churn prediction task, the DIO generates this RACI matrix:

| Sub-Task | R (Responsible) | A (Accountable) | C (Consulted) | I (Informed) |
|---|---|---|---|---|
| Requirements | Data-BA | DIO | Governance | All |
| Feature Engineering | ETL Agent | DIO | Data-BA | DataOps |
| PII Compliance | Governance | DIO | Data-BA | All |
| Model Training | DataScientist | DIO | Data-BA | MLOps |
| Causal Analysis | Causal | DIO | DataScientist | Data-BA |
| Quality Validation | DataTest | DIO | DataScientist | All |
| Deployment | MLOps | DIO | DataTest | DataOps |
| Monitoring Setup | MLOps | DIO | DataOps | All |
RACI Rules
  • Every sub-task has exactly one R (who does the work)
  • DIO is always A (ultimately accountable for outcomes)
  • C agents provide input but do not do the work
  • I agents receive results but have no action items

Notice that the DIO is Accountable for every sub-task. This is by design. If a sub-task fails, the DIO is responsible for error recovery -- not the failing agent. The DIO decides whether to retry, fallback, degrade, or escalate.
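The RACI rules above are mechanical enough to check automatically. Here is an illustrative Python sketch of such a validator; the dictionary layout and function name are assumptions, not the DIO's internal representation.

```python
# Hypothetical sketch: validating the two hard RACI invariants --
# exactly one Responsible per sub-task, and DIO always Accountable.

raci = {
    "Requirements":        {"R": ["Data-BA"],   "A": "DIO", "C": ["Governance"], "I": ["All"]},
    "Feature Engineering": {"R": ["ETL Agent"], "A": "DIO", "C": ["Data-BA"],    "I": ["DataOps"]},
}

def validate_raci(matrix):
    """Return a list of rule violations; an empty list means the matrix is valid."""
    errors = []
    for task, row in matrix.items():
        if len(row["R"]) != 1:
            errors.append(f"{task}: needs exactly one Responsible, got {len(row['R'])}")
        if row["A"] != "DIO":
            errors.append(f"{task}: DIO must be Accountable")
    return errors

print(validate_raci(raci))  # → []
```

A matrix with two Responsible agents on one sub-task would fail this check before execution starts, which is exactly when you want to catch it.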

Anti-Pattern

- Assigning multiple agents as Responsible for the same sub-task. This creates the "two people carrying a couch through a doorway" problem -- both agents produce overlapping artifacts, neither knows which is authoritative, and the downstream consumer gets confused. One R per sub-task, always.


Error Recovery in Depth #

The DIO's error handling follows a four-level escalation strategy:

DIAGRAM Error Recovery Escalation Levels
flowchart TD
  ERR["Error Detected"] --> L1["Level 1: RETRY\nSame agent, same task\nMax 3 retries"]
  L1 -->|"Success"| RESUME["Resume Execution"]
  L1 -->|"Exhausted"| L2["Level 2: FALLBACK\nDifferent approach\nor alternative agent"]
  L2 -->|"Success"| RESUME
  L2 -->|"Failed"| L3["Level 3: GRACEFUL DEGRADATION\nSkip non-critical sub-task\nFlag as incomplete"]
  L3 -->|"Non-blocking"| RESUME
  L3 -->|"Critical"| L4["Level 4: HUMAN ESCALATION\nPause + dump context\nError logs, states, suggestions"]
  L4 --> HUMAN["Human Intervention"]
Insight

- The four-level escalation is designed around a principle: exhaust automated options before involving humans, but involve humans before producing incorrect results. A system that never escalates is dangerous. A system that always escalates is useless. The levels encode the right tradeoff.
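The escalation ladder can be expressed as a single control-flow function. This is a simplified Python sketch of the four levels, assuming a callable per recovery option; the handler signatures are hypothetical, and the retry limit mirrors the "max 3 retries" in the diagram.

```python
# Sketch of the four-level escalation ladder:
# L1 retry -> L2 fallback -> L3 graceful degradation -> L4 human escalation.

def recover(run_step, fallback=None, critical=True, max_retries=3):
    # Level 1: retry the same step, same agent, up to max_retries times
    for _ in range(max_retries):
        try:
            return run_step()
        except Exception:
            continue
    # Level 2: try a different approach or an alternative agent
    if fallback is not None:
        try:
            return fallback()
        except Exception:
            pass
    # Level 3: skip a non-critical sub-task, flagging the result as incomplete
    if not critical:
        return {"status": "skipped", "flag": "incomplete"}
    # Level 4: pause, dump context, and hand off to a human
    raise RuntimeError("ESCALATED: human intervention required")

result = recover(lambda: 1 / 0, fallback=lambda: "fallback result")
print(result)  # → fallback result
```

Note that the ordering itself encodes the principle from the Insight: every automated option is exhausted before the `RuntimeError` ever reaches a human.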


Industry Perspective #

The DIO pattern draws from three established coordination frameworks:

ITIL Service Management defines incident management with escalation tiers (L1/L2/L3) -- the same pattern the DIO uses for error recovery. The difference is that ITIL relies on human judgment at each tier; the DIO automates L1 and L2.

SAFe (Scaled Agile Framework) uses a "Release Train Engineer" who coordinates multiple agile teams without doing the work herself. The DIO fills an analogous role: it coordinates agents without writing SQL, training models, or running tests.

Project Management (PMBOK 7th Edition) defines the project manager as accountable for outcomes while team members are responsible for deliverables. The DIO's RACI model directly mirrors this accountability structure.

The key innovation is that these coordination patterns, traditionally executed by experienced humans over weeks, are compressed into automated orchestration that executes in hours. The patterns are the same; the execution speed is different by orders of magnitude.


The Evidence #

DataSims experiments tested the DIO across multiple conditions:

Ablation A1 (No DIO). Agents were given the same task but without the DIO coordinating them. Each agent attempted to self-organize through direct communication. Result: task completion dropped from 100% to 45%. Without centralized orchestration, agents duplicated work (the ETL Agent and DataScientist both tried to build feature tables), missed dependencies (the DataTest Agent ran before the model was trained), and produced inconsistent artifacts (two different schemas for the same feature table).

Mode Comparison. Auto mode completed the churn prediction task in 3.2 hours on average. Config mode completed in 4.1 hours (the user's manual configuration was slightly suboptimal). Hybrid mode completed in 4.8 hours (human review added latency at checkpoints but provided the highest confidence in results). All three modes achieved the same final AUC (0.847), confirming that the modes affect speed, not quality.

Pattern Coverage. Across 50 experimental runs covering all 5 problem statements, the DIO correctly selected the appropriate auto-pattern 48 out of 50 times (96% accuracy). The two misclassifications involved edge cases where the task description was ambiguous; Hybrid mode caught both misclassifications at the approval checkpoint.

Error Recovery. In 50 runs, 12 required error recovery (24% of runs). Level 1 (retry) resolved 8. Level 2 (fallback) resolved 3. Level 3 (graceful degradation) resolved 1. Level 4 (human escalation) was never triggered. The DIO's automated recovery handled all errors without human intervention.


Key Takeaways #

For Further Exploration #