Chapter 5: Meet the 14 Agents #
"The strength of the team is each individual member. The strength of each member is the team." -- Phil Jackson
30 min read | All personas | Part II: The Architecture
What you'll learn:
- The role, personality, authority, and capabilities of each of the 14 specialist agents
- The Agent Capability Matrix -- which agents produce, consume, reason, and validate
- The trait-based capability system: DataProducer, DataConsumer, CausalReasoner, QualityGatekeeper
- The composable agent pattern -- how agents combine into crews for complex tasks
The Problem #
Marcus is a data scientist at a mid-size e-commerce company. He has been asked to build a churn prediction model. He knows how to train an XGBoost classifier. He knows how to compute SHAP values. What he does not know is where the customer data lives, whether it has PII that needs masking, what the business definition of "churn" actually is, how to build the feature pipeline, how to validate data quality, how to deploy the model to production, or how to monitor it for drift after launch.
Marcus is not incompetent. He is specialized. And that is the fundamental problem with modern data organizations: the work requires 7-10 distinct specializations, but most teams have 3-4 people who each cover 2-3 areas with varying depth. The gaps between specializations are where projects fail.
The Intelligent Data Organization does not replace Marcus. It gives him 13 colleagues who are always available, never context-switch to other projects, and communicate through structured artifacts rather than Slack messages that disappear into the scroll.
Here are those 13 colleagues, plus the orchestrator who coordinates them all.
Agent Profile Cards #
Each agent below is presented as a profile card covering its essential characteristics. These are not theoretical descriptions -- they correspond directly to agent type declarations in the Neam language and their C++ runtime implementations.
Agent 1: Data Agent #
Role: Source & Schema Manager | Neam type: data agent { ... }
Personality: Meticulous, contract-oriented, defensive
Authority: Controls all data ingestion points
Key Capabilities:
- Declare typed schema contracts with version tracking
- Register sources (PostgreSQL, S3, Kafka, REST APIs)
- Configure sinks with write modes and batching
- Define quality gates (freshness, completeness, uniqueness)
- Route computation to appropriate engines (Spark, DuckDB)
- Track lineage from source to destination
Activated When: New data source onboarding · Schema contract definition or update · Quality gate configuration
Produces: Schema contracts, source registrations, quality gate configs, lineage metadata
Consumes: Infrastructure profiles, governance policies
Traits: DataProducer, QualityGatekeeper
Agent 2: ETL Agent #
Role: SQL-First Warehouse Builder | Neam type: etl agent { ... }
Personality: Methodical, SQL-native, transformation-focused
Authority: Controls warehouse schema and data loading
Key Capabilities:
- Dimensional modeling (Kimball star, Inmon, Data Vault)
- SQL-first transformations with multi-dialect transpilation
- SCD Type 1/2/3 handling for slowly changing dimensions
- Feature engineering from warehouse tables
- Semantic layer definitions
- Self-healing pipeline recovery
- Automatic lineage tracking at column level
Activated When: Staging → warehouse transformation · Feature table engineering · Schema change pipeline updates
Produces: Dimension tables, fact tables, feature tables, aggregate tables, SQL transformation scripts
Consumes: Schema contracts (Data Agent), BRDs (Data-BA), governance policies, infrastructure profiles
Traits: DataProducer, DataConsumer
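SCD Type 2 is the trickiest of the three modes listed above: instead of overwriting a changed attribute, the agent expires the current row and appends a new version. A minimal Python sketch of that logic, using an in-memory list of dicts in place of a warehouse table (the column names `valid_from`, `valid_to`, and `is_current` are illustrative conventions, not Neam's actual schema):

```python
from datetime import date

def scd2_upsert(dim_rows, incoming, key, tracked, today=None):
    """Slowly Changing Dimension Type 2: expire the current row and
    append a new version when a tracked attribute changes."""
    today = today or date.today().isoformat()
    current = next(
        (r for r in dim_rows if r[key] == incoming[key] and r["is_current"]),
        None,
    )
    if current and all(current[c] == incoming[c] for c in tracked):
        return dim_rows                 # no change: keep current version
    if current:
        current["is_current"] = False   # expire the old version
        current["valid_to"] = today
    dim_rows.append({
        **incoming,
        "valid_from": today,
        "valid_to": None,
        "is_current": True,
    })
    return dim_rows

# A customer moves city: the Boston row is expired, an Austin row appended.
dim = [{"customer_id": 1, "city": "Boston",
        "valid_from": "2023-01-01", "valid_to": None, "is_current": True}]
scd2_upsert(dim, {"customer_id": 1, "city": "Austin"},
            key="customer_id", tracked=["city"], today="2024-06-01")
```

In a real warehouse this would be a SQL `MERGE`; the point of the sketch is the version-expiry bookkeeping, which is the same in either form.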
Agent 3: Migration Agent #
Role: Zero-Downtime Platform Mover | Neam type: migration agent { ... }
Personality: Cautious, methodical, rollback-ready
Authority: Controls platform migration execution
Key Capabilities:
- Wave planning (prioritize tables by dependency graph)
- Schema translation across platforms (Oracle → Snowflake)
- Data type mapping with precision preservation
- Reconciliation (row counts, checksums, sampling)
- Cutover strategies (big-bang, trickle, dual-write)
- Rollback plans for every migration wave
Activated When: Moving between data platforms · Legacy decommissioning · Cloud migration (on-prem to cloud)
Produces: Migration plans, schema translation scripts, reconciliation reports, cutover runbooks
Consumes: Source schemas, target infrastructure profiles, governance constraints
Traits: DataProducer, DataConsumer
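Reconciliation is what makes a migration wave trustworthy: after each wave, source and target must agree on counts and content. A sketch of the row-count-plus-checksum check over in-memory rows; the XOR-of-row-hashes fingerprint is one order-independent choice for illustration, not necessarily the Migration Agent's actual algorithm:

```python
import hashlib

def table_fingerprint(rows):
    """Order-independent fingerprint: hash each row, XOR the digests.
    Column-level checksums and sampling are omitted for brevity."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return len(rows), acc

def reconcile(source_rows, target_rows):
    """Compare row counts and content fingerprints across platforms."""
    src_count, src_hash = table_fingerprint(source_rows)
    tgt_count, tgt_hash = table_fingerprint(target_rows)
    return {"row_count_match": src_count == tgt_count,
            "checksum_match": src_hash == tgt_hash}

src = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
ok = reconcile(src, list(reversed(src)))      # same content, any order
bad = reconcile(src, [{"id": 1, "amt": 10}])  # a row was dropped in flight
```

Because the fingerprint is order-independent, it survives the row reordering that platform moves routinely introduce, while still catching dropped or altered rows.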
Agent 4: DataOps Agent #
Role: SRE for Data | Neam type: dataops agent { ... }
Personality: Vigilant, proactive, escalation-aware
Authority: Can restart, retry, and skip pipeline stages
Key Capabilities:
- Pipeline monitoring (latency, volume, error rates)
- Anomaly detection (statistical + ML-based)
- Cross-source correlation (identify cascading failures)
- Automated triage and root cause classification
- Guardrailed auto-heal (restart, retry, skip with limits)
- SLA tracking and reporting
- FinOps: cost tracking per pipeline, per agent
Activated When: Pipeline metrics exceed thresholds · SLA breach imminent or occurred · Cost anomaly in compute · DIO requests health status
Produces: Health reports, anomaly alerts, triage reports, SLA dashboards, cost breakdowns
Consumes: Pipeline metrics, infrastructure state, event bus alerts
Traits: QualityGatekeeper
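The statistical half of the anomaly detection listed above can be as simple as a z-score against recent history. A sketch, with the threshold and window chosen for illustration (the agent's ML-based detectors are out of scope here):

```python
import statistics

def detect_anomaly(history, latest, z_threshold=3.0):
    """Flag a metric reading whose z-score against recent history
    exceeds the threshold."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    z = (latest - mean) / stdev if stdev else 0.0
    return {"z_score": round(z, 2), "anomalous": abs(z) > z_threshold}

# Pipeline latencies in seconds: a sudden 10x spike is flagged.
latencies = [42, 40, 44, 41, 43, 39, 42, 41]
spike = detect_anomaly(latencies, 420)
normal = detect_anomaly(latencies, 42)
```

Cross-source correlation then asks whether several such flags fired together, which is how a single upstream failure is distinguished from many independent ones.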
Agent 5: Governance Agent #
Role: Compliance Officer | Neam type: governance agent { ... }
Personality: Strict, policy-driven, audit-minded
Authority: Can block data flows that violate policies
Key Capabilities:
- Data classification (PII, PHI, financial, public)
- Access policy enforcement (RBAC, ABAC)
- Column-level lineage tracking
- Regulatory compliance validation (GDPR, CCPA, DORA)
- Audit trail generation (every access logged)
- Data quality scoring per domain
- External tool connectors (Collibra, Atlas, Purview)
Activated When: New source contains PII/PHI · Regulatory audit requested · Access policy creation or validation · Cross-border data movement
Produces: Classification reports, access policies, lineage graphs, audit trails, compliance certificates
Consumes: Schema metadata, data catalogs, regulatory requirements, infrastructure profiles
Traits: QualityGatekeeper
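Name-based column matching is the cheapest first pass of data classification. A deliberately naive sketch; the patterns and categories are invented for illustration, and a production classifier would also sample column values and consult the catalog:

```python
import re

# Illustrative patterns only -- real classifiers inspect values too.
PII_PATTERNS = {
    "email": re.compile(r"e.?mail", re.I),
    "phone": re.compile(r"phone|mobile", re.I),
    "name": re.compile(r"(first|last|full).?name", re.I),
    "ssn": re.compile(r"ssn|social.?security", re.I),
}

def classify_columns(columns):
    """Tag each column as PII (with a category) or public."""
    report = {}
    for col in columns:
        hit = next((cat for cat, pat in PII_PATTERNS.items()
                    if pat.search(col)), None)
        if hit:
            report[col] = {"class": "PII", "category": hit}
        else:
            report[col] = {"class": "public"}
    return report

report = classify_columns(["customer_email", "order_total", "first_name"])
```

A PII hit is what triggers the Governance Agent's activation path above: masking policies, access restrictions, and an audit trail entry.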
Agent 6: Modeling Agent #
Role: Data Architect | Neam type: modeling agent { ... }
Personality: Analytical, pattern-seeking, standards-aware
Authority: Proposes schema changes (requires approval)
Key Capabilities:
- Schema reverse-engineering from existing databases
- ER model generation and visualization
- Normalization analysis (1NF through BCNF)
- Dimensional design (star schema, snowflake, vault)
- Schema amendment proposals with impact analysis
- Cross-schema dependency mapping
Activated When: New database needs architectural analysis · Schema change impact assessment · Normalization/denormalization consideration · Data model documentation
Produces: ER diagrams, normalization reports, dimensional designs, amendment proposals, dependency maps
Consumes: Schema metadata, data catalogs, existing models, business requirements
Traits: DataConsumer
Agent 7: Analyst Agent #
Role: NL-to-SQL Query Engine | Neam type: analyst agent { ... }
Personality: Responsive, dialect-aware, insight-oriented
Authority: Read-only access through governed channels
Key Capabilities:
- Natural language to SQL translation
- 9 SQL dialects (Postgres, Snowflake, BigQuery, Databricks SQL, Redshift, Oracle, Teradata, Trino, DuckDB)
- Platform-specific query optimization
- Governed execution (respects access policies)
- Multi-format output (Excel, PDF, HTML, Slack, JSON)
- Insight discovery and anomaly highlighting
Activated When: Business user needs ad-hoc analysis · DIO needs data exploration · Causal Agent needs observational data
Produces: Query results, formatted reports, data summaries, insight annotations
Consumes: Schema metadata, SQL connections, governance policies, natural language queries
Traits: DataConsumer
Agent 8: Data-BA Agent #
Role: Requirements Intelligence Analyst | Neam type: databa agent { ... }
Personality: Inquisitive, structured, traceability-obsessed
Authority: Defines what should be built (not how)
Key Capabilities:
- LLM-assisted requirements elicitation
- Business Requirements Document (BRD) generation
- Given/When/Then acceptance criteria formulation
- Traceability matrix (requirement → implementation → test)
- Impact analysis for requirement changes
- BABOK v3 aligned elicitation techniques
- Stakeholder communication templates
Activated When: New data project initiated · Business requirements need formalization · Requirement change impact assessment · Acceptance criteria definition
Produces: BRDs, acceptance criteria, traceability matrices, impact analysis reports, stakeholder summaries
Consumes: Business context, Agent.MD domain knowledge, existing project documentation
Traits: DataProducer
The Data-BA Agent operates at "day minus one" of the data lifecycle. Before any pipeline is built, any model is trained, or any query is written, the Data-BA produces the BRD that defines what success looks like. This is the single most impactful architectural decision in the system: requirements before engineering.
Agent 9: DataScientist Agent #
Role: ML/AI Modeler | Neam type: datascientist agent { ... }
Personality: Experimental, hypothesis-driven, explainability-focused
Authority: Trains models, registers in MLflow
Key Capabilities:
- Problem framing (classification, regression, clustering)
- Hypothesis testing with statistical rigor
- EDA-driven technique selection
- Volume-aware compute routing: <100K rows → scikit-learn · 100K–10M → XGBoost/LightGBM · >10M → Spark ML
- Feature engineering and selection
- ML, DL, and NLP pipeline construction
- SHAP-based explainability for every model
Activated When: Prediction task defined in BRD · Feature table ready · Model retraining triggered by MLOps
Produces: Trained models (MLflow), SHAP values, EDA reports, feature importance rankings
Consumes: Feature tables, BRD acceptance criteria, compute profiles, Agent.MD preferences
Traits: DataConsumer, DataProducer
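The volume-aware routing rule above is simple enough to state as a function. A sketch using the thresholds from the profile (the engine names are labels, not real connection targets):

```python
def route_compute(row_count: int) -> str:
    """Pick a training engine by dataset size, using the
    DataScientist Agent's stated thresholds."""
    if row_count < 100_000:
        return "scikit-learn"        # small data: single machine
    if row_count <= 10_000_000:
        return "xgboost/lightgbm"    # medium: gradient boosting
    return "spark-ml"                # large: distributed training

engine = route_compute(2_500_000)    # a typical e-commerce feature table
```

The boundaries are inclusive on the XGBoost side: exactly 100K rows already leaves scikit-learn, and exactly 10M rows does not yet require Spark.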
Agent 10: Causal Agent #
Role: Causal Reasoning Engine | Neam type: causal agent { ... }
Personality: Skeptical, rigorous, "correlation is not causation" embodied
Authority: Validates causal claims, proposes interventions
Key Capabilities:
- Pearl's Ladder of Causation:
- Rung 1: Association (observational, P(Y|X))
- Rung 2: Intervention (do-calculus, P(Y|do(X)))
- Rung 3: Counterfactual (P(Y_x|X',Y'))
- Structural Causal Model (SCM) construction
- Bayesian inference via PyMC
- Causal discovery via DoWhy
- Average Treatment Effect (ATE) estimation
- Counterfactual scenario generation
Activated When: Root cause analysis needed · Intervention impact estimation · "What if" scenarios · Revenue anomaly causal explanation
Produces: Causal DAGs, ATE estimates, counterfactual reports, intervention recommendations, SCM specifications
Consumes: Feature tables, model outputs, SHAP values, domain knowledge from Agent.MD
Traits: CausalReasoner, DataConsumer
A caution on using SHAP values as causal evidence: SHAP tells you which features were important to the model's prediction. It does not tell you which features cause the outcome. A model might weight days_since_last_order highly for churn prediction, but the Causal Agent reveals that support_ticket_resolution_time is the actual causal driver. Reducing resolution time causes reduced churn. Reducing days since last order is just chasing a symptom.
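The distinction matters operationally. A rung-2 quantity such as the ATE reduces to a simple difference in group means only when treatment is randomized; this toy sketch assumes exactly that. The Causal Agent's DoWhy/PyMC machinery exists precisely because real observational data is confounded and needs adjustment first:

```python
def average_treatment_effect(records):
    """ATE = E[Y | do(T=1)] - E[Y | do(T=0)].  Under randomized
    treatment assignment this is a difference in group means;
    observational data would need backdoor adjustment first."""
    treated = [r["y"] for r in records if r["t"] == 1]
    control = [r["y"] for r in records if r["t"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Synthetic A/B test: fast ticket resolution (t=1) vs churned (y=1).
data = ([{"t": 1, "y": y} for y in (0, 0, 1, 0)] +
        [{"t": 0, "y": y} for y in (1, 1, 0, 1)])
ate = average_treatment_effect(data)   # negative: treatment reduces churn
```

A negative ATE here is the intervention-level claim SHAP cannot make: resolving tickets faster causes less churn, rather than merely correlating with it.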
Agent 11: DataTest Agent #
Role: Independent Quality Validator | Neam type: datatest agent { ... }
Personality: Skeptical, adversarial, never rubber-stamps
Authority: Can block deployments via quality gate failures
Key Capabilities:
- Test generation from BRD acceptance criteria
- ETL validation (row counts, schema, referential integrity)
- Data warehouse consistency checks
- ML model validation (AUC, precision, recall thresholds)
- API endpoint testing
- Quality gates: blocking (must pass) vs advisory (warning)
- Test coverage reporting with traceability to requirements
Activated When: Artifact validation needed · Quality gate checkpoint reached · Deployment approval requested
Produces: Test reports, quality gate verdicts (PASS/FAIL), coverage metrics, defect lists
Consumes: BRD acceptance criteria, feature tables, trained models, API endpoints, ETL outputs
Traits: QualityGatekeeper
The DataTest Agent is architecturally separated from all builder agents. The agent that trains the model cannot be the agent that validates it. This is not a code organization choice; it is a trust boundary. In the Neam runtime, the DataTest Agent has read-only access to artifacts produced by other agents. It cannot modify them, only judge them.
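The blocking-vs-advisory distinction can be sketched as a small verdict function. The gate structure and metric names here are invented for illustration:

```python
def evaluate_quality_gates(metrics, gates):
    """Evaluate blocking vs advisory gates against reported metrics.
    Any failed blocking gate yields a FAIL verdict; failed advisory
    gates only produce warnings."""
    failures, warnings = [], []
    for gate in gates:
        passed = metrics.get(gate["metric"], 0.0) >= gate["min"]
        if not passed:
            (failures if gate["blocking"] else warnings).append(gate["metric"])
    return {"verdict": "FAIL" if failures else "PASS",
            "failures": failures, "warnings": warnings}

gates = [
    {"metric": "auc", "min": 0.80, "blocking": True},    # must pass
    {"metric": "recall", "min": 0.70, "blocking": False}, # warn only
]
result = evaluate_quality_gates({"auc": 0.85, "recall": 0.65}, gates)
```

Here the model deploys (the AUC gate passes) but ships with a recall warning attached, which is exactly the PASS-with-caveats verdict a rubber-stamping validator would never produce.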
Agent 12: MLOps Agent #
Role: Production ML Guardian | Neam type: mlops agent { ... }
Personality: Operationally cautious, metrics-obsessed
Authority: Controls model deployment and rollback
Key Capabilities:
- 6 types of drift detection:
- Data drift (feature distribution shift)
- Concept drift (target relationship change)
- Prediction drift (output distribution change)
- Feature drift (individual feature shifts)
- Label drift (ground truth distribution change)
- Upstream drift (source data pattern change)
- Deployment strategies (canary, shadow, blue-green, A/B)
- Champion-challenger model management
- Automated retraining triggers
- Serving infrastructure (Flask, FastAPI, SageMaker)
Activated When: Model passes quality gates · Drift thresholds exceeded · Champion underperforming challenger · Scheduled retraining window
Produces: Deployment configs, drift reports, model serving endpoints, retraining triggers, A/B test results
Consumes: Validated models, quality gate results, prediction logs, monitoring metrics
Traits: QualityGatekeeper, DataConsumer
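Data drift, the first of the six types above, is commonly measured with the Population Stability Index over binned feature distributions. A sketch assuming pre-binned relative frequencies; the 0.2 cutoff is a widespread rule of thumb, not an MLOps Agent constant:

```python
import math

def population_stability_index(expected, actual):
    """PSI over pre-binned distributions: sum of (a - e) * ln(a / e).
    A common rule of thumb treats PSI > 0.2 as significant drift."""
    eps = 1e-6  # guard against empty bins
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]   # feature distribution at training
stable   = [0.24, 0.26, 0.25, 0.25]   # serving traffic, week 1
shifted  = [0.10, 0.15, 0.25, 0.50]   # serving traffic, week 12
```

Comparing `baseline` against `shifted` crosses the 0.2 threshold while `stable` does not, which is the signal that would fire the agent's retraining trigger.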
Agent 13: DIO (Data Intelligent Orchestrator) #
Role: Multi-Agent Coordinator | Neam type: dio agent { ... }
Personality: Strategic, delegation-focused, accountability-driven
Authority: Can activate any agent, assign RACI roles, allocate budgets
Key Capabilities:
- Task understanding (intent classification + decomposition)
- Crew formation (scored selection of agent subsets)
- Pattern selection (8 auto-patterns for common workflows)
- RACI delegation (Responsible, Accountable, Consulted, Informed for every sub-task)
- Execution management (sequential, parallel, conditional)
- State machine with checkpoint/rewind
- Error recovery (retry → fallback → escalation)
- Result synthesis (combine all agent outputs)
Activated When: Any data task is submitted · Always active as the entry point for all orchestrated work
Produces: Execution plans, RACI matrices, crew assignments, synthesized results, activity logs
Consumes: Task descriptions, Agent.MD domain knowledge, infrastructure profiles, agent status reports
Traits: (Orchestrator — unique role, no data traits)
The DIO is covered in depth in Chapter 6.
Agent 14: Deploy Agent #
Role: Infrastructure Deployment Manager | Neam type: deploy agent { ... }
Personality: DevOps-native, infrastructure-as-code oriented
Authority: Provisions and tears down compute resources
Key Capabilities:
- Container deployment (Docker, Kubernetes)
- Serverless deployment (Lambda, Cloud Run, Azure Functions)
- Infrastructure-as-Code generation (Terraform, CloudFormation)
- Multi-cloud targeting from single Neam program
- Health checks and readiness probes
- Rolling updates with automatic rollback
Activated When: Model needs production serving infrastructure · Pipeline needs scheduled compute · Infrastructure changes for scaling
Produces: Deployment manifests, Terraform plans, container configs, health check endpoints
Consumes: Infrastructure profiles, model artifacts, deployment strategies from MLOps Agent
Traits: DataProducer
The Agent Capability Matrix #
With all 14 agents introduced, here is how their capabilities map across the data lifecycle:
| Agent | Requirements | Ingest Data | Transform | Model/Train | Governance | Monitor/Test | Deploy/Serve |
|---|---|---|---|---|---|---|---|
| Data Agent | | Primary | | | | Supporting | |
| ETL Agent | | Supporting | Primary | | | | |
| Migration Agent | | Primary | Primary | | | | |
| DataOps Agent | | | Supporting | | | Primary | |
| Governance Agent | | | | | Primary | | |
| Modeling Agent | | | Supporting | | | | |
| Analyst Agent | | | | Supporting | | | |
| Data-BA Agent | Primary | | | | | | |
| DataScientist Agent | | | Supporting | Primary | | | |
| Causal Agent | | | | Primary | | | |
| DataTest Agent | | | | | | Primary | |
| MLOps Agent | | | | | | Primary | Primary |
| Deploy Agent | | | | | | | Primary |
| DIO | Supporting | Supporting | Supporting | Supporting | Supporting | Supporting | Supporting |
The Trait-Based Capability System #
Agents are not categorized by arbitrary labels. They implement traits -- composable capability markers that define what an agent can do in the system. Four traits form the foundation:
DataProducer #
Agents with the DataProducer trait create new data artifacts: tables, files, models, reports. The Data Agent produces schema registrations. The ETL Agent produces dimension and fact tables. The DataScientist Agent produces trained models. The Data-BA Agent produces BRDs.
DataConsumer #
Agents with the DataConsumer trait read data artifacts produced by others. The ETL Agent consumes schema contracts from the Data Agent. The DataScientist Agent consumes feature tables from the ETL Agent. The Causal Agent consumes model outputs and SHAP values from the DataScientist Agent.
CausalReasoner #
Only the Causal Agent holds this trait. It marks the ability to perform causal inference -- constructing SCMs, applying do-calculus, generating counterfactuals. This is not just another form of data analysis; it operates on a fundamentally different rung of Pearl's Ladder of Causation.
QualityGatekeeper #
Agents with the QualityGatekeeper trait can block downstream progress. The Data Agent blocks ingestion if quality gates fail. The Governance Agent blocks data flows that violate compliance policies. The DataTest Agent blocks deployment if tests fail. The DataOps Agent blocks operations if SLA breaches are detected. The MLOps Agent blocks serving if drift exceeds thresholds.
| Agent | DataProducer | DataConsumer | CausalReasoner | QualityGatekeeper |
|---|---|---|---|---|
| Data Agent | X | | | X |
| ETL Agent | X | X | | |
| Migration Agent | X | X | | |
| DataOps Agent | | | | X |
| Governance Agent | | | | X |
| Modeling Agent | | X | | |
| Analyst Agent | | X | | |
| Data-BA Agent | X | | | |
| DataScientist | X | X | | |
| Causal Agent | | X | X | |
| DataTest Agent | | | | X |
| MLOps Agent | | X | | X |
| Deploy Agent | X | | | |
| DIO | Orchestrator -- coordinates all traits | | | |
Traits are not mutually exclusive. The DataScientist Agent is both a DataProducer (it creates models) and a DataConsumer (it reads feature tables). The MLOps Agent is both a DataConsumer (it reads model outputs) and a QualityGatekeeper (it blocks deployment on drift). This composability is what makes agents flexible enough to participate in different crew configurations.
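In Python terms, this trait system resembles structural typing with `typing.Protocol`. A sketch of two of the four traits and an agent that composes them; the method names `produce` and `evaluate_gate` are invented for illustration, and Neam's own trait syntax differs:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class DataProducer(Protocol):
    """Trait: creates new data artifacts."""
    def produce(self) -> dict: ...

@runtime_checkable
class QualityGatekeeper(Protocol):
    """Trait: can block downstream progress."""
    def evaluate_gate(self, artifact: dict) -> bool: ...

class DataAgent:
    """Composes both traits, mirroring the Data Agent's profile."""
    def produce(self) -> dict:
        return {"kind": "schema_contract", "version": 1}
    def evaluate_gate(self, artifact: dict) -> bool:
        return artifact.get("completeness", 0.0) >= 0.99

agent = DataAgent()
```

Because the check is structural rather than nominal, the DIO can ask "is this agent a QualityGatekeeper?" at crew-formation time without the agent declaring any inheritance.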
The Composable Agent Pattern #
Not every task needs all 14 agents. The DIO selects a subset -- a "crew" -- based on the task requirements. Here are common crew compositions:
| Task | Crew | Size |
|---|---|---|
| Churn Prediction (Full Lifecycle) | Data-BA, ETL, DataScientist, Causal, DataTest, MLOps, Governance | 7 + DIO |
| Ad-Hoc Business Analysis | Analyst, Governance | 2 + DIO |
| Data Quality Audit | DataOps, DataTest, Governance | 3 + DIO |
| Platform Migration | Migration, Data Agent, Modeling, DataTest, Governance | 5 + DIO |
| Revenue Anomaly Investigation | Analyst, Causal, DataScientist | 3 + DIO |
| GDPR Compliance Audit | Governance, DataTest, Data-BA | 3 + DIO |
| Pipeline Failure Investigation | DataOps, Causal, ETL | 3 + DIO |
| Model Retraining | DataScientist, DataTest, MLOps | 3 + DIO |
The crew formation algorithm scores each agent on four dimensions: capability match (40%), cost efficiency (20%), infrastructure compatibility (20%), and historical performance (20%). Chapter 6 details this scoring system.
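Under those stated weights, crew formation can be sketched as a weighted score plus a cutoff. The 0.6 threshold and the per-agent scores below are invented for illustration; Chapter 6 describes the real scoring inputs:

```python
# Dimension weights from the crew formation algorithm.
WEIGHTS = {"capability": 0.40, "cost": 0.20, "infra": 0.20, "history": 0.20}

def crew_score(scores):
    """Weighted sum of the four per-dimension scores (each in [0, 1])."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def form_crew(candidates, threshold=0.6):
    """Select agents whose weighted score clears the threshold."""
    return sorted(name for name, s in candidates.items()
                  if crew_score(s) >= threshold)

candidates = {
    "DataScientist": {"capability": 0.9, "cost": 0.6, "infra": 0.8, "history": 0.7},
    "Migration":     {"capability": 0.1, "cost": 0.9, "infra": 0.8, "history": 0.8},
}
crew = form_crew(candidates)
```

Note how capability match dominates: the Migration Agent scores well on three of four dimensions yet is excluded, because a cheap, well-behaved agent with no relevant capability is still the wrong agent.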
Using the DataSims environment, run the GDPR compliance audit (Problem Statement 5). Notice how the DIO forms a crew of only 3 agents, skipping the DataScientist, MLOps, and ETL agents entirely. The crew is task-appropriate, not task-maximal.
Agent Interaction Patterns #
Agents do not interact freely. The DIO mediates all interactions through three patterns:
Pattern 1: Sequential Handoff
Used when output of one agent is input to the next (e.g., Requirements → Engineering → Modeling).
```mermaid
flowchart LR
    BA["Data-BA"] -->|"BRD"| ETL["ETL Agent"]
    ETL -->|"features"| DS["DataScientist"]
    DS -->|"model"| OUT["Output"]
```
Pattern 2: Parallel Execution
Used when tasks are independent and can run concurrently (e.g., Predictive + Causal + Descriptive analysis).
```mermaid
flowchart LR
    DIO["DIO"] --> DS["DataScientist"]
    DIO --> CA["Causal Agent"]
    DIO --> AN["Analyst Agent"]
    DS --> M["model"]
    CA --> D["DAG"]
    AN --> R["report"]
```
Pattern 3: Gate-Blocked Progression
Used when a quality gate must pass before the next stage (e.g., Model validation before deployment).
```mermaid
flowchart LR
    DS["DataScientist"] --> DT["DataTest"]
    DT -->|"PASS"| ML["MLOps"]
    DT -->|"FAIL"| DS
```
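Pattern 3 is ordinary control flow once you squint: build, validate, then either proceed or loop back to the builder. A sketch with plain callables standing in for agents; the attempt limit and escalation path only loosely mirror the DIO's retry-fallback-escalation chain:

```python
def run_gated_stage(build, validate, deploy, max_attempts=3):
    """Gate-blocked progression: the builder's artifact must pass the
    validator before deployment; a FAIL loops back to the builder."""
    for attempt in range(1, max_attempts + 1):
        artifact = build(attempt)
        if validate(artifact):          # the quality gate
            return deploy(artifact)
    raise RuntimeError("quality gate never passed; escalating to DIO")

# Toy crew: the model only clears the 0.82 AUC gate on the second attempt.
result = run_gated_stage(
    build=lambda attempt: {"auc": 0.75 + 0.05 * attempt},
    validate=lambda model: model["auc"] >= 0.82,
    deploy=lambda model: f"deployed model with AUC {model['auc']:.2f}",
)
```

The essential property is that `deploy` is unreachable except through `validate`, which is the code-level expression of the DataTest Agent's trust boundary.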
Industry Perspective #
The 14-agent taxonomy maps to real organizational roles in data teams. A typical enterprise data organization has these roles, often filled by the same person wearing multiple hats:
| Agent | Traditional Role | Typical Headcount |
|---|---|---|
| Data-BA | Business Analyst | 1-2 per project |
| Data Agent + ETL | Data Engineer | 2-4 per team |
| Migration | Platform Engineer | 0-1 (project-based) |
| DataOps | DataOps / SRE | 1-2 per organization |
| Governance | Data Steward / DPO | 1-2 per organization |
| Modeling | Data Architect | 0-1 per organization |
| Analyst | Data Analyst | 2-5 per team |
| DataScientist | Data Scientist | 1-3 per team |
| Causal | (rarely exists) | 0 in most orgs |
| DataTest | QA Engineer | 0-1 per team |
| MLOps | ML Engineer | 1-2 per team |
| Deploy | DevOps | 1-2 shared |
| DIO | Project Manager | 1 per project |
Total headcount for a full-lifecycle data project: 12-25 people across an organization. The Intelligent Data Organization does not eliminate these roles -- it augments them. A team of 3-4 people can leverage 14 agents to cover the entire lifecycle without gaps.
The Causal Agent fills a notable gap. In the table above, most organizations have zero people dedicated to causal reasoning. Correlational analysis is the default. The Causal Agent brings Pearl's framework to every project, whether or not the organization has a causal inference specialist.
The Evidence #
DataSims ablation experiments systematically removed individual agents to measure their impact:
| Ablation | Agent(s) Removed | Impact on Full System |
|---|---|---|
| A1 | DIO (orchestrator) | Task completion: 100% → 45% |
| A2 | Data-BA (requirements) | Traceability: 95% → 22% |
| A3 | DataScientist (modeling) | No prediction capability |
| A4 | Causal (why analysis) | Root cause: "support_quality" → "unknown" |
| A5 | DataTest (validation) | Test coverage: 94% → 0%, silent failures |
| A6 | Agent.MD (domain knowledge) | AUC: 0.847 → 0.782 (-7.7%, p<0.01) |
| A7 | MLOps (production ops) | No drift detection, no deployment |
| A8 | RACI (accountability) | Traceability loss: 80% |
Every agent matters. Removing any single agent degrades the system measurably. This is the empirical foundation for the 14-agent architecture -- not theoretical elegance, but measured necessity.
Key Takeaways #
- The 14 agents cover the complete data lifecycle from requirements to production monitoring, with no gaps in coverage.
- Each agent has a defined role, authority level, and set of capabilities -- they are specialists, not generalists.
- Four traits (DataProducer, DataConsumer, CausalReasoner, QualityGatekeeper) provide a composable capability system that the DIO uses for crew formation.
- Not every task needs all 14 agents. The composable agent pattern allows the DIO to form right-sized crews of 2-7 agents based on task requirements.
- The Causal Agent fills a critical gap that exists in most data organizations: the ability to answer "why" rather than just "what."
- DataSims ablation experiments confirm that removing any single agent degrades system performance measurably.
For Further Exploration #
- Neam Agent Declarations -- Syntax reference for all agent types
- DataSims Repository -- See the agents in action on the SimShop platform
- Wooldridge, M. (2009). An Introduction to MultiAgent Systems -- Academic foundation for multi-agent coordination