Chapter 26 — The Churn Prediction Experiment: End to End #

"In God we trust. All others must bring data." -- W. Edwards Deming


📖 30 min read | 👤 All personas | 🏷️ Part VII: Proof

What you'll learn:

- How a single Neam program declares the agents, budgets, and task for a full churn prediction lifecycle
- What each of the DIO's 7 phases -- requirements through monitoring -- consumes and produces
- How cost, quality gates, and reproducibility were measured for the complete run

The Problem: From Business Question to Production System #

Raj, the business analyst, walks into the Monday standup and says: "We're losing customers. The VP wants to know who is about to churn, why they're churning, and what we can do about it. She wants a production prediction system, not a notebook."

In a traditional organization, this request would spawn a 6-month project involving 4-5 people, dozens of Jira tickets, and a 15% chance of reaching production (see Chapter 1). With the Neam agent stack running against the SimShop environment, the entire lifecycle -- from Raj's question to production monitoring -- completes in a single orchestrated run.

This chapter walks through every step.


The Program: simshop_churn.neam #

Here is the complete Neam program that orchestrates the churn prediction lifecycle. This file lives at neam-agents/programs/simshop_churn.neam in the DataSims repository:

```neam
// ============================================================
// DataSims — SimShop Churn Prediction (Full DIO Orchestration)
// ============================================================

// === BUDGETS ===
budget DIOBudget { cost: 500.00, tokens: 2000000 }
budget AgentBudget { cost: 50.00, tokens: 500000 }

// === INFRASTRUCTURE PROFILE ===
infrastructure_profile SimShopInfra {
    data_warehouse: {
        platform: "postgres",
        connection: env("SIMSHOP_PG_URL"),
        schemas: ["simshop_oltp", "simshop_staging", "simshop_dw",
                  "ml_features", "ml_predictions"]
    },
    data_science: {
        mlflow: { uri: env("MLFLOW_TRACKING_URI") },
        compute: { local: true, gpu: false }
    },
    governance: {
        regulations: ["GDPR"],
        pii_columns: ["email", "phone", "date_of_birth",
                      "first_name", "last_name"]
    }
}

// === SUB-AGENTS ===

databa agent ChurnBA {
    provider: "openai", model: "gpt-4o", temperature: 0.3,
    agent_md: "./agents/simshop_ba.agent.md",
    budget: AgentBudget
}

sql_connection SimShopDB {
    platform: "postgres",
    connection: env("SIMSHOP_PG_URL"),
    database: "simshop"
}

analyst agent SimShopAnalyst {
    provider: "openai", model: "gpt-4o-mini",
    connections: [SimShopDB],
    budget: AgentBudget
}

datascientist agent ChurnDS {
    provider: "openai", model: "gpt-4o",
    budget: AgentBudget
}

causal agent ChurnCausal {
    provider: "openai", model: "o3-mini",
    budget: AgentBudget
}

datatest agent ChurnTester {
    provider: "openai", model: "gpt-4o",
    budget: AgentBudget
}

mlops agent ChurnMLOps {
    provider: "openai", model: "gpt-4o",
    budget: AgentBudget
}

// === THE DATA INTELLIGENT ORCHESTRATOR ===
dio agent SimShopDIO {
    mode: "config",
    task: "Predict which SimShop customers will churn in the next 90 days,
          identify the top drivers, build a production-ready prediction
          system with monitoring",
    infrastructure: SimShopInfra,
    agent_md: "./agents/simshop_dio.agent.md",
    provider: "openai",
    model: "gpt-4o",
    budget: DIOBudget
}

// === EXECUTE ===
let status = dio_status(SimShopDIO)
print(status)
```

💡 Notice what is NOT in this program: There is no SQL. No Python. No model training code. No deployment scripts. The Neam program declares the agents, their capabilities, and the task. The DIO orchestrates everything else.


The 7 Phases #

The DIO decomposes the task into 7 phases. Here is the complete lifecycle as a sequence diagram:

[Sequence diagram: the DIO dispatching the 7 phases across the agent crew]

Phase 1: Requirements (Data-BA Agent) #

The Data-BA agent analyzes the business question and produces a structured Business Requirements Document (BRD):

Input: "Predict which SimShop customers will churn in the next 90 days, identify the top drivers, build a production-ready prediction system with monitoring"

Output:

| Requirement | Details |
|---|---|
| Acceptance criteria | 12 formal criteria |
| Data sources identified | 4 (customers, orders, events, support_tickets) |
| Target definition | No purchase in 90 days = churned |
| Minimum AUC | 0.80 |
| Minimum precision@10 | 0.75 |
| Required features | Behavioral, transactional, support, engagement |
| Compliance | GDPR -- PII must be excluded from features |
| BRD document | Generated: YES |

RACI: R=Data-BA, A=DIO, C=DataScientist, I=MLOps

🎯 The 12 acceptance criteria are not vague goals. They are machine-checkable conditions: "AUC >= 0.80", "precision@10 >= 0.75", "no PII columns in feature set", "test coverage >= 90%". The DataTest agent will validate each one in Phase 5.
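The same idea can be sketched in a few lines: each criterion becomes an executable predicate over the run's metrics. This is illustrative only -- the names and thresholds mirror the BRD above, but the checker is not the DataTest agent's actual implementation:

```python
# Illustrative: three of the BRD's acceptance criteria encoded as
# executable predicates over a run's metrics dict.

def check_criteria(metrics: dict) -> dict:
    return {
        "auc_roc >= 0.80": metrics["auc_roc"] >= 0.80,
        "precision_at_10 >= 0.75": metrics["precision_at_10"] >= 0.75,
        "no PII columns in feature set": not (
            set(metrics["feature_columns"]) & set(metrics["pii_columns"])
        ),
    }

run = {
    "auc_roc": 0.847,
    "precision_at_10": 0.82,
    "feature_columns": ["days_since_last_order", "support_tickets_30d"],
    "pii_columns": ["email", "phone", "date_of_birth"],
}
assert all(check_criteria(run).values())  # every criterion passes
```

Because each criterion is a boolean over concrete numbers, "did we meet the requirements?" becomes a test run rather than a meeting.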


Phase 2: Feature Engineering (DataScientist Agent) #

The DataScientist agent builds the feature pipeline based on the BRD:

Input: BRD from Phase 1, SimShop warehouse schema

Output:

| Metric | Value |
|---|---|
| Features created | 47 |
| Feature quality score | 0.96 |
| Pipeline name | customer_360 |
| Target schema | ml_features.churn_features |
Top features created:

| Feature | Type | Source |
|---|---|---|
| days_since_last_order | Behavioral | fact_orders |
| support_tickets_30d | Support | fact_support |
| login_trend_30d | Engagement | fact_customer_activity |
| spend_trend_30d | Transaction | fact_orders |
| cart_abandonment_rate | Behavioral | events |
| avg_order_value_90d | Transaction | fact_orders |
| product_return_rate | Transaction | order_returns |
| email_open_rate_30d | Engagement | campaign_sends |
| support_resolution_time | Support | support_tickets |
| days_since_last_login | Engagement | events |

RACI: R=DataScientist, A=DIO, C=Data-BA (domain context) + Causal (feature relevance), I=MLOps
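To make the feature definitions concrete, here is a pure-Python sketch of two of them. The record layout and field names are hypothetical stand-ins for the simshop_dw tables; the real pipeline materializes these in ml_features.churn_features:

```python
from datetime import date

# Hypothetical raw records standing in for fact_orders / support_tickets.
orders = [
    {"customer_id": 1, "order_date": date(2024, 11, 2)},
    {"customer_id": 1, "order_date": date(2024, 12, 20)},
    {"customer_id": 2, "order_date": date(2024, 6, 15)},
]
tickets = [
    {"customer_id": 1, "opened": date(2024, 12, 28)},
    {"customer_id": 2, "opened": date(2024, 10, 1)},
]

def churn_features(customer_id: int, as_of: date) -> dict:
    """Compute two of the behavioral/support features as of a cutoff date."""
    last_order = max(
        (o["order_date"] for o in orders if o["customer_id"] == customer_id),
        default=None,
    )
    return {
        "days_since_last_order": (as_of - last_order).days if last_order else None,
        "support_tickets_30d": sum(
            1 for t in tickets
            if t["customer_id"] == customer_id
            and 0 <= (as_of - t["opened"]).days <= 30
        ),
    }

print(churn_features(1, date(2025, 1, 10)))
# {'days_since_last_order': 21, 'support_tickets_30d': 1}
```

Note the `as_of` cutoff: every windowed feature is anchored to a prediction date so training features never leak future information.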


Phase 3: Model Training (DataScientist Agent) #

The DataScientist agent trains and evaluates the churn prediction model:

Input: 47 features from Phase 2, target variable (churned: yes/no)

Output:

| Metric | Value |
|---|---|
| Algorithm | XGBoost |
| AUC-ROC | 0.847 |
| F1 Score | 0.723 |
| Precision@10 | 0.82 |
| Top 5 predictors | days_since_last_order, support_tickets_30d, login_trend_30d, spend_trend_30d, cart_abandonment_rate |
Model performance against the BRD thresholds (minimum AUC 0.80, minimum precision@10 0.75; the BRD sets no F1 minimum):

| Metric | Score | Threshold | Status |
|---|---|---|---|
| AUC-ROC | 0.847 | 0.80 | PASS |
| Precision@10 | 0.82 | 0.75 | PASS |
| F1 Score | 0.723 | -- | reported |

Both thresholded metrics clear their BRD minimums.

RACI: R=DataScientist, A=DIO, C=Causal (feature importance validation), I=Data-BA, MLOps
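The headline metric is easy to demystify: AUC-ROC is the probability that a randomly chosen churner is scored above a randomly chosen non-churner. A self-contained sketch of that pairwise view (not the evaluation harness itself):

```python
def auc_roc(labels, scores):
    """Probability a random positive example outscores a random
    negative one (ties count half) -- the pairwise view of ROC AUC."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy check: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_roc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]))  # 0.75
```

An AUC of 0.847 therefore means: given one churner and one non-churner at random, the model ranks the churner higher about 85% of the time.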


Phase 4: Causal Analysis (Causal Agent) #

The Causal Agent goes beyond correlation to identify why customers churn:

Input: Model outputs from Phase 3, feature data, domain knowledge from Agent.MD

Output:

| Metric | Value |
|---|---|
| Causal graph edges | 8 |
| Average Treatment Effect (ATE) | 0.15 |
| Confounders identified | 3 |
| Root cause | support_quality_degradation |
Causal DAG (8 edges):

```mermaid
flowchart TD
  SQ["support_quality"] --> CP["churn_probability"]
  SQ --> TRT["ticket_resolution_time"]
  TRT --> CP
  TRT --> CS["customer_satisfaction"]
  CS --> RP["repeat_purchase"]
  RP --> CP
  PQ["product_quality"] --> RR["return_rate"]
  RR --> CP
```

Confounders:

- customer_tenure
- market_segment
- acquisition_channel

The ATE of 0.15 means: improving support quality by one standard deviation reduces churn probability by 15 percentage points, after controlling for confounders.

RACI: R=Causal Agent, A=DIO, C=Data-BA (domain context) + DataScientist (model context), I=MLOps

💡 This is the phase that most ML projects skip entirely. Without causal analysis, the model tells you who will churn but not why. The DataSims ablation (Chapter 27) shows that removing the Causal Agent causes root cause identification to drop from "support_quality_degradation" to "unknown."
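To see why controlling for confounders matters, here is a toy stratification example on synthetic data. The segment names, numbers, and adjustment method are purely illustrative (the Causal Agent's actual estimator is not specified here); the point is that pooling data across a confounder can exaggerate an effect:

```python
# Synthetic records: (segment, got_good_support, churned).
# Within each segment the "treatment" effect is -0.2, but low-risk
# customers get good support more often, so the pooled estimate is biased.
records = (
    [("high_risk", 1, 1)] * 2 + [("high_risk", 1, 0)] * 3     # treated: 2/5 churn
    + [("high_risk", 0, 1)] * 9 + [("high_risk", 0, 0)] * 6   # control: 9/15 churn
    + [("low_risk", 1, 0)] * 15                               # treated: 0/15 churn
    + [("low_risk", 0, 1)] * 1 + [("low_risk", 0, 0)] * 4     # control: 1/5 churn
)

def mean_churn(rows):
    return sum(r[2] for r in rows) / len(rows)

def naive_effect(rows):
    treated = [r for r in rows if r[1] == 1]
    control = [r for r in rows if r[1] == 0]
    return mean_churn(treated) - mean_churn(control)

def adjusted_effect(rows):
    # Stratify on the confounder, then weight each stratum by its size.
    total = len(rows)
    return sum(
        (len(stratum) / total) * naive_effect(stratum)
        for seg in {r[0] for r in rows}
        for stratum in [[r for r in rows if r[0] == seg]]
    )

print(round(naive_effect(records), 2), round(adjusted_effect(records), 2))
# -0.4 -0.2
```

The naive comparison doubles the apparent effect; stratifying on the confounder recovers the true -0.2 within-segment effect, which is the kind of correction behind the reported ATE.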


Phase 5: Quality Validation (DataTest Agent) #

The DataTest agent validates the entire pipeline against the acceptance criteria from Phase 1:

Input: All outputs from Phases 1-4, acceptance criteria

Output:

| Metric | Value |
|---|---|
| Total tests | 47 |
| Tests passed | 45 |
| Tests failed | 2 |
| Test coverage | 94% |
| Quality gate | PASSED |

| Category | Tests | Passed | Failed | Status |
|---|---|---|---|---|
| Data quality | 12 | 12 | 0 | PASS |
| Feature validation | 10 | 9 | 1 | WARN |
| Model performance | 8 | 8 | 0 | PASS |
| Causal validity | 5 | 5 | 0 | PASS |
| Schema compliance | 4 | 4 | 0 | PASS |
| PII exclusion | 3 | 3 | 0 | PASS |
| API contract | 5 | 4 | 1 | WARN |
| Total | 47 | 45 | 2 | PASS |

The 2 failed tests were non-critical (feature staleness warning, API response time marginally above threshold). The quality gate passed because no critical tests failed.
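The gate rule described above -- pass unless a critical test fails -- can be sketched as follows. Which categories count as critical is an assumption made for illustration; only the pass/fail counts come from the Phase 5 results:

```python
# Illustrative gate: categories flagged critical must have zero failures;
# warnings in non-critical categories do not block deployment.
RESULTS = {
    # category: (tests, passed, critical?)  -- criticality flags assumed
    "data_quality":       (12, 12, True),
    "feature_validation": (10, 9, False),   # staleness warning
    "model_performance":  (8, 8, True),
    "causal_validity":    (5, 5, True),
    "schema_compliance":  (4, 4, True),
    "pii_exclusion":      (3, 3, True),
    "api_contract":       (5, 4, False),    # latency marginally high
}

def quality_gate(results: dict) -> str:
    critical_failures = sum(
        tests - passed
        for tests, passed, critical in results.values()
        if critical
    )
    return "PASSED" if critical_failures == 0 else "FAILED"

assert quality_gate(RESULTS) == "PASSED"
```

Under this rule the two WARN-category failures are recorded but do not block Phase 6; a single PII or model-performance failure would.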

RACI: R=DataTest Agent, A=DIO, C=Data-BA (criteria) + DataScientist (model specs), I=MLOps


Phase 6: Deployment (MLOps Agent) #

The MLOps agent deploys the validated model to production using a canary strategy:

Input: Validated model from Phase 3, quality gate approval from Phase 5

Output:

| Metric | Value |
|---|---|
| Deploy strategy | Canary |
| Canary percentage | 10% |
| Endpoint | /v1/churn/predict |
| Health status | healthy |
| p99 latency | 45ms |
Canary Deployment — Traffic Routing:

```mermaid
flowchart TD
  REQ["100% of requests"] -->|"90%"| STABLE["Existing model (stable)"]
  REQ -->|"10%"| CANARY["New churn model (canary)"]
  CANARY --> H["Health: healthy"]
  CANARY --> P["p99: 45ms"]
  CANARY --> E["Error rate: 0.0%"]
  CANARY --> D["Prediction distribution: normal"]
```

The canary runs for a configurable observation period. If health metrics remain within bounds, traffic gradually shifts to the new model.
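A common way to implement such a split is deterministic hashing, so each customer is pinned to one variant across requests. This is an assumed mechanism for illustration, not necessarily how the MLOps agent routes traffic:

```python
import hashlib

CANARY_PCT = 10  # matches the deployment config above

def route(customer_id: str) -> str:
    """Deterministically send ~10% of customers to the canary model.
    Hashing keeps each customer pinned to the same variant."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < CANARY_PCT else "stable"

routes = [route(f"cust-{i}") for i in range(10_000)]
share = routes.count("canary") / len(routes)
print(f"canary share: {share:.1%}")  # close to 10%
```

Pinning matters for churn scoring: a customer who flip-flopped between models on every request would get inconsistent predictions, confusing downstream outreach workflows.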

RACI: R=MLOps Agent, A=DIO, C=DataScientist (model requirements) + DataTest (deployment criteria), I=Data-BA


Phase 7: Monitoring Setup (MLOps Agent) #

The MLOps agent configures production monitoring:

Input: Deployed model, baseline metrics from Phase 3

Output:

| Metric | Value |
|---|---|
| Drift detection | Active |
| Check frequency | Hourly |
| Baseline AUC | 0.847 |
| Alert threshold | AUC drop > 5% |
| Retraining trigger | AUC drop > 10% |
Drift Detection — Monitoring Configuration:

| Check | Frequency |
|---|---|
| Feature drift | Hourly |
| Concept drift | Daily |
| Performance (AUC tracking) | Hourly |

Alert Thresholds:

| Level | Condition |
|---|---|
| WARNING | AUC drops below 0.80 |
| CRITICAL | AUC drops below 0.75 |
| RETRAIN | AUC drops below 0.70 |

Baseline: AUC = 0.847 (established)
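The alert ladder is a straightforward mapping from a monitored AUC reading to a level; a minimal sketch of the thresholds above (illustrative, not the MLOps agent's monitoring code):

```python
def alert_level(auc: float) -> str:
    """Map a monitored AUC reading to the alert levels from the table."""
    if auc < 0.70:
        return "RETRAIN"
    if auc < 0.75:
        return "CRITICAL"
    if auc < 0.80:
        return "WARNING"
    return "OK"

assert alert_level(0.847) == "OK"       # baseline
assert alert_level(0.78) == "WARNING"
assert alert_level(0.72) == "CRITICAL"
assert alert_level(0.69) == "RETRAIN"
```

Evaluating this hourly against the 0.847 baseline is all the "AUC tracking" check needs; the drift checks guard against the subtler case where inputs shift before the AUC does.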

RACI: R=MLOps Agent, A=DIO, C=none, I=Data-BA, DataScientist


Complete Run Output #

Here is the JSON output from a complete DataSims run, taken directly from evaluation/results/full_system.json:

```json
{
  "status": "completed",
  "task": "full_system",
  "mode": "config",
  "phases_completed": 7,
  "crew": ["Data-BA", "DataScientist", "Causal", "DataTest", "MLOps"],
  "results": {
    "requirements": {
      "acceptance_criteria": 12,
      "brd_generated": true,
      "data_sources_identified": 4
    },
    "feature_engineering": {
      "features_created": 47,
      "pipeline": "customer_360",
      "quality_score": 0.96
    },
    "model": {
      "algorithm": "XGBoost",
      "auc_roc": 0.847,
      "f1": 0.723,
      "precision_at_10": 0.82,
      "top_features": [
        "days_since_last_order",
        "support_tickets_30d",
        "login_trend_30d",
        "spend_trend_30d",
        "cart_abandonment_rate"
      ]
    },
    "causal_analysis": {
      "causal_graph_edges": 8,
      "ate": 0.15,
      "confounders_identified": 3,
      "root_cause": "support_quality_degradation"
    },
    "testing": {
      "total_tests": 47,
      "passed": 45,
      "failed": 2,
      "coverage": 0.94,
      "quality_gate": "passed"
    },
    "deployment": {
      "strategy": "canary",
      "canary_pct": 10,
      "endpoint": "/v1/churn/predict",
      "health": "healthy"
    },
    "monitoring": {
      "drift_detection": "active",
      "check_frequency": "hourly",
      "baseline_auc": 0.847
    }
  },
  "metrics": {
    "agents_invoked": 5,
    "total_cost_usd": 23.50,
    "llm_tokens_used": 45230,
    "total_time_seconds": 342
  },
  "recommendations": [
    "Invest in support quality for enterprise segment",
    "Implement proactive outreach for customers with declining login trends",
    "Monitor feature drift weekly"
  ]
}
```

Cost Breakdown #

The complete lifecycle cost $23.50 in LLM tokens:

| Phase | Agent | Tokens | Cost |
|---|---|---|---|
| Requirements | Data-BA | 5,200 | $2.40 |
| Feature Engineering | DataScientist | 8,100 | $3.80 |
| Model Training | DataScientist | 12,400 | $5.60 |
| Causal Analysis | Causal | 7,800 | $3.50 |
| Quality Testing | DataTest | 6,300 | $2.90 |
| Deployment | MLOps | 3,200 | $1.50 |
| Monitoring | MLOps | 2,230 | $1.20 |
| DIO Orchestration | DIO | -- | $2.60 |
| Total | | 45,230 | $23.50 |
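The per-phase figures can be cross-checked against the run totals reported in full_system.json with a small arithmetic sanity check:

```python
# Per-phase (tokens, cost) figures from the breakdown above; the DIO
# orchestration row carries a cost but no token count of its own.
phases = {
    "requirements":        (5_200, 2.40),
    "feature_engineering": (8_100, 3.80),
    "model_training":      (12_400, 5.60),
    "causal_analysis":     (7_800, 3.50),
    "quality_testing":     (6_300, 2.90),
    "deployment":          (3_200, 1.50),
    "monitoring":          (2_230, 1.20),
}
dio_orchestration_cost = 2.60

total_tokens = sum(t for t, _ in phases.values())
total_cost = round(sum(c for _, c in phases.values()) + dio_orchestration_cost, 2)
assert total_tokens == 45_230   # matches llm_tokens_used
assert total_cost == 23.50      # matches total_cost_usd
```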

🎯 $23.50 for a complete data science lifecycle. Compare this to the traditional cost: 4-6 months of a 5-person team at fully loaded cost of ~$548,000 (see Chapter 27 for the full ROI analysis). Even accounting for the simplification of a simulated environment, the cost differential is dramatic.


Reproducibility #

This experiment was run 5 times. Every run produced identical results:

| Run | AUC | F1 | Tests | Coverage | Gate | Cost |
|---|---|---|---|---|---|---|
| 1 | 0.847 | 0.723 | 47 | 94% | passed | $23.50 |
| 2 | 0.847 | 0.723 | 47 | 94% | passed | $23.50 |
| 3 | 0.847 | 0.723 | 47 | 94% | passed | $23.50 |
| 4 | 0.847 | 0.723 | 47 | 94% | passed | $23.50 |
| 5 | 0.847 | 0.723 | 47 | 94% | passed | $23.50 |

100% reproducibility. Every metric identical across all runs.

Source: evaluation/results/summary.json in the DataSims repository.


Key Takeaways #

- A declarative Neam program with no SQL, Python, or deployment code carried a business question from requirements to a monitored production model in a single orchestrated run.
- The DIO decomposed the task into 7 phases, each with explicit RACI ownership and machine-checkable acceptance criteria validated by the DataTest agent.
- Causal analysis turned a "who will churn" model into a "why they churn" answer, surfacing support_quality_degradation as the root cause.
- The full lifecycle cost $23.50 in LLM tokens and reproduced identically across 5 runs.

For Further Exploration #

- neam-agents/programs/simshop_churn.neam -- the complete program listed in this chapter
- evaluation/results/full_system.json and evaluation/results/summary.json -- the raw run outputs behind the tables above
- Chapter 27 -- the ablation study and full ROI analysis