Chapter 18 -- The Causal Agent: Understanding Why #

"Correlation is not causation -- but it sure is a hint." -- Edward Tufte


30 min read | Marcus (DS), Dr. Chen (Researcher), David (VP), Priya (DE) | Part V: Analytical Intelligence

What you'll learn:


The Problem: Knowing What Without Knowing Why #

Marcus's churn model works. AUC of 0.847. F1 of 0.723. Top features identified. The VP of Customer Success is pleased -- for exactly one meeting.

Then she asks the question that every predictive model dreads: "So we know who will churn. But why are they churning? If we improve our support response time, will churn actually decrease? Or is support just a symptom of something else?"

Marcus stares at his SHAP waterfall plot. support_tickets_30d is the second most important feature. But importance is not causation. Maybe unhappy customers both file support tickets and churn -- driven by a third factor (product quality decline) that the model does not directly observe. Intervening on support might do nothing. Or it might be the single most impactful lever. SHAP cannot tell the difference.

This is the boundary between prediction and understanding. Crossing it requires causal inference.


Pearl's Ladder of Causation #

Judea Pearl's Ladder of Causation (2018) describes three levels of causal reasoning. Most ML systems operate at Rung 1. The Causal Agent operates at all three.

Pearl's Ladder of Causation
Judea Pearl (2018) -- three levels of causal reasoning, from observation to imagination. Power and difficulty both increase as you climb.

| Rung | Level | Question | Formalism | Who operates here |
|------|-------|----------|-----------|-------------------|
| 3 -- most powerful | Counterfactual | Imagining: "What if I had acted differently?" | P(Y_x \| X', Y') -- retrospective reasoning | Neam Causal Agent |
| 2 | Intervention | Doing: "What if I do X?" | P(Y \| do(X)) -- do-calculus | Neam Causal Agent + randomized controlled trials |
| 1 -- where most ML lives | Association | Seeing: "What if I observe X?" | P(Y \| X) -- conditional probability | Most ML models + SHAP / feature importance |

Rung 1: Association (Seeing) #

What most ML models do. "Customers who file more than 3 support tickets in 30 days have a 42% churn rate, compared to 8% for those who file zero." This is a correlation. Useful for prediction. Useless for intervention planning.

Rung 2: Intervention (Doing) #

The do-calculus question. "If we set support quality to 'excellent' (do(support_quality = excellent)), what happens to churn?" This is not the same as observing that support quality and churn are correlated. It asks what happens when we intervene -- when we change the cause and observe the effect, controlling for confounders.
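The gap between seeing and doing can be made concrete with a toy binary model (a hypothetical sketch with made-up numbers, not the SimShop SCM): a hidden confounder `product_ok` drives both `support_good` and `churn`, so the observed conditional P(churn | support_good) and the interventional P(churn | do(support_good)) come out different.

```python
from itertools import product

# Toy SCM (illustrative probabilities, not from the chapter):
#   product_ok ~ Bernoulli(0.7)                     (hidden confounder)
#   support_good depends on product_ok              (confounded treatment)
#   churn depends on product_ok and support_good    (outcome)
P_PRODUCT_OK = 0.7

def p_support_good(product_ok):
    return 0.8 if product_ok else 0.3

def p_churn(product_ok, support_good):
    # A bad product raises churn a lot; good support lowers it a little.
    base = 0.10 if product_ok else 0.60
    return base - (0.05 if support_good else 0.0)

def joint(product_ok, support_good, churn):
    p = P_PRODUCT_OK if product_ok else 1 - P_PRODUCT_OK
    ps = p_support_good(product_ok)
    p *= ps if support_good else 1 - ps
    pc = p_churn(product_ok, support_good)
    return p * (pc if churn else 1 - pc)

# Rung 1: P(churn | support_good=1) -- condition on the observed joint.
num = sum(joint(u, 1, 1) for u in (0, 1))
den = sum(joint(u, 1, c) for u, c in product((0, 1), (0, 1)))
p_obs = num / den

# Rung 2: P(churn | do(support_good=1)) -- cut the confounder->support edge
# and average p_churn over the *marginal* distribution of product_ok.
p_do = sum((P_PRODUCT_OK if u else 1 - P_PRODUCT_OK) * p_churn(u, 1)
           for u in (0, 1))

print(f"P(churn | support_good=1)     = {p_obs:.3f}")
print(f"P(churn | do(support_good=1)) = {p_do:.3f}")
```

Here observation understates the interventional churn rate: good support co-occurs with good products, so conditioning on it selects a healthy subpopulation, while do() forces good support on everyone, bad products included.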

Rung 3: Counterfactual (Imagining) #

The most powerful and demanding level. "Customer #4721 churned last month. Would they have stayed if we had called them proactively at the 15-day mark?" This requires reasoning about a specific individual in a world that did not happen. It is the foundation of personalized intervention strategies.

Key Insight: Most "AI-driven decision support" tools operate at Rung 1. They find patterns. The Causal Agent operates at Rungs 2 and 3 -- it identifies interventions that will change outcomes and counterfactuals that explain individual cases.


Structural Causal Models (SCM) #

The scm (Structural Causal Model) declaration defines the causal graph -- the DAG (Directed Acyclic Graph) that encodes which variables cause which other variables.

NEAM
scm ChurnSCM {
    variables: [
        { name: "product_quality", type: "continuous", role: "exogenous" },
        { name: "support_quality", type: "continuous", role: "endogenous" },
        { name: "customer_satisfaction", type: "continuous", role: "endogenous" },
        { name: "login_frequency", type: "continuous", role: "endogenous" },
        { name: "spend_trend", type: "continuous", role: "endogenous" },
        { name: "support_tickets", type: "count", role: "endogenous" },
        { name: "churn", type: "binary", role: "outcome" }
    ],
    edges: [
        { from: "product_quality", to: "customer_satisfaction" },
        { from: "product_quality", to: "support_tickets" },
        { from: "support_quality", to: "customer_satisfaction" },
        { from: "support_quality", to: "support_tickets" },
        { from: "customer_satisfaction", to: "login_frequency" },
        { from: "customer_satisfaction", to: "spend_trend" },
        { from: "login_frequency", to: "churn" },
        { from: "spend_trend", to: "churn" }
    ],
    confounders: [
        { variable: "product_quality",
          affects: ["support_tickets", "customer_satisfaction"] },
        { variable: "market_conditions",
          affects: ["spend_trend", "churn"], observed: false },
        { variable: "competitor_actions",
          affects: ["churn", "login_frequency"], observed: false }
    ]
}

The causal graph for SimShop churn:

CAUSAL GRAPH SimShop Churn
flowchart TD
  PQ["product_quality"] --> CS["customer_satisfaction"]
  PQ --> ST["support_tickets"]
  SQ["support_quality"] --> CS
  SQ --> ST
  CS --> LF["login_frequency"]
  CS --> SPT["spend_trend"]
  LF --> CHURN["CHURN"]
  SPT --> CHURN

  MC["market_conditions (unobserved)"] -.-> SPT
  MC -.-> CHURN
  CA["competitor_actions (unobserved)"] -.-> LF
  CA -.-> CHURN

Critical: The causal graph is a hypothesis, not a fact. It encodes domain expertise about which variables cause which. The Causal Agent can discover causal structure from data (using PC/FCI/GES algorithms), but domain expert review is always required before acting on the results.
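One mechanical check that should always pass before any estimation is that the declared graph really is a DAG. A minimal sketch of how a host tool might hold the ChurnSCM edges and verify acyclicity with Kahn's algorithm (the Python representation is hypothetical; the edges are transcribed from the declaration above):

```python
from collections import defaultdict, deque

# Edges transcribed from ChurnSCM (parent -> child).
EDGES = [
    ("product_quality", "customer_satisfaction"),
    ("product_quality", "support_tickets"),
    ("support_quality", "customer_satisfaction"),
    ("support_quality", "support_tickets"),
    ("customer_satisfaction", "login_frequency"),
    ("customer_satisfaction", "spend_trend"),
    ("login_frequency", "churn"),
    ("spend_trend", "churn"),
]

def topological_order(edges):
    """Kahn's algorithm; raises if the graph has a cycle (not a valid DAG)."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes |= {parent, child}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle -- not a valid causal DAG")
    return order

order = topological_order(EDGES)
parents_of_churn = sorted(p for p, c in EDGES if c == "churn")
print(order)
print(parents_of_churn)  # ['login_frequency', 'spend_trend']
```

The topological order also doubles as a simulation order: sampling variables parent-first is exactly how an SCM generates data.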


Causal Discovery Algorithms #

When the causal graph is unknown, the Causal Agent can discover it from observational data using constraint-based and score-based algorithms.

NEAM
causal_discovery ChurnDiscovery {
    data: "ml_features.customer_360",
    algorithms: [
        {
            name: "PC",
            type: "constraint_based",
            alpha: 0.05,
            ci_test: "fisher_z"
        },
        {
            name: "FCI",
            type: "constraint_based",
            alpha: 0.05,
            handles_latent: true
        },
        {
            name: "GES",
            type: "score_based",
            score: "bic"
        },
        {
            name: "LLM_hybrid",
            type: "hybrid",
            model: "gpt-4o",
            prior_knowledge: ChurnSCM,
            data_driven: "GES",
            merge_strategy: "conservative"
        }
    ],
    consensus: {
        method: "majority_vote",
        min_agreement: 0.6,
        output: "consensus_graph"
    }
}

The LLM-hybrid approach is unique to Neam. It combines data-driven discovery (GES finds edges supported by data) with LLM-powered prior knowledge (GPT-4o suggests edges based on domain understanding). The conservative merge strategy only includes edges that both the data and the LLM agree on.
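The fisher_z conditional-independence test named in the PC configuration can be sketched in plain Python (an illustrative implementation, not Neam's internals): test X independent of Y given Z by computing the partial correlation of X and Y after regressing out Z, then mapping it through Fisher's z-transform.

```python
import math
import random

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def residualize(y, z):
    """Residuals of y after least-squares regression on a single regressor z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    beta = (sum((a - mz) * (b - my) for a, b in zip(z, y))
            / sum((a - mz) ** 2 for a in z))
    return [b - my - beta * (a - mz) for a, b in zip(z, y)]

def fisher_z_independent(x, y, z, alpha=0.05):
    """Test x independent of y given z (|S| = 1) via Fisher's z-transform."""
    r = corr(residualize(x, z), residualize(y, z))
    zstat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(len(x) - 1 - 3)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(zstat) / math.sqrt(2))))
    return p > alpha  # True => cannot reject conditional independence

# Chain x -> z -> y: x and y correlate marginally, but x is independent
# of y once z is known -- exactly what PC uses to delete edges.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(2000)]
z = [2 * a + random.gauss(0, 1) for a in x]
y = [1.5 * b + random.gauss(0, 1) for b in z]

r_marginal = corr(x, y)
r_partial = corr(residualize(x, z), residualize(y, z))
print(f"marginal corr(x, y)    = {r_marginal:.3f}")
print(f"partial corr(x, y | z) = {r_partial:.3f}")
print("independent given z?", fisher_z_independent(x, y, z))
```

PC runs tests like this over growing conditioning sets, removing the x--y edge once some set renders the pair independent; production implementations generalize the regression step to arbitrary conditioning sets.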


Intervention Analysis: The do() Operator #

The intervention declaration implements Pearl's do-calculus. It asks: "What would happen if we set a variable to a specific value?"

NEAM
intervention SupportIntervention {
    scm: ChurnSCM,
    treatment: {
        variable: "support_quality",
        do_value: "excellent",
        baseline_value: "current"
    },
    outcome: "churn",
    estimand: "ate",       // Average Treatment Effect
    methods: [
        {
            name: "backdoor_adjustment",
            covariates: ["product_quality", "customer_satisfaction"]
        },
        {
            name: "ipw",   // Inverse Probability Weighting
            propensity_model: "logistic_regression"
        },
        {
            name: "doubly_robust",
            outcome_model: "gradient_boosting",
            propensity_model: "logistic_regression"
        }
    ],
    sensitivity: {
        method: "e_value",
        confounding_strength: [1.5, 2.0, 3.0]
    }
}
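The e_value sensitivity method in the declaration above has a simple closed form (VanderWeele and Ding, 2017): for an observed risk ratio RR, the E-value is the minimum strength of unmeasured confounding, on the risk-ratio scale, that could explain the association away entirely. A small sketch with a hypothetical risk ratio:

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (VanderWeele & Ding, 2017)."""
    if rr < 1:            # protective effects: invert to the >1 scale first
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical: churn is twice as likely under poor support as under
# excellent support (RR = 2.0).
print(f"E-value for RR=2.0: {e_value(2.0):.2f}")
```

An E-value of about 3.4 here would mean an unobserved confounder must be associated with both treatment and outcome by a risk ratio of 3.4 or more to fully account for the observed effect, which is why the declaration probes confounding strengths of 1.5, 2.0, and 3.0.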

Estimand Types:

| Estimand | Question | Granularity |
|----------|----------|-------------|
| ATE (Average Treatment Effect) | "On average, how much does improving support reduce churn?" | Population |
| CATE (Conditional ATE) | "How much does support improvement reduce churn for enterprise customers?" | Subgroup |
| ITE (Individual Treatment Effect) | "How much would support improvement reduce churn for customer #4721?" | Individual |

In the SimShop experiment, the ATE of improving support quality on churn was 0.15 -- meaning that improving support quality would reduce churn probability by 15 percentage points on average.
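The backdoor_adjustment method from the declaration can be shown end to end on simulated data (the numbers below are illustrative, not SimShop's): a confounder drives both treatment and outcome, so the naive difference in means is badly biased, while stratifying on the confounder and averaging the per-stratum contrasts recovers something close to the true effect.

```python
import random

random.seed(42)
TRUE_EFFECT = -0.15  # treatment lowers churn probability by 15 points

rows = []
for _ in range(50_000):
    good_product = random.random() < 0.5            # observed confounder
    # Good products tend to come with good support (confounding).
    treated = random.random() < (0.8 if good_product else 0.2)
    p_churn = (0.20 if good_product else 0.50) + (TRUE_EFFECT if treated else 0)
    rows.append((good_product, treated, random.random() < p_churn))

def mean_churn(rs):
    return sum(r[2] for r in rs) / len(rs)

# Naive (Rung 1): compare churn between treated and untreated as observed.
naive = (mean_churn([r for r in rows if r[1]])
         - mean_churn([r for r in rows if not r[1]]))

# Backdoor adjustment (Rung 2): stratify on the confounder, then average
# the per-stratum contrasts weighted by each stratum's share of the data.
adjusted = 0.0
for s in (True, False):
    stratum = [r for r in rows if r[0] == s]
    contrast = (mean_churn([r for r in stratum if r[1]])
                - mean_churn([r for r in stratum if not r[1]]))
    adjusted += contrast * len(stratum) / len(rows)

print(f"true {TRUE_EFFECT:+.3f}, naive {naive:+.3f}, adjusted {adjusted:+.3f}")
```

The naive estimate roughly doubles the real effect, because treated customers disproportionately have good products anyway; the adjusted estimate lands near the true -0.15. IPW and doubly-robust estimators attack the same bias with weights and outcome models instead of explicit strata.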


Counterfactual Analysis: Rung 3 #

NEAM
counterfactual ChurnCounterfactual {
    scm: ChurnSCM,
    question: "Would customer #4721 have stayed if support quality
               had been 'excellent' instead of 'poor'?",
    factual: {
        customer_id: 4721,
        support_quality: "poor",
        outcome: "churned"
    },
    counterfactual: {
        support_quality: "excellent"
    },
    method: "abduction_action_prediction",
    confidence: true
}

Counterfactual reasoning requires three steps:

  1. Abduction -- Given what we observed (customer #4721 churned with poor support), infer the latent factors
  2. Action -- In the counterfactual world, set support_quality to "excellent"
  3. Prediction -- With the inferred latent factors and the intervened variable, predict the outcome
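The three steps can be traced on a deliberately tiny linear SCM (a hypothetical one-equation model, not the full ChurnSCM): abduction solves for the individual's noise term from the factual observation, action swaps in the counterfactual treatment, and prediction re-runs the equation with the recovered noise.

```python
# Tiny linear SCM:  satisfaction = 2.0 * support + u   (u = individual noise)
#                   churn iff satisfaction < THRESHOLD
ALPHA = 2.0
THRESHOLD = 5.0
SUPPORT = {"poor": 1.0, "excellent": 3.0}  # hypothetical encoding

def satisfaction(support_level, u):
    return ALPHA * SUPPORT[support_level] + u

# Factual observation for (hypothetical) customer #4721:
observed_satisfaction = 2.5
factual_support = "poor"

# 1. Abduction: recover the latent noise consistent with what we observed.
u_4721 = observed_satisfaction - ALPHA * SUPPORT[factual_support]

# 2. Action: in the counterfactual world, set support to "excellent".
# 3. Prediction: re-evaluate the equation with the SAME individual noise.
cf_satisfaction = satisfaction("excellent", u_4721)

factual_churned = observed_satisfaction < THRESHOLD
cf_churned = cf_satisfaction < THRESHOLD
print(f"factual: satisfaction={observed_satisfaction}, churned={factual_churned}")
print(f"counterfactual: satisfaction={cf_satisfaction}, churned={cf_churned}")
```

Keeping the abducted noise fixed is what makes this Rung 3 rather than Rung 2: the answer is about this specific customer, with their specific unobserved circumstances, not about a fresh draw from the population.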

Bayesian Inference with PyMC #

For probabilistic causal models, the Causal Agent integrates with PyMC for Bayesian inference.

NEAM
bayesian_model ChurnBayesian {
    scm: ChurnSCM,
    framework: "pymc",
    priors: {
        support_effect: { distribution: "normal", mu: 0, sigma: 1 },
        product_effect: { distribution: "normal", mu: 0, sigma: 1 },
        baseline_churn: { distribution: "beta", alpha: 2, beta: 18 }
    },
    inference: {
        method: "nuts",           // No-U-Turn Sampler
        chains: 4,
        draws: 2000,
        tune: 1000
    },
    diagnostics: {
        rhat_threshold: 1.01,
        ess_threshold: 400,
        divergences_max: 0
    }
}
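PyMC samples the full model with NUTS, but the logic of Bayesian updating is visible in miniature on the baseline_churn prior alone, which is conjugate: a Beta(2, 18) prior (mean 0.10) updated with binomial churn counts yields a Beta posterior in closed form. The observation counts below are made up for illustration.

```python
# Beta(2, 18) prior over baseline churn: mean = 2 / (2 + 18) = 0.10.
alpha0, beta0 = 2.0, 18.0

# Hypothetical observation window: 30 churns out of 200 customers.
churned, total = 30, 200

# Conjugate update: posterior is Beta(alpha0 + churned, beta0 + survivors).
alpha_post = alpha0 + churned
beta_post = beta0 + (total - churned)

prior_mean = alpha0 / (alpha0 + beta0)
post_mean = alpha_post / (alpha_post + beta_post)
print(f"prior mean {prior_mean:.3f} -> posterior mean {post_mean:.3f}")
```

The diagnostics block matters because NUTS has no such closed form: R-hat near 1.01, effective sample sizes above 400, and zero divergences are the standard signals that the sampler actually explored the posterior rather than getting stuck.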

The Composable Primitive Pattern #

The Causal Agent is designed as a composable primitive -- a service that any other agent can invoke when it needs to understand why something happened, not just what happened.

ARCHITECTURE The Composable Primitive Pattern
flowchart LR
  DO["DataOps Agent"] -- "RCA: Why did pipeline fail?" --> CA["CAUSAL AGENT\n(composable primitive)"]
  DS["DataScientist Agent"] -- "Validation: Is this feature causal?" --> CA
  ML["MLOps Agent"] -- "Drift: Why did the model degrade?" --> CA
NEAM
// DataOps Agent invoking Causal for root cause analysis
let pipeline_rca = causal_analyze(ChurnCausal, {
    question: "Why did the feature pipeline fail at 3 AM?",
    data: pipeline_logs,
    scm: PipelineFailureSCM
})

// DataScientist Agent validating a feature
let feature_validity = causal_analyze(ChurnCausal, {
    question: "Is support_tickets_30d a cause of churn
               or merely correlated via product_quality?",
    data: feature_table,
    scm: ChurnSCM,
    test: "conditional_independence"
})

// MLOps Agent diagnosing drift
let drift_cause = causal_analyze(ChurnCausal, {
    question: "Why did model AUC drop from 0.847 to 0.79?",
    data: monitoring_logs,
    scm: DriftSCM,
    hypothesis: ["data_drift", "concept_drift", "schema_change"]
})

Design Principle: The Causal Agent is NOT a specialist that only the DataScientist uses. It is a general-purpose reasoning engine that any agent invokes when it needs to move beyond correlation to causation. This makes causal reasoning a cross-cutting concern, not a siloed capability.
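In a host language, the composable-primitive pattern is just a shared service behind a uniform request type. A hypothetical Python sketch (the class and field names are invented for illustration, not part of Neam):

```python
from dataclasses import dataclass, field

@dataclass
class CausalRequest:
    question: str
    scm: str
    data: str
    extras: dict = field(default_factory=dict)

class CausalAgent:
    """One shared reasoning service; callers differ only in their request."""
    def analyze(self, req: CausalRequest) -> dict:
        # Placeholder for discovery / intervention / counterfactual dispatch.
        return {"question": req.question, "scm": req.scm, "status": "analyzed"}

causal = CausalAgent()

# Different agents invoke the same primitive with different questions.
rca = causal.analyze(CausalRequest(
    "Why did the feature pipeline fail at 3 AM?",
    "PipelineFailureSCM", "pipeline_logs"))
drift = causal.analyze(CausalRequest(
    "Why did model AUC drop from 0.847 to 0.79?",
    "DriftSCM", "monitoring_logs",
    {"hypothesis": ["data_drift", "concept_drift", "schema_change"]}))
print(rca["status"], drift["status"])
```

Because every caller speaks the same request shape, adding causal reasoning to a new agent is a call-site change, not a new integration.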


The Complete Causal Agent Declaration #

NEAM
// ═══ BUDGET ═══
budget CausalBudget { cost: 50.00, tokens: 500000 }

// ═══ CAUSAL AGENT ═══
causal agent ChurnCausal {
    provider: "openai",
    model: "o3-mini",
    budget: CausalBudget
}

// ═══ FULL CAUSAL WORKFLOW ═══
// Step 1: Discover causal structure
let graph = causal_discover(ChurnCausal, ChurnDiscovery)

// Step 2: Estimate treatment effects
let ate_result = causal_intervene(ChurnCausal, SupportIntervention)
print("ATE of support improvement on churn: " + str(ate_result.ate))

// Step 3: Counterfactual reasoning
let cf_result = causal_counterfactual(ChurnCausal, ChurnCounterfactual)
print("Would customer #4721 have stayed? " + str(cf_result.outcome))

// Step 4: Generate business recommendations
let recommendations = causal_recommend(ChurnCausal, {
    interventions: [SupportIntervention],
    budget_constraint: 100000,
    time_horizon: "6_months"
})
print(recommendations)

Industry Perspective #

Causal inference is experiencing a renaissance in industry. Microsoft's DoWhy library, Uber's CausalML, and Google's CausalImpact have brought academic causal methods into production systems. The EU AI Act (2024) pushes in the same direction: high-risk AI systems must be transparent and explainable, a bar that correlation tables alone rarely clear.

But adoption remains limited. A 2024 survey by Towards Data Science found that only 12% of data science teams regularly use causal inference methods. The primary barrier is complexity: causal inference requires domain expertise in graph specification, familiarity with econometric estimators, and careful sensitivity analysis.

The Neam Causal Agent lowers this barrier by encoding causal methods as declarative specifications. A data scientist who can write a problem_statement can also write an scm and an intervention. The agent handles the methodological complexity -- selecting appropriate estimators, running sensitivity analysis, checking identification assumptions -- while the human provides the domain knowledge encoded in the causal graph.

Pearl himself has argued (2019) that "the most important challenge facing causality research is software." The Neam Causal Agent is a step toward meeting that challenge.


Evidence: DataSims Experimental Proof #

Experiment: Ablation A2 -- System Without Causal Agent #

Setup: The full SimShop churn prediction workflow was run 5 times with the Causal Agent disabled (ablation no_causal). All other agents remained active.

Results:

| Metric | Full System | Without Causal | Delta |
|--------|-------------|----------------|-------|
| Root Cause | support_quality_degradation | unknown | Lost |
| ATE | 0.15 | 0 | -100% |
| Causal Graph Edges | 8 | 0 | -100% |
| Confounders Identified | 3 | 3 | No change |
| Model AUC | 0.847 | 0.847 | No change |
| Quality Gate | passed | passed | No change |

Analysis:

Without the Causal Agent, the system still builds an excellent predictive model (AUC=0.847). It still identifies the features that are associated with churn. But it cannot answer the VP's question: why are customers churning?

CODE
WITH CAUSAL AGENT:

  "Customers are churning because support quality has degraded.
   Improving support quality would reduce churn by 15 percentage
   points (ATE=0.15). The effect is strongest for enterprise
   customers in the technology segment (CATE=0.22)."

WITHOUT CAUSAL AGENT:

  "Customers who file many support tickets tend to churn.
   Root cause: unknown."

That is the difference between actionable intelligence and a correlation table.

Key Finding: The Causal Agent does not improve predictive accuracy. It provides explanatory depth. Without it, the system can predict who will churn but cannot explain why or recommend what to do about it. In the SimShop experiment, this meant the difference between a targeted "improve support quality for enterprise customers" recommendation and a generic "monitor churn" non-recommendation.

Reproducibility: 5/5 runs succeeded. Results are deterministic. Full data available at github.com/neam-lang/Data-Sims in evaluation/results/ablation_no_causal.json.


Key Takeaways #