Chapter 18 -- The Causal Agent: Understanding Why #
"Correlation is not causation -- but it sure is a hint." -- Edward Tufte
30 min read | Marcus (DS), Dr. Chen (Researcher), David (VP), Priya (DE) | Part V: Analytical Intelligence
What you'll learn:
- Pearl's Ladder of Causation and why predictive models alone are not enough
- Rung 1 (association), Rung 2 (intervention), and Rung 3 (counterfactual) reasoning
- Structural Causal Models (SCM) as first-class Neam declarations
- Bayesian inference with PyMC integration
- Causal discovery algorithms: PC, FCI, GES, and LLM-hybrid approaches
- ATE, CATE, and ITE estimation for treatment effect analysis
- The composable primitive pattern: ANY agent can invoke the Causal Agent
- DataSims proof: ablation A2 shows root cause degrades from "support_quality_degradation" to "unknown" without the Causal Agent
The Problem: Knowing What Without Knowing Why #
Marcus's churn model works. AUC of 0.847. F1 of 0.723. Top features identified. The VP of Customer Success is pleased -- for exactly one meeting.
Then she asks the question that every predictive model dreads: "So we know who will churn. But why are they churning? If we improve our support response time, will churn actually decrease? Or is support just a symptom of something else?"
Marcus stares at his SHAP waterfall plot. support_tickets_30d is the second most important feature. But importance is not causation. Maybe unhappy customers both file support tickets and churn -- driven by a third factor (product quality decline) that the model does not directly observe. Intervening on support might do nothing. Or it might be the single most impactful lever. SHAP cannot tell the difference.
This is the boundary between prediction and understanding. Crossing it requires causal inference.
Pearl's Ladder of Causation #
Judea Pearl's Ladder of Causation (2018) describes three levels of causal reasoning. Most ML systems operate at Rung 1. The Causal Agent operates at all three.
Rung 1: Association (Seeing) #
What most ML models do. "Customers who file more than 3 support tickets in 30 days have a 42% churn rate, compared to 8% for those who file zero." This is a correlation. Useful for prediction. Useless for intervention planning.
Rung 2: Intervention (Doing) #
The do-calculus question. "If we set support quality to 'excellent' (do(support_quality = excellent)), what happens to churn?" This is not the same as observing that support quality and churn are correlated. It asks what happens when we intervene -- when we change the cause and observe the effect, controlling for confounders.
Rung 3: Counterfactual (Imagining) #
The most powerful and demanding level. "Customer #4721 churned last month. Would they have stayed if we had called them proactively at the 15-day mark?" This requires reasoning about a specific individual in a world that did not happen. It is the foundation of personalized intervention strategies.
Key Insight: Most "AI-driven decision support" tools operate at Rung 1. They find patterns. The Causal Agent operates at Rungs 2 and 3 -- it identifies interventions that will change outcomes and counterfactuals that explain individual cases.
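The gap between Rung 1 and Rung 2 is easy to demonstrate with a few lines of simulation. In this sketch (synthetic data, not SimShop's), a hidden product-quality decline drives both support tickets and churn; tickets have no direct effect at all. The naive association between tickets and churn is large, while the backdoor-adjusted (do-style) effect is essentially zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hidden confounder: product-quality decline drives BOTH tickets and churn.
decline = rng.random(n) < 0.3
tickets = rng.random(n) < np.where(decline, 0.7, 0.1)   # decline -> many tickets
churn   = rng.random(n) < np.where(decline, 0.5, 0.05)  # tickets have NO direct effect

# Rung 1: naive association, P(churn | tickets) - P(churn | no tickets)
naive = churn[tickets].mean() - churn[~tickets].mean()

# Rung 2: backdoor adjustment -- stratify on the confounder, average by stratum size
strata = []
for u in (False, True):
    mask = decline == u
    effect = churn[mask & tickets].mean() - churn[mask & ~tickets].mean()
    strata.append((effect, mask.mean()))
adjusted = sum(effect * weight for effect, weight in strata)

print(f"naive association:  {naive:.3f}")     # large and misleading
print(f"adjusted do-effect: {adjusted:.3f}")  # near zero: intervening on tickets does nothing
```

Stratifying on the confounder and reweighting by stratum size is the simplest form of backdoor adjustment; the intervention declaration later in this chapter automates the same idea with more robust estimators.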
Structural Causal Models (SCM) #
The scm (Structural Causal Model) declaration defines the causal graph -- the DAG (Directed Acyclic Graph) that encodes which variables cause which other variables.
scm ChurnSCM {
variables: [
{ name: "product_quality", type: "continuous", role: "exogenous" },
{ name: "support_quality", type: "continuous", role: "endogenous" },
{ name: "customer_satisfaction", type: "continuous", role: "endogenous" },
{ name: "login_frequency", type: "continuous", role: "endogenous" },
{ name: "spend_trend", type: "continuous", role: "endogenous" },
{ name: "support_tickets", type: "count", role: "endogenous" },
{ name: "churn", type: "binary", role: "outcome" }
],
edges: [
{ from: "product_quality", to: "customer_satisfaction" },
{ from: "product_quality", to: "support_tickets" },
{ from: "support_quality", to: "customer_satisfaction" },
{ from: "support_quality", to: "support_tickets" },
{ from: "customer_satisfaction", to: "login_frequency" },
{ from: "customer_satisfaction", to: "spend_trend" },
{ from: "login_frequency", to: "churn" },
{ from: "spend_trend", to: "churn" }
],
confounders: [
{ variable: "product_quality",
affects: ["support_tickets", "customer_satisfaction"] },
{ variable: "market_conditions",
affects: ["spend_trend", "churn"], observed: false },
{ variable: "competitor_actions",
affects: ["churn", "login_frequency"], observed: false }
]
}
The causal graph for SimShop churn:
flowchart TD
    PQ["product_quality"] --> CS["customer_satisfaction"]
    PQ --> ST["support_tickets"]
    SQ["support_quality"] --> CS
    SQ --> ST
    CS --> LF["login_frequency"]
    CS --> SPT["spend_trend"]
    LF --> CHURN["CHURN"]
    SPT --> CHURN
    MC["market_conditions (unobserved)"] -.-> SPT
    MC -.-> CHURN
    CA["competitor_actions (unobserved)"] -.-> LF
    CA -.-> CHURN
Critical: The causal graph is a hypothesis, not a fact. It encodes domain expertise about which variables cause which. The Causal Agent can discover causal structure from data (using PC/FCI/GES algorithms), but domain expert review is always required before acting on the results.
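Because an scm must describe a DAG, it is worth validating the edge list mechanically before any estimation. A minimal acyclicity check using Kahn's algorithm, with the edge list copied from ChurnSCM above (pure Python, no causal library assumed):

```python
# Edge list transcribed from the ChurnSCM declaration.
EDGES = [
    ("product_quality", "customer_satisfaction"),
    ("product_quality", "support_tickets"),
    ("support_quality", "customer_satisfaction"),
    ("support_quality", "support_tickets"),
    ("customer_satisfaction", "login_frequency"),
    ("customer_satisfaction", "spend_trend"),
    ("login_frequency", "churn"),
    ("spend_trend", "churn"),
]

def is_dag(edges):
    """Kahn's algorithm: if a topological order covers every node, there is no cycle."""
    nodes = {v for edge in edges for v in edge}
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    frontier = [v for v in nodes if indegree[v] == 0]  # exogenous variables
    seen = 0
    while frontier:
        v = frontier.pop()
        seen += 1
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)
    return seen == len(nodes)

print(is_dag(EDGES))                                   # the declared graph is acyclic
print(is_dag(EDGES + [("churn", "product_quality")]))  # adding a feedback loop fails
```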
Causal Discovery Algorithms #
When the causal graph is unknown, the Causal Agent can discover it from observational data using constraint-based and score-based algorithms.
causal_discovery ChurnDiscovery {
data: "ml_features.customer_360",
algorithms: [
{
name: "PC",
type: "constraint_based",
alpha: 0.05,
ci_test: "fisher_z"
},
{
name: "FCI",
type: "constraint_based",
alpha: 0.05,
handles_latent: true
},
{
name: "GES",
type: "score_based",
score: "bic"
},
{
name: "LLM_hybrid",
type: "hybrid",
model: "gpt-4o",
prior_knowledge: ChurnSCM,
data_driven: "GES",
merge_strategy: "conservative"
}
],
consensus: {
method: "majority_vote",
min_agreement: 0.6,
output: "consensus_graph"
}
}
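Constraint-based algorithms such as PC and FCI are built on a conditional-independence test; the fisher_z test configured above can be sketched as a partial-correlation test in a few lines. The data here is synthetic and the variable names are illustrative:

```python
import math
import numpy as np

def fisher_z_test(x, y, z=None):
    """Fisher-z test of (partial) correlation; returns a two-sided p-value."""
    n = len(x)
    if z is None:
        r, k = np.corrcoef(x, y)[0, 1], 0
    else:
        # Partial correlation: correlate residuals after regressing out z.
        Z = np.column_stack([np.ones(n), z])
        rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r, k = np.corrcoef(rx, ry)[0, 1], 1
    zstat = math.sqrt(n - k - 3) * 0.5 * math.log((1 + r) / (1 - r))
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(zstat) / math.sqrt(2))))

rng = np.random.default_rng(1)
n = 5_000
quality = rng.normal(size=n)              # common cause
tickets = quality + rng.normal(size=n)
churn_score = quality + rng.normal(size=n)

p_marginal = fisher_z_test(tickets, churn_score)          # dependent
p_given_q = fisher_z_test(tickets, churn_score, quality)  # independent given the cause
print(f"p marginal = {p_marginal:.4f}, p given quality = {p_given_q:.4f}")
```

PC-style algorithms run many such tests to delete edges between variables that are independent given some conditioning set, then orient what remains.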
The LLM-hybrid approach is unique to Neam. It combines data-driven discovery (GES finds edges supported by data) with LLM-powered prior knowledge (GPT-4o suggests edges based on domain understanding). The conservative merge strategy only includes edges that both the data and the LLM agree on.
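Under the conservative strategy, the merge itself reduces to a set intersection over proposed edges. A toy sketch (both edge lists are invented for illustration, mirroring ChurnSCM variable names):

```python
# Edges proposed by the score-based search (GES) on the data.
ges_edges = {
    ("support_quality", "support_tickets"),
    ("support_quality", "customer_satisfaction"),
    ("customer_satisfaction", "spend_trend"),
    ("spend_trend", "churn"),
    ("support_tickets", "churn"),       # spurious edge the LLM prior rejects
}
# Edges proposed by the LLM from domain knowledge.
llm_edges = {
    ("support_quality", "support_tickets"),
    ("support_quality", "customer_satisfaction"),
    ("customer_satisfaction", "spend_trend"),
    ("spend_trend", "churn"),
    ("product_quality", "churn"),       # prior edge the data does not support
}

# Conservative merge: keep only edges that BOTH sources agree on.
consensus = ges_edges & llm_edges
print(sorted(consensus))
```

The trade-off is deliberate: a conservative graph may miss true edges, but every edge it does contain has both statistical and domain-knowledge support.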
Intervention Analysis: The do() Operator #
The intervention declaration implements Pearl's do-calculus. It asks: "What would happen if we set a variable to a specific value?"
intervention SupportIntervention {
scm: ChurnSCM,
treatment: {
variable: "support_quality",
do_value: "excellent",
baseline_value: "current"
},
outcome: "churn",
estimand: "ate", // Average Treatment Effect
methods: [
{
name: "backdoor_adjustment",
covariates: ["product_quality", "customer_satisfaction"]
},
{
name: "ipw", // Inverse Probability Weighting
propensity_model: "logistic_regression"
},
{
name: "doubly_robust",
outcome_model: "gradient_boosting",
propensity_model: "logistic_regression"
}
],
sensitivity: {
method: "e_value",
confounding_strength: [1.5, 2.0, 3.0]
}
}
Estimation Types:
| Estimand | Question | Granularity |
|---|---|---|
| ATE (Average Treatment Effect) | "On average, how much does improving support reduce churn?" | Population |
| CATE (Conditional ATE) | "How much does support improvement reduce churn for enterprise customers?" | Subgroup |
| ITE (Individual Treatment Effect) | "How much would support improvement reduce churn for customer #4721?" | Individual |
In the SimShop experiment, the ATE of improving support quality on churn was 0.15 -- meaning that improving support quality would reduce churn probability by 15 percentage points on average.
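As a concrete illustration of one estimator above, here is an IPW sketch on synthetic data with a built-in 15-point effect. Everything is invented for illustration, including the use of the true propensity score (in practice it is fitted, e.g. by the logistic regression named in the declaration). The estimator reports a signed change in churn probability, so a 15-point reduction appears as -0.15:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Synthetic world with a KNOWN effect: the support upgrade lowers churn by 0.15.
satisfaction = rng.normal(size=n)
# Treatment is confounded: unhappier customers are more likely to get the upgrade.
p_treat = 1 / (1 + np.exp(satisfaction))
treated = rng.random(n) < p_treat
p_churn = np.clip(0.35 - 0.15 * treated - 0.10 * satisfaction, 0.01, 0.99)
churn = rng.random(n) < p_churn

# Naive comparison is biased: treated customers were already unhappier.
naive = churn[treated].mean() - churn[~treated].mean()

# Inverse probability weighting reweights each group to the full population.
w1 = treated / p_treat
w0 = (~treated) / (1 - p_treat)
ate_ipw = (w1 * churn).sum() / w1.sum() - (w0 * churn).sum() / w0.sum()

print(f"naive: {naive:+.3f}  IPW ATE: {ate_ipw:+.3f}")
```

The naive estimate is pulled toward zero because confounding masks part of the benefit; IPW recovers the true -0.15 effect. The doubly_robust method in the declaration combines IPW with an outcome model so that either one being correct suffices.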
Counterfactual Analysis: Rung 3 #
counterfactual ChurnCounterfactual {
scm: ChurnSCM,
question: "Would customer #4721 have stayed if support quality
had been 'excellent' instead of 'poor'?",
factual: {
customer_id: 4721,
support_quality: "poor",
outcome: "churned"
},
counterfactual: {
support_quality: "excellent"
},
method: "abduction_action_prediction",
confidence: true
}
Counterfactual reasoning requires three steps:
- Abduction -- Given what we observed (customer #4721 churned with poor support), infer the latent factors
- Action -- In the counterfactual world, set support_quality to "excellent"
- Prediction -- With the inferred latent factors and the intervened variable, predict the outcome
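The three steps can be traced by hand on a toy linear SCM. The coefficients and observed values below are illustrative, not fitted to SimShop data:

```python
# Toy linear SCM:
#   satisfaction = 0.8 * support_quality + u   (u = customer-specific latent noise)
#   churn_score  = -1.2 * satisfaction + 0.5
SUPPORT = {"poor": -1.0, "excellent": +1.0}

def counterfactual(observed_satisfaction, factual_support, cf_support):
    # 1. Abduction: recover this customer's latent noise from the factual world.
    u = observed_satisfaction - 0.8 * SUPPORT[factual_support]
    # 2. Action: intervene, do(support_quality = cf_support).
    cf_satisfaction = 0.8 * SUPPORT[cf_support] + u
    # 3. Prediction: propagate through the unchanged mechanisms.
    return -1.2 * cf_satisfaction + 0.5

# A customer observed with low satisfaction under poor support.
factual_score = counterfactual(-1.3, "poor", "poor")       # reproduces the factual world
cf_score = counterfactual(-1.3, "poor", "excellent")       # same customer, better support
print(f"factual churn score {factual_score:+.2f}, counterfactual {cf_score:+.2f}")
```

The key is step 1: the customer's latent factors are held fixed, so the counterfactual is about this specific individual, not a population average.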
Bayesian Inference with PyMC #
For probabilistic causal models, the Causal Agent integrates with PyMC for Bayesian inference.
bayesian_model ChurnBayesian {
scm: ChurnSCM,
framework: "pymc",
priors: {
support_effect: { distribution: "normal", mu: 0, sigma: 1 },
product_effect: { distribution: "normal", mu: 0, sigma: 1 },
baseline_churn: { distribution: "beta", alpha: 2, beta: 18 }
},
inference: {
method: "nuts", // No-U-Turn Sampler
chains: 4,
draws: 2000,
tune: 1000
},
diagnostics: {
rhat_threshold: 1.01,
ess_threshold: 400,
divergences_max: 0
}
}
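NUTS handles the full model, but the baseline_churn prior alone admits a closed-form sanity check: a Beta prior is conjugate to binomial churn counts, so the posterior is Beta(alpha + k, beta + n - k) with no sampling required. A quick sketch (the observed counts are invented):

```python
# Conjugate update for baseline_churn ~ Beta(2, 18).
alpha0, beta0 = 2.0, 18.0          # prior mean 0.10: a low baseline churn rate
n, k = 500, 61                     # illustrative: 61 churners out of 500 customers

alpha_post = alpha0 + k
beta_post = beta0 + (n - k)
post_mean = alpha_post / (alpha_post + beta_post)
post_var = (alpha_post * beta_post) / (
    (alpha_post + beta_post) ** 2 * (alpha_post + beta_post + 1)
)
print(f"posterior mean {post_mean:.4f}, sd {post_var ** 0.5:.4f}")
```

If the sampler's marginal posterior for baseline_churn disagrees with this closed form, something is wrong with the model or the chains, independent of what the rhat and ESS diagnostics report.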
The Composable Primitive Pattern #
The Causal Agent is designed as a composable primitive -- a service that any other agent can invoke when it needs to understand why something happened, not just what happened.
flowchart LR
    DO["DataOps Agent"] -- "RCA: Why did pipeline fail?" --> CA["CAUSAL AGENT\n(composable primitive)"]
    DS["DataScientist Agent"] -- "Validation: Is this feature causal?" --> CA
    ML["MLOps Agent"] -- "Drift: Why did the model degrade?" --> CA
// DataOps Agent invoking Causal for root cause analysis
let pipeline_rca = causal_analyze(ChurnCausal, {
question: "Why did the feature pipeline fail at 3 AM?",
data: pipeline_logs,
scm: PipelineFailureSCM
})
// DataScientist Agent validating a feature
let feature_validity = causal_analyze(ChurnCausal, {
question: "Is support_tickets_30d a cause of churn
or merely correlated via product_quality?",
data: feature_table,
scm: ChurnSCM,
test: "conditional_independence"
})
// MLOps Agent diagnosing drift
let drift_cause = causal_analyze(ChurnCausal, {
question: "Why did model AUC drop from 0.847 to 0.79?",
data: monitoring_logs,
scm: DriftSCM,
hypothesis: ["data_drift", "concept_drift", "schema_change"]
})
Design Principle: The Causal Agent is NOT a specialist that only the DataScientist uses. It is a general-purpose reasoning engine that any agent invokes when it needs to move beyond correlation to causation. This makes causal reasoning a cross-cutting concern, not a siloed capability.
The Complete Causal Agent Declaration #
// ═══ BUDGET ═══
budget CausalBudget { cost: 50.00, tokens: 500000 }
// ═══ CAUSAL AGENT ═══
causal agent ChurnCausal {
provider: "openai",
model: "o3-mini",
budget: CausalBudget
}
// ═══ FULL CAUSAL WORKFLOW ═══
// Step 1: Discover causal structure
let graph = causal_discover(ChurnCausal, ChurnDiscovery)
// Step 2: Estimate treatment effects
let ate_result = causal_intervene(ChurnCausal, SupportIntervention)
print("ATE of support improvement on churn: " + str(ate_result.ate))
// Step 3: Counterfactual reasoning
let cf_result = causal_counterfactual(ChurnCausal, ChurnCounterfactual)
print("Would customer #4721 have stayed? " + str(cf_result.outcome))
// Step 4: Generate business recommendations
let recommendations = causal_recommend(ChurnCausal, {
interventions: [SupportIntervention],
budget_constraint: 100000,
time_horizon: "6_months"
})
print(recommendations)
Industry Perspective #
Causal inference is experiencing a renaissance in industry. Microsoft's DoWhy library, Uber's CausalML, and Google's CausalImpact have brought academic causal methods into production systems. The EU AI Act (2024) requires transparency and explainability from high-risk AI systems -- expectations that correlation-only models struggle to meet.
But adoption remains limited. A 2024 survey by Towards Data Science found that only 12% of data science teams regularly use causal inference methods. The primary barrier is complexity: causal inference requires domain expertise in graph specification, familiarity with econometric estimators, and careful sensitivity analysis.
The Neam Causal Agent lowers this barrier by encoding causal methods as declarative specifications. A data scientist who can write a problem_statement can also write an scm and an intervention. The agent handles the methodological complexity -- selecting appropriate estimators, running sensitivity analysis, checking identification assumptions -- while the human provides the domain knowledge encoded in the causal graph.
Pearl himself has argued (2019) that "the most important challenge facing causality research is software." The Neam Causal Agent is a step toward meeting that challenge.
Evidence: DataSims Experimental Proof #
Experiment: Ablation A2 -- System Without Causal Agent #
Setup: The full SimShop churn prediction workflow was run 5 times with the Causal Agent disabled (ablation no_causal). All other agents remained active.
Results:
| Metric | Full System | Without Causal | Delta |
|---|---|---|---|
| Root Cause | support_quality_degradation | unknown | Lost |
| ATE | 0.15 | 0 | -100% |
| Causal Graph Edges | 8 | 0 | -100% |
| Confounders Identified | 3 | 3 | No change |
| Model AUC | 0.847 | 0.847 | No change |
| Quality Gate | passed | passed | No change |
Analysis:
Without the Causal Agent, the system still builds an excellent predictive model (AUC=0.847). It still identifies the features that are associated with churn. But it cannot answer the VP's question: why are customers churning?
WITH CAUSAL AGENT:
"Customers are churning because support quality has degraded.
Improving support quality would reduce churn by 15 percentage
points (ATE=0.15). The effect is strongest for enterprise
customers in the technology segment (CATE=0.22)."
WITHOUT CAUSAL AGENT:
"Customers who file many support tickets tend to churn.
Root cause: unknown."
The difference is the difference between actionable intelligence and a correlation table.
- Root cause degrades to "unknown" -- the system cannot identify support_quality_degradation as the causal mechanism because it has no causal reasoning capability
- ATE drops to 0 -- without causal estimation, the system cannot quantify the expected impact of interventions
- Causal graph edges drop to 0 -- no structural model is built, so no causal relationships are identified
- Confounders are still reported (3) -- these come from the data profiling step (correlations), not from causal analysis. They are identified but not accounted for
Key Finding: The Causal Agent does not improve predictive accuracy. It provides explanatory depth. Without it, the system can predict who will churn but cannot explain why or recommend what to do about it. In the SimShop experiment, this meant the difference between a targeted "improve support quality for enterprise customers" recommendation and a generic "monitor churn" non-recommendation.
Reproducibility: 5/5 runs succeeded. Results are deterministic. Full data available at github.com/neam-lang/Data-Sims in evaluation/results/ablation_no_causal.json.
Key Takeaways #
- Pearl's Ladder of Causation defines three levels: association (Rung 1), intervention (Rung 2), and counterfactual (Rung 3). Most ML systems operate at Rung 1 only
- Structural Causal Models (SCM) encode domain expertise as directed acyclic graphs specifying which variables cause which
- Causal discovery algorithms (PC, FCI, GES) can learn causal structure from data, but domain expert validation is always required
- The LLM-hybrid discovery approach combines data-driven algorithms with LLM-powered domain knowledge
- The intervention declaration implements Pearl's do-calculus with ATE, CATE, and ITE estimation
- Counterfactual analysis enables "what if" reasoning about individual cases -- the foundation of personalized intervention strategies
- The composable primitive pattern means ANY agent (DataOps, DataScientist, MLOps) can invoke the Causal Agent for root cause analysis
- Bayesian inference via PyMC provides principled uncertainty quantification for causal estimates
- DataSims ablation A2 proves: without the Causal Agent, root cause degrades to "unknown" and ATE drops to 0 -- the system can predict but cannot explain