Case Study: Secure Data Science Pipeline #
The Challenge #
A mid-sized financial services company wants to predict customer churn using machine learning. Their data science team has built accurate models before -- but this time, the stakes are higher. New regulatory requirements demand full compliance with the OWASP Agentic Security Initiative (ASI01–10), and because the system will process EU customer data, the EU AI Act requires transparency documentation, risk classification, and human oversight for any AI system that influences customer-facing decisions.
The pipeline must:
- Ingest customer behavior data from a PostgreSQL data warehouse
- Train and evaluate a churn prediction model using AI agents
- Identify the causal drivers of churn (not just correlations)
- Deploy the model to production -- but only after human approval
- Continuously monitor all agents for goal drift, cost overruns, and security anomalies
- Generate an AI Bill of Materials (AIBOM) for EU AI Act compliance
Traditionally, meeting these requirements means stitching together a dozen separate tools: an ML framework, a security scanner, a cost tracker, a compliance document generator, an approval workflow, and more. In Neam v1.0, every one of these concerns is a compiled declaration -- checked at compile time, enforced at runtime, and auditable by design.
Architecture #
The pipeline uses five interlocking layers, each expressed as Neam declarations:
Security Layer #
- SecuritySentinel -- a special agent that monitors every other agent in the system. It checks goal integrity per phase, watches for behavioral anomalies (actions that deviate more than 3 sigma from baseline), and can trigger a kill switch on critical violations.
- Goal Integrity -- declares the pipeline's three permitted objectives up front. Any agent action whose semantic similarity to these declared objectives falls below 0.75 is flagged and blocked.
- Circuit Breaker -- prevents cascading failures. After 3 consecutive failures, the circuit opens and blocks further calls for 30 seconds before half-opening to test recovery.
- Human Gate -- requires explicit human approval before any deployment action proceeds. The gate times out after 15 minutes, preventing stale approvals.
- Agent Attestation -- every 5 minutes, all agents must attest to their health status, confirming they are operating within declared parameters.
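The circuit-breaker semantics above (open after 3 consecutive failures, block for 30 seconds, then half-open to test recovery) can be sketched in plain Python. This is a conceptual illustration of the state machine, not Neam's actual implementation:

```python
import time

class CircuitBreaker:
    """Sketch of SafeCB's behavior: closed -> open after N consecutive
    failures -> half-open after a timeout -> closed on success."""

    def __init__(self, failure_threshold=3, half_open_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.half_open_timeout = half_open_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def call(self, fn, *args):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.half_open_timeout:
                self.state = "half_open"   # allow one trial call
            else:
                raise RuntimeError("circuit open: call blocked")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold or self.state == "half_open":
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        self.failures = 0   # a success resets the count and closes the circuit
        self.state = "closed"
        return result
```

A half-open trial call that fails immediately re-opens the circuit; only a success closes it again.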
Cost Layer #
- CostGuardian -- a special agent that tracks per-agent and per-phase spending in real time. It fires warnings at 75% budget consumption and critical alerts at 90%, automatically downgrading models to cheaper alternatives when thresholds are breached.
- Budget -- a hard ceiling of $500 and 2 million tokens, enforced by both the compiler and the runtime.
Cloud Layer #
- Gateway -- provides OAuth2-authenticated API access with health check endpoints.
- Model Router -- uses cost-optimized routing to send simple queries to cheaper models (Haiku) and complex reasoning tasks to more capable models (Opus).
Compliance Layer #
- AIBOM -- automatically generates a CycloneDX-format AI Bill of Materials, classifying the system as "limited risk" under the EU AI Act. This document is produced at compile time and updated at runtime as agents execute.
Agent Layer #
- DataScientist agent -- handles feature engineering, model training, and evaluation.
- Causal agent -- identifies the causal drivers of churn using counterfactual reasoning, going beyond correlation to answer "why do customers leave?"
- DIO (Data Intelligent Orchestrator) -- coordinates the entire pipeline, assigning tasks to specialist agents and managing the execution flow.
The Complete Program #
Here is the complete, verified Neam program that implements the secure data science pipeline. Every security control, cost limit, and compliance requirement is expressed as a declaration that the compiler checks before a single line of agent code runs:
budget B { cost: 500.00, tokens: 2000000 }
// OWASP Security layer
goal_integrity ChurnGoal {
declared_objectives: ["predict churn", "identify drivers", "deploy safely"],
verification: { method: "semantic_similarity", threshold: 0.75 }
}
circuit_breaker SafeCB { failure_threshold: 3, half_open_timeout: "30s" }
human_gate DeployGate { approve_before: ["deploy"], workflow: { timeout: "15m" } }
agent_attestation HealthCheck { attest_interval: "5m" }
// Cloud layer
gateway API { auth: { method: "oauth2" }, routes: { health: "/health" } }
model_router Router { strategy: "cost_optimized", routes: { simple: "haiku", complex: "opus" } }
// AIBOM for EU AI Act compliance
aibom_config BOM { format: "cyclonedx", auto_generate: true, eu_ai_act: { risk_classification: "limited" } }
// Data science agents
datascientist agent ChurnDS { provider: "openai", model: "gpt-4o", budget: B }
ds_status(ChurnDS);
causal agent WhyCausal { provider: "openai", model: "o3-mini", budget: B }
causal_status(WhyCausal);
// Sentinel watches everything
securitysentinel agent Sentinel {
provider: "openai", model: "gpt-4o", budget: B,
monitors: { goal_integrity: { check: "per_phase" }, behavioral_anomaly: { sigma: 3.0 } },
actions: { on_critical: "kill_switch" }
}
// Cost management
costguardian agent CostOps {
provider: "ollama", model: "llama3:8b", budget: B,
tracking: { per_agent: true, per_phase: true },
alerts: { budget_warning: 0.75, budget_critical: 0.90 }
}
// Orchestrate
infrastructure_profile Infra { data_warehouse: { platform: "postgres" } }
dio agent SecureDIO {
mode: "config",
task: "Predict churn with OWASP-compliant security",
infrastructure: Infra,
provider: "openai", model: "gpt-4o", budget: B
}
print(dio_solve(SecureDIO, "full_system"));
Let's walk through what happens when this program compiles and runs:
- Compile time: The compiler validates that every agent references a valid budget, that the goal_integrity declaration lists at least one objective, that the circuit_breaker thresholds are positive integers, and that the AIBOM format is a recognized standard. If any declaration is malformed, compilation fails with a clear error -- before any LLM call is ever made.
- Runtime -- Security setup: The SecuritySentinel agent (Sentinel) registers itself as a monitor for all other agents. The goal integrity checker loads the declared objectives and initializes the semantic similarity engine. The circuit breaker starts in "closed" state.
- Runtime -- Data science: The DIO orchestrator activates the DataScientist agent to ingest data, engineer features, and train the churn model. Every LLM call is routed through the Model Router, which selects the cheapest adequate model. The CostGuardian tracks every token spent.
- Runtime -- Causal analysis: The Causal agent runs counterfactual analysis to identify why customers churn, not just which ones will. This phase uses the more capable o3-mini model for its reasoning depth.
- Runtime -- Deployment gate: When the pipeline reaches the deployment phase, the Human Gate activates. A notification is sent to the approval workflow, and execution pauses until a human approves (or the 15-minute timeout expires, aborting the deployment).
- Runtime -- Continuous monitoring: Throughout all phases, the Sentinel checks every agent action against the declared objectives. If the DataScientist agent suddenly starts generating marketing copy (goal drift), the Sentinel flags a violation. If 3 consecutive LLM calls fail, the circuit breaker opens, preventing cascading failures.
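The goal-drift check in the last step can be sketched as follows. This toy uses bag-of-words cosine similarity so it runs standalone; the real semantic similarity engine would use embeddings, and the 0.75 threshold comes from the goal_integrity declaration:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def goal_drift(action: str, objectives, threshold: float = 0.75) -> bool:
    """Toy sketch of the Sentinel's per-phase goal check: an action
    drifts if it is insufficiently similar to every declared objective."""
    act = Counter(action.lower().split())
    best = max(cosine(act, Counter(o.lower().split())) for o in objectives)
    return best < threshold   # True means flag and block the action
```

With the declared objectives, an action like "write marketing copy" shares no terms with any objective and is flagged, while "predict churn" passes.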
Evaluation #
Before deploying the pipeline, we run it through Neam-Gym's security evaluation mode. This uses red-team prompts to test whether the pipeline's OWASP controls actually hold under adversarial conditions:
gym_evaluator ChurnEval {
mode: "security",
agent: "./build/churn_pipeline.neamb",
dataset: "./eval/red_team_prompts.jsonl",
graders: { primary: "owasp_compliance" },
thresholds: { compliance_rate: 1.0 }
}
The evaluator compiles the pipeline bytecode, then runs each prompt from the red-team dataset against it. The owasp_compliance grader checks that:
- No prompt injection can override the declared objectives (ASI01)
- No data exfiltration is possible through agent outputs (ASI02)
- The budget cannot be exhausted through denial-of-wallet attacks (ASI03)
- The kill switch activates on critical anomalies (ASI04)
- The human gate cannot be bypassed programmatically (ASI05)
- All agent actions are logged and attributable (ASI06–10)
The threshold is set to 1.0 -- 100% compliance. Any failure means the pipeline does not ship. This is not a best-effort check; it is a hard gate enforced by the build system.
Key Takeaways #
- All 10 OWASP ASI risks addressed as compiled declarations -- goal integrity, circuit breakers, human gates, agent attestation, and the SecuritySentinel are not configuration files or runtime plugins. They are part of the program, checked by the compiler, and enforced by the VM.
- SecuritySentinel provides continuous monitoring with kill switches -- rather than relying on external monitoring tools, the Sentinel agent runs inside the same program, with the same budget and the same lifecycle as the agents it monitors.
- CostGuardian prevents budget overruns with automatic model downgrade -- when spending approaches the budget limit, the CostGuardian works with the Model Router to transparently downgrade to cheaper models, keeping the pipeline running within its financial constraints.
- AIBOM generates EU AI Act compliance documentation automatically -- the CycloneDX bill of materials is produced as a side effect of compilation and execution, not as a separate manual process. Auditors get a machine-readable document that exactly matches what the system actually does.
- Human gates prevent unauthorized deployments -- the DeployGate declaration ensures that no model reaches production without explicit human approval, with a timeout that prevents stale approvals from being used days later.
- All v0.9 data agents work unchanged alongside v1.0 security -- the DataScientist and Causal agents use the same syntax and semantics from Neam v0.9. The security, compliance, and cloud infrastructure declarations wrap around them without requiring any changes to existing agent code.
For a deep dive into the gateway, model router, and cloud deployment architecture referenced in this case study, see Chapter 31: Cloud Agentic Stack. For the full OWASP security model, see Chapter 29: OWASP Security for AI Agents. For agent evaluation with Neam-Gym, see Chapter 30: Agent Evaluation with Neam-Gym.