Chapter 10: The Migration Agent — Zero-Downtime Platform Moves #

"The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair." — Douglas Adams


Reading time: 20 min | Personas: Priya (Data Engineer), David (VP of Data) | Part III: Data Infrastructure Agents


What You'll Learn #

How the Migration Agent plans dependency-ordered migration waves, translates schemas and SQL between platforms, validates results with three-level reconciliation, executes zero-downtime cutovers, and keeps every stage reversible through checkpoints and automated rollback triggers.

The Problem: The Migration That Ate Q3 #

David is VP of Data at a growing e-commerce company. The board approved migrating from an on-premises Oracle data warehouse to Snowflake. The project was scoped for 8 weeks. It took 22.

The first 4 weeks went well: the team translated DDL, migrated lookup tables, and loaded historical data. Then things unraveled. The orders fact table had 800 million rows and a complex partitioning scheme that did not map to Snowflake's micro-partitions. Stored procedures leaned on Oracle-specific constructs (CONNECT BY, the MODEL clause, DECODE) that did not translate cleanly to Snowflake. The team manually rewrote 47 stored procedures. Three had subtle bugs that were not caught until week 16, when finance noticed a $2.3M discrepancy in the monthly close.

The migration team had no automated reconciliation. They compared row counts manually in spreadsheets. They had no rollback plan beyond "restore from the Oracle backup." And the cutover window — originally planned as a 4-hour Saturday maintenance — stretched to 36 hours because nobody had mapped the dependency chain between the 15 ETL pipelines that fed downstream systems.

David's experience is not unusual. According to Bloor Research (2023), 83% of data migrations exceed their planned timeline, and 43% result in data quality incidents that affect business operations. The Migration Agent exists to make migrations predictable, validated, and reversible.


The Migration Agent #

The Migration Agent is a specialized agent that plans and executes cross-platform data migrations. It operates in the OPERATE phase of the agent lifecycle — it is activated when an organization needs to move data between platforms, whether that is Oracle to Snowflake, SQL Server to BigQuery, PostgreSQL to Databricks, or any other combination.

DIAGRAM Migration Agent Lifecycle
flowchart LR
  A["ASSESS\n\nInventory source\nDiscovery\nProfiling\nRisk scoring"] --> B["PLAN\n\nWave plan by deps\nBatch sizing\nRollback planning"]
  B --> C["TRANSLATE\n\nSchema DDL translation\nFunction mapping\nTest mapping in sandbox"]
  C --> D["MIGRATE\n\nExecute in\ndependency order\nCheckpoint/resume\nZero-downtime"]
  D --> E["RECONCILE\n\nRow counts\nChecksums\nSemantic validation\nSign-off"]

Declaration Syntax #

NEAM
migration agent <Name> {
    provider: "<provider>",
    model: "<model>",

    // Source and target platforms
    source_platform: <source_ref>,
    target_platform: <target_ref>,

    // Migration strategy
    strategy: "big_bang" | "phased" | "parallel_run",

    // Wave planning
    waves: { ... },

    // Reconciliation
    reconciliation: { ... },

    // Cutover
    cutover: { ... },

    // Rollback
    rollback: { ... },

    // Budget
    budget: <budget_ref>
}

Wave Planning: Dependency-Ordered Batches #

A wave plan breaks the migration into ordered batches, where each wave contains tables that can be migrated together because they have no unresolved dependencies on tables in later waves.

DIAGRAM Wave Planning
flowchart TD
  subgraph W1["Wave 1 (Foundation)"]
    W1A["dim_date\ndim_geography\ndim_channels\n\nNo dependencies"]
  end
  subgraph W2["Wave 2 (Core)"]
    W2A["dim_customers\ndim_products\ndim_campaigns\ndim_stores\n\nDepends on Wave 1"]
  end
  subgraph W3["Wave 3 (Analytics)"]
    W3A["fct_orders\nfct_activity\nfct_support\nfct_inventory\n\nDepends on Wave 2"]
  end
  subgraph W4["Wave 4 (ML/Features)"]
    W4A["churn_features\nrec_features\nltv_features\n\nDepends on Wave 3"]
  end
  subgraph W5["Wave 5 (Aggregates)"]
    W5A["agg_daily_revenue\nagg_monthly_seg\nagg_product_perf\n\nDepends on Wave 3"]
  end

  W1 -->|"Validate Wave 1\nbefore proceeding"| W2
  W2 -->|"Validate Wave 2\nbefore proceeding"| W3
  W3 -->|"Validate Wave 3\nbefore proceeding"| W4
  W3 -->|"Validate Wave 3\nbefore proceeding"| W5

The Migration Agent builds the wave plan automatically by analyzing foreign key relationships, view dependencies, and pipeline dependencies from the source catalog.
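The wave-building step above is essentially a layered topological sort of the table dependency graph. The following is a minimal Python sketch of that idea, not the Migration Agent's actual implementation; the table names and foreign-key edges are hypothetical.

```python
# Illustrative wave planning: each table lands in the first wave that comes
# after all of its dependencies (Kahn's algorithm, grouped by level).
from collections import defaultdict

def plan_waves(tables, depends_on):
    """Group tables into dependency-ordered waves; raise on cycles."""
    indegree = {t: len(depends_on.get(t, [])) for t in tables}
    dependents = defaultdict(list)
    for table, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(table)

    current = [t for t in tables if indegree[t] == 0]
    waves = []
    while current:
        waves.append(sorted(current))
        nxt = []
        for table in current:
            for child in dependents[table]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    nxt.append(child)
        current = nxt
    if sum(len(w) for w in waves) != len(tables):
        raise ValueError("cycle detected in dependency graph")
    return waves

tables = ["dim_date", "dim_customers", "fct_orders", "agg_daily_revenue"]
depends_on = {
    "dim_customers": ["dim_date"],
    "fct_orders": ["dim_date", "dim_customers"],
    "agg_daily_revenue": ["fct_orders"],
}
print(plan_waves(tables, depends_on))
# [['dim_date'], ['dim_customers'], ['fct_orders'], ['agg_daily_revenue']]
```

A real planner would also fold in view and pipeline dependencies and respect max_tables_per_wave by splitting large levels, but the ordering guarantee is the same.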

NEAM
migration agent SimShopMigration {
    provider: "openai", model: "gpt-4o",
    source_platform: PostgresSource,
    target_platform: SnowflakeTarget,

    strategy: "phased",

    waves: {
        auto_plan: true,
        max_tables_per_wave: 10,
        validation_gate: true,

        // Override: force specific tables into specific waves
        overrides: {
            wave_1: ["dim_date", "dim_geography", "dim_channels"],
            wave_5: ["agg_daily_revenue", "agg_monthly_segment"]
        }
    }
}

When auto_plan: true, the agent analyzes the source catalog and builds waves. You can override specific assignments with the overrides block. When validation_gate: true, each wave must pass reconciliation before the next wave begins.

Insight: Wave planning is not just about dependency ordering. It is also about risk management. By migrating reference data (dates, geographies) first, you validate the translation layer on small, simple tables before tackling high-volume fact tables. If something is wrong with the type mapping, you discover it on a 365-row date dimension, not an 800-million-row fact table.


Schema Translation #

Schema translation converts DDL, data types, functions, and platform-specific constructs from the source platform to the target platform.

Type Mapping #

| PostgreSQL | Snowflake | BigQuery | Redshift |
|---|---|---|---|
| SERIAL | NUMBER AUTOINCREMENT | INT64 (with sequence) | INT IDENTITY |
| TEXT | VARCHAR | STRING | VARCHAR(MAX) |
| BOOLEAN | BOOLEAN | BOOL | BOOLEAN |
| TIMESTAMP | TIMESTAMP_NTZ | TIMESTAMP | TIMESTAMP |
| JSONB | VARIANT | JSON | SUPER |
| ARRAY | ARRAY | ARRAY | Not supported (flatten) |
| NUMERIC(p,s) | NUMBER(p,s) | NUMERIC(p,s) | DECIMAL(p,s) |
| INTERVAL | Not native (compute) | Not native (compute) | Not native (compute) |
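A type map like the one above is, at its simplest, a lookup table keyed by (source, target) pair. Here is a hedged Python sketch covering a few rows from the table; the translate_type helper and its fallback behavior are illustrative assumptions, not the agent's API.

```python
# Minimal type-mapping sketch. Parametric types such as NUMERIC(p,s) need
# parsing of precision/scale and are omitted here for brevity.
TYPE_MAP = {
    ("postgres", "snowflake"): {
        "SERIAL": "NUMBER AUTOINCREMENT",
        "TEXT": "VARCHAR",
        "TIMESTAMP": "TIMESTAMP_NTZ",
        "JSONB": "VARIANT",
    },
    ("postgres", "bigquery"): {
        "SERIAL": "INT64",  # auto-increment needs a separate sequence
        "TEXT": "STRING",
        "TIMESTAMP": "TIMESTAMP",
        "JSONB": "JSON",
    },
}

def translate_type(src_type, source="postgres", target="snowflake"):
    mapping = TYPE_MAP[(source, target)]
    if src_type not in mapping:
        # Mirror the agent's behavior: unknown constructs go to human review.
        raise KeyError(f"no mapping for {src_type}: flag for human review")
    return mapping[src_type]

print(translate_type("JSONB"))  # VARIANT
print(translate_type("TEXT", target="bigquery"))  # STRING
```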

Function Translation #

| PostgreSQL | Snowflake | BigQuery |
|---|---|---|
| NOW() | CURRENT_TIMESTAMP() | CURRENT_TIMESTAMP() |
| AGE(a, b) | DATEDIFF(day, b, a) | DATE_DIFF(a, b, DAY) |
| COALESCE(a, b) | COALESCE(a, b) | COALESCE(a, b) |
| STRING_AGG(col, ',') | LISTAGG(col, ',') | STRING_AGG(col, ',') |
| GENERATE_SERIES(1,10) | TABLE(GENERATOR(ROWCOUNT=>10)) | GENERATE_ARRAY(1,10) |
| a::TEXT | TO_VARCHAR(a) | CAST(a AS STRING) |

The Migration Agent handles these translations automatically. For constructs with no direct equivalent (e.g., PostgreSQL LATERAL JOIN to platforms that lack it), the agent proposes a rewrite and flags it for human review.
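To make the function-translation step concrete, here is a deliberately naive rule-based sketch for a few PostgreSQL-to-Snowflake rewrites from the table above. Production translators parse SQL into an AST rather than using regexes, which break on nested calls and quoted strings; this is an illustration only.

```python
# Naive PostgreSQL -> Snowflake function translation via ordered regex rules.
import re

PG_TO_SNOWFLAKE = [
    (r"\bNOW\(\)", "CURRENT_TIMESTAMP()"),
    (r"\bAGE\(([^,]+),\s*([^)]+)\)", r"DATEDIFF(day, \2, \1)"),
    (r"\bSTRING_AGG\(", "LISTAGG("),
    (r"(\w+)::TEXT", r"TO_VARCHAR(\1)"),
]

def translate(sql):
    """Apply each rewrite rule in order; untouched SQL passes through."""
    for pattern, replacement in PG_TO_SNOWFLAKE:
        sql = re.sub(pattern, replacement, sql)
    return sql

print(translate("SELECT AGE(end_ts, start_ts), name::TEXT FROM t"))
# SELECT DATEDIFF(day, start_ts, end_ts), TO_VARCHAR(name) FROM t
```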

Anti-Pattern: Assuming that all SQL is portable. Even "standard" SQL has platform-specific behavior around NULL handling, string collation, division-by-zero, and implicit type casting. The Migration Agent tests each translated query in a sandbox before migration, catching these subtle differences.


Data Reconciliation #

Reconciliation validates that the migrated data is correct. The Migration Agent performs three levels of validation:

Level 1: Row Count Validation #

| Source Table | Target Table | Match? |
|---|---|---|
| customers: 100,247 | customers: 100,247 | YES |
| orders: 2,134,891 | orders: 2,134,891 | YES |
| events: 52,891,003 | events: 52,891,002 | NO (1 row missing) |

Level 2: Checksum Validation #

Hash-based validation ensures that row content — not just row count — is identical.

Source: SHA256(customer_id || email || name || ...) = 0x7a3f...

Target: SHA256(customer_id || email || name || ...) = 0x7a3f...

Match: YES
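One practical wrinkle: the two platforms may return rows in different orders, so a useful table-level checksum must be order-independent. The sketch below XORs per-row SHA-256 digests; it is illustrative only, since production reconciliation typically pushes hashing into SQL on each platform rather than pulling rows into Python.

```python
# Order-independent table checksum: XOR of per-row SHA-256 digests over the
# '||'-joined column values (mirroring the concatenation shown above).
import hashlib

def table_checksum(rows):
    acc = 0
    for row in rows:
        payload = "||".join("" if v is None else str(v) for v in row)
        digest = hashlib.sha256(payload.encode()).digest()
        acc ^= int.from_bytes(digest, "big")  # XOR is order-insensitive
    return hex(acc)

source = [(1, "a@x.com", "Ana"), (2, "b@x.com", "Bo")]
target = [(2, "b@x.com", "Bo"), (1, "a@x.com", "Ana")]  # same rows, new order
print(table_checksum(source) == table_checksum(target))  # True
```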

Level 3: Semantic Validation #

Business-level queries that verify aggregated metrics match:

Source: SELECT SUM(amount) FROM orders WHERE year = 2025 → $48,291,003.47

Target: SELECT SUM(amount) FROM orders WHERE year = 2025 → $48,291,003.47

Match: YES (delta: $0.00)
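Semantic checks usually allow a small relative tolerance, because floating-point aggregation order can differ between platforms. A minimal sketch of that comparison, with hypothetical values:

```python
# Compare a source/target aggregate within a relative tolerance.
def semantic_check(name, source_value, target_value, tolerance):
    """Return (name, passed, relative_delta)."""
    if source_value == target_value:
        return (name, True, 0.0)
    base = max(abs(source_value), abs(target_value))
    delta = abs(source_value - target_value) / base
    return (name, delta <= tolerance, delta)

print(semantic_check("revenue_2025", 48_291_003.47, 48_291_003.47, 0.001))
# ('revenue_2025', True, 0.0)
print(semantic_check("customer_count", 100_247, 100_246, 0))  # fails: tolerance 0
```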

NEAM
migration agent SimShopMigration {
    // ... (source, target, waves as above)

    reconciliation: {
        row_count: true,
        checksum: true,
        semantic: [
            {
                name: "revenue_2025",
                source_sql: "SELECT SUM(total_amount) FROM simshop_dw.fct_orders WHERE order_year = 2025",
                target_sql: "SELECT SUM(total_amount) FROM analytics.fct_orders WHERE order_year = 2025",
                tolerance: 0.001    // 0.1% tolerance for floating-point differences
            },
            {
                name: "customer_count",
                source_sql: "SELECT COUNT(DISTINCT customer_id) FROM simshop_oltp.customers",
                target_sql: "SELECT COUNT(DISTINCT customer_id) FROM raw.customers",
                tolerance: 0
            }
        ],
        on_mismatch: "block_and_alert"
    }
}

Zero-Downtime Cutover Strategies #

The cutover is the moment production traffic switches from the old platform to the new one. The Migration Agent supports three strategies:

Strategy 1: Blue-Green Cutover #

DIAGRAM Blue-Green Cutover
flowchart LR
  subgraph BEFORE["Before Cutover"]
    E1["ETL Pipelines"] -->|"active"| S1["Source (active)"]
    S1x["Target (standby)"]
  end
  subgraph AFTER["After Cutover"]
    E2["ETL Pipelines"] -->|"active"| T2["Target (active)"]
    T2x["Source (standby)"]
  end

  BEFORE -->|"Switch: update DNS /\nconnection string\nRollback: switch back"| AFTER

Strategy 2: Parallel Run #

Both platforms receive the same data for a validation period. Queries run against both, and results are compared.

DIAGRAM Parallel Run Strategy
flowchart LR
  E["ETL Pipelines\n(dual write)"] --> S["Source"]
  E --> T["Target"]
  S --> RA["Results A"]
  T --> RB["Results B"]
  RA --> C["Compare"]
  RB --> C

  C -.->|"Duration: 2-4 weeks\nCutover: after zero-mismatch\nvalidation period"| C
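The parallel-run gate reduces to a simple rule: cut over only after N consecutive days with zero mismatches between the two platforms' results. A sketch of that gate, with a hypothetical mismatch history:

```python
# Cutover readiness: require N consecutive mismatch-free days, counting
# from oldest to newest; any mismatch resets the streak.
def ready_to_cutover(daily_mismatch_counts, required_clean_days=7):
    streak = 0
    for count in daily_mismatch_counts:
        streak = streak + 1 if count == 0 else 0
    return streak >= required_clean_days

history = [0, 2, 0, 0, 0, 0, 0, 0, 0]  # mismatches on day 2 reset the streak
print(ready_to_cutover(history))  # True: the last 7 days are clean
```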

Strategy 3: Big Bang Cutover #

Everything moves at once during a maintenance window. Simple but risky. The Migration Agent supports it but flags it as high-risk.

NEAM
migration agent SimShopMigration {
    // ...

    cutover: {
        strategy: "parallel_run",
        validation_period: "14d",
        comparison_queries: [
            "SELECT SUM(total_amount), COUNT(*) FROM fct_orders WHERE order_date >= CURRENT_DATE - 7",
            "SELECT COUNT(DISTINCT customer_id) FROM dim_customers WHERE is_current = true"
        ],
        auto_cutover_on: "zero_mismatch_7_consecutive_days",
        rollback_window: "72h"
    }
}

Rollback Planning #

Every migration needs a rollback plan. The Migration Agent maintains rollback capability at every stage.

NEAM
migration agent SimShopMigration {
    // ...

    rollback: {
        strategy: "source_preserved",
        keep_source_active: true,
        rollback_window: "30d",
        checkpoint_interval: "per_wave",

        // Automated rollback triggers
        auto_rollback_on: [
            "reconciliation_failure",
            "query_performance_regression_gt_50pct",
            "error_rate_gt_1pct"
        ]
    }
}
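Automated triggers like these amount to evaluating a set of predicates against observed metrics after each wave. A hedged sketch, where the metric names and thresholds mirror the trigger list above but the observed values are hypothetical:

```python
# Evaluate rollback triggers against a metrics snapshot; any non-empty
# result would initiate an automated rollback.
def check_rollback_triggers(metrics):
    triggers = []
    if not metrics.get("reconciliation_passed", True):
        triggers.append("reconciliation_failure")
    if metrics.get("query_latency_regression_pct", 0) > 50:
        triggers.append("query_performance_regression_gt_50pct")
    if metrics.get("error_rate_pct", 0) > 1:
        triggers.append("error_rate_gt_1pct")
    return triggers

print(check_rollback_triggers({
    "reconciliation_passed": True,
    "query_latency_regression_pct": 62,
    "error_rate_pct": 0.4,
}))  # ['query_performance_regression_gt_50pct']
```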

Checkpoint/Resume for Long Migrations #

Large migrations can take days or weeks. The Migration Agent checkpoints progress after each wave, so that a failure in Wave 4 does not require re-migrating Waves 1-3.

DIAGRAM Checkpoint/Resume Flow
flowchart LR
  W1["Wave 1"] --> CP1["Checkpoint"]
  CP1 --> W2["Wave 2"]
  W2 --> CP2["Checkpoint"]
  CP2 --> W3["Wave 3"]
  W3 -->|"FAILURE"| R["Resume from Wave 3\n(Waves 1-2 preserved)"]
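The mechanics of checkpoint/resume are simple to sketch: persist the set of completed waves after each one, and skip anything already recorded on restart. The file name and wave-runner callback below are hypothetical, for illustration only.

```python
# Per-wave checkpointing: a restart resumes from the first unfinished wave.
import json
import os

CHECKPOINT = "migration_checkpoint.json"

def load_completed():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def run_migration(waves, migrate_wave):
    done = load_completed()
    for wave in waves:
        if wave in done:
            continue  # already checkpointed: skipped on resume
        migrate_wave(wave)
        done.add(wave)
        with open(CHECKPOINT, "w") as f:  # checkpoint after each wave
            json.dump(sorted(done), f)

run_migration(["wave_1", "wave_2", "wave_3"], lambda w: print("migrating", w))
```

If the process dies during wave_3, rerunning the same call skips wave_1 and wave_2 and retries only wave_3 — the behavior the diagram above describes.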

Try It: In DataSims, the SimShop environment runs on PostgreSQL. Write a Migration Agent that targets Snowflake (or another platform of your choice). Use the 15 tables in simshop_dw as the migration scope. The agent should produce a 5-wave plan, translate DDL, and generate reconciliation queries.


Full Working Example: SimShop PostgreSQL to Snowflake #

NEAM
// ═══════════════════════════════════════════════════════════
// SimShop Migration — PostgreSQL to Snowflake
// ═══════════════════════════════════════════════════════════

// ── Budget ──
budget MigrationBudget { cost: 100.00, tokens: 500000 }

// ── Source Platform ──
compute PostgresSource {
    engine: "postgres",
    connection: env("SIMSHOP_PG_URL"),
    database: "simshop"
}

// ── Target Platform ──
compute SnowflakeTarget {
    engine: "snowflake",
    account: env("SNOWFLAKE_ACCOUNT"),
    warehouse: "MIGRATION_WH",
    database: "SIMSHOP_CLOUD",
    role: "MIGRATION_ROLE"
}

// ── Schema Contracts (validated on both sides) ──
schema MigCustomer {
    customer_id: string @primary_key,
    email: string @unique @not_null,
    first_name: string @not_null,
    last_name: string @not_null,
    country_code: string @length(2, 3),
    status: string @enum(["active", "inactive", "churned"]),
    lifetime_value: float @positive,
    signup_date: datetime @not_null
}

schema MigOrder {
    order_id: string @primary_key,
    customer_id: string @foreign_key(MigCustomer),
    order_date: datetime @not_null,
    total_amount: float @positive,
    channel: string @enum(["web", "mobile", "app"]),
    status: string @enum(["completed", "pending", "cancelled", "returned"])
}

// ── The Migration Agent ──
migration agent SimShopToSnowflake {
    provider: "openai",
    model: "gpt-4o",
    system: "You are a database migration specialist. Ensure zero data loss.",
    temperature: 0.1,

    source_platform: PostgresSource,
    target_platform: SnowflakeTarget,

    strategy: "phased",

    waves: {
        auto_plan: true,
        max_tables_per_wave: 10,
        validation_gate: true,
        overrides: {
            wave_1: ["dim_date", "dim_geography", "dim_channels"],
            wave_5: ["agg_daily_revenue", "agg_monthly_segment", "agg_product_performance"]
        }
    },

    reconciliation: {
        row_count: true,
        checksum: true,
        semantic: [
            {
                name: "total_revenue_2025",
                source_sql: "SELECT SUM(total_amount)::numeric(18,2) FROM simshop_dw.fct_orders WHERE EXTRACT(YEAR FROM order_date) = 2025",
                target_sql: "SELECT SUM(total_amount)::NUMBER(18,2) FROM analytics.fct_orders WHERE YEAR(order_date) = 2025",
                tolerance: 0.001
            },
            {
                name: "active_customer_count",
                source_sql: "SELECT COUNT(*) FROM simshop_dw.dim_customers WHERE is_current = true",
                target_sql: "SELECT COUNT(*) FROM analytics.dim_customers WHERE is_current = TRUE",
                tolerance: 0
            },
            {
                name: "order_count_match",
                source_sql: "SELECT COUNT(*) FROM simshop_dw.fct_orders",
                target_sql: "SELECT COUNT(*) FROM analytics.fct_orders",
                tolerance: 0
            }
        ],
        on_mismatch: "block_and_alert"
    },

    cutover: {
        strategy: "parallel_run",
        validation_period: "14d",
        auto_cutover_on: "zero_mismatch_7_consecutive_days",
        rollback_window: "72h"
    },

    rollback: {
        strategy: "source_preserved",
        keep_source_active: true,
        rollback_window: "30d",
        checkpoint_interval: "per_wave",
        auto_rollback_on: [
            "reconciliation_failure",
            "query_performance_regression_gt_50pct"
        ]
    },

    budget: MigrationBudget
}

// ── Execute Migration ──
let plan = migration_plan(SimShopToSnowflake)
print("Migration plan:")
print(plan)

let result = migration_execute(SimShopToSnowflake)
print("Migration result:")
print(result)

let reconciliation = migration_reconcile(SimShopToSnowflake)
print("Reconciliation:")
print(reconciliation)

Industry Perspective #

Data migration is one of the highest-risk activities in enterprise data management. Three industry realities shaped the Migration Agent's design:

The Cloud Migration Wave (2020-present): Gartner estimates that 75% of databases will be on cloud platforms by 2027. Most organizations are mid-migration, dealing with hybrid environments where some workloads run on-premises and others in the cloud. The Migration Agent's phased strategy with parallel-run validation is designed for this reality.

The Reconciliation Gap (persistent): A 2023 survey by Datacoral found that only 22% of organizations perform automated data reconciliation during migrations. The rest rely on manual spot-checks. The Migration Agent's three-level reconciliation (row count, checksum, semantic) closes this gap.

The Rollback Myth (persistent): Most migration plans include "rollback" as a line item but never test it. The Migration Agent's checkpoint/resume mechanism with automated rollback triggers turns rollback from a theoretical option into a tested capability.


The Evidence #

The DataSims environment is built on PostgreSQL, making it an ideal source platform for testing the Migration Agent. The 15 tables in simshop_dw (5 fact tables, 6 dimensions, 4 aggregates) provide a realistic warehouse migration scope.

For the full environment setup, see DataSims. For the language reference, see Neam: The AI-Native Programming Language.


Key Takeaways #

  1. Wave planning breaks migrations into dependency-ordered batches, validating each wave before proceeding to the next.
  2. Schema translation maps data types, functions, and platform-specific constructs automatically, flagging ambiguous translations for human review.
  3. Three-level reconciliation (row count, checksum, semantic) validates correctness at every stage, not just at the end.
  4. Zero-downtime cutover via parallel-run or blue-green strategies keeps production running during migration.
  5. Checkpoint/resume ensures that a failure in Wave 4 does not require re-migrating Waves 1-3.
  6. Automated rollback triggers turn rollback from a theoretical option into a tested, instant capability.