Programming Neam

Chapter 25: Forge Agents -- Iterative Build Agents #

"The smith does not carry yesterday's heat into today's strike. Each blow begins fresh, guided only by the shape of what remains." -- Workshop proverb

In Chapter 24, you learned about claw agents -- persistent session agents that maintain conversation history across turns. Claw agents keep themselves alive, accumulating context as a conversation unfolds. This chapter introduces their counterpart: forge agents. Where a claw agent says "keep the agent alive, let context grow," a forge agent says "keep the world alive, move the agent through it." Each iteration starts with a clean context window. The filesystem is the memory. The plan file is the roadmap. The verify callback is the quality gate. The result is an agent architecture purpose-built for long-horizon build tasks -- TDD workflows, code generation pipelines, document assembly, and any problem where accumulated context would eventually cause drift.

By the end of this chapter, you will be able to:

  1. Explain the fresh context model and how it prevents context drift
  2. Declare a forge agent with a verify callback, skills, and a loop block
  3. Trace the stages of the .run() pipeline
  4. Write verify functions that return Done, Retry, or Abort
  5. Handle every LoopOutcome variant with a match expression
  6. Configure plan, progress, and learnings files and choose a checkpoint strategy

💠 Why This Matters

Long-running AI tasks fail in predictable ways. After dozens of tool calls, the agent's context window fills with stale information. It forgets early instructions. It repeats mistakes. It drifts from the plan. Human developers solve this problem naturally: they close the editor, take a break, reopen the project, read the code on disk, and pick up where they left off. The forge agent encodes this pattern into a language construct. Each iteration is a fresh developer sitting down at the workstation, reading the project files, doing one task, verifying the result, and committing. The world (filesystem, git history, plan file) carries all the state. The agent carries none.


25.1 The Forge Agent Philosophy #

A forge agent operates on a fundamentally different principle from the agents you have used so far.

Keep the World Alive, Move the Agent Through It #

In a standard agent or claw agent, the agent is the center of gravity. It accumulates messages, tool results, and conversation history in its context window. The longer it runs, the more context it carries. This works well for short interactions, but for long-horizon tasks -- building a project file by file, refactoring a codebase, running a multi-step test-driven development cycle -- the accumulated context becomes a liability.

A forge agent inverts this relationship. The world (filesystem, git repository, plan files, progress logs) is the persistent state. The agent is ephemeral. Each iteration creates a fresh agent with a clean context window, loads only the information it needs from the world, performs one task, verifies the result, and exits. The next iteration starts clean again.

CLAW AGENT (accumulated context)
─────────────────────────────────
Iteration 1: [system + task1 + result1]
Iteration 2: [system + task1 + result1 + task2 + result2]
Iteration 3: [system + task1 + result1 + task2 + result2 + task3...]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Context grows → drift risk increases

FORGE AGENT (fresh context)
───────────────────────────
Iteration 1: [system + plan + current_task] → verify → checkpoint
Iteration 2: [system + plan + current_task] → verify → checkpoint
Iteration 3: [system + plan + current_task] → verify → checkpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Context is constant → no drift

Fresh Context Model #

Each forge iteration starts with a blank slate. The VM constructs a new message array containing only:

  1. The system prompt
  2. The contents of the prompt file (loaded from disk)
  3. The current task from the plan file
  4. A summary of completed tasks from the progress file
  5. Any learnings from previous iterations

This means the context window size is bounded and predictable. Whether the agent is on iteration 1 or iteration 25, it receives approximately the same amount of context. No stale tool results. No forgotten instructions buried under pages of conversation history.
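The bounded-context property can be sketched in ordinary Python (the VM is not written in Neam, so this is an illustrative model, not the actual implementation; the field names are assumptions): the message array is rebuilt from the same five or six slots every iteration, so its size never depends on the iteration number.

```python
def build_messages(system_prompt, prompt_file, task, progress_summary,
                   learnings, feedback=None):
    """Sketch of the fresh-context message array. The slot order mirrors
    the list above; feedback is only present when the previous verify
    returned Retry."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_file},
        {"role": "user", "content": f"Current task: {task}"},
        {"role": "user", "content": f"Completed tasks: {progress_summary}"},
        {"role": "user", "content": f"Learnings: {learnings}"},
    ]
    if feedback is not None:
        # Retry feedback from the previous iteration's verify callback
        messages.append({"role": "assistant", "content": feedback})
    return messages
```

Because the array is rebuilt from scratch, an iteration-25 call produces the same shape as an iteration-1 call; only the slot contents differ.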

Drift Prevention by Design #

Context drift is the gradual degradation of agent behavior as accumulated messages push the system prompt further from the attention window. In transformer-based models, information at the beginning and end of the context receives the most attention (the "primacy" and "recency" effects). As a claw agent accumulates history, the system prompt -- which defines the agent's personality and constraints -- gets pushed into a low-attention zone.

Forge agents eliminate this problem structurally. The system prompt is always at the top of a fresh, short context. There is no history to push it away. The agent's behavior on iteration 25 is as faithful to the system prompt as on iteration 1.

Comparison with the Ralph Loop Pattern #

Before forge agents existed as a language construct, developers built iterative loops manually using what was known as the "Ralph Loop" pattern -- an external script that:

  1. Reads a plan file
  2. Calls an LLM with the current task
  3. Checks if the task passed verification
  4. Updates a progress file
  5. Commits to git
  6. Repeats

This pattern worked, but it required boilerplate orchestration code outside the Neam program. The forge agent keyword turns this external pattern into a first-class language construct with compile-time safety, built-in budget tracking, and standardized file formats.
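To make the boilerplate concrete, here is a minimal Python sketch of such an external orchestrator, following the six steps above. The helper callables (`call_llm`, `verify`, `checkpoint`) and the return strings are hypothetical; this is the shape of what developers wrote by hand, not a reference implementation.

```python
import json

def ralph_loop(call_llm, verify, plan_path="plan.txt",
               progress_path="progress.jsonl",
               max_iterations=25, checkpoint=None):
    """Hand-rolled orchestration of the kind the forge agent keyword replaces."""
    # 1. Read the plan file (skip blanks and '#' comments)
    with open(plan_path) as f:
        tasks = [ln.strip() for ln in f
                 if ln.strip() and not ln.strip().startswith("#")]
    # Resume support: collect tasks already recorded as done
    done = set()
    try:
        with open(progress_path) as f:
            done = {json.loads(ln)["task"] for ln in f if ln.strip()}
    except FileNotFoundError:
        pass
    feedback = None
    for i in range(1, max_iterations + 1):
        remaining = [t for t in tasks if t not in done]
        if not remaining:
            return "completed"
        task = remaining[0]
        call_llm(task, feedback)          # 2. one fresh LLM call per iteration
        ok, feedback = verify(task)       # 3. external verification
        if ok:
            with open(progress_path, "a") as f:   # 4. update the progress file
                f.write(json.dumps({"task": task, "iteration": i}) + "\n")
            done.add(task)
            if checkpoint:
                checkpoint(task)          # 5. e.g. a git commit
            feedback = None
            if len(done) == len(tasks):
                return "completed"
    return "max_iterations"               # 6. ran out of iterations
```

Everything here -- budget tracking aside -- is orchestration plumbing, which is exactly what the language construct absorbs.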

🌎 Real-World Analogy

Think of a forge agent like a relay race. Each runner (iteration) sprints one leg of the race, then hands the baton (the filesystem state) to the next runner. No single runner needs to remember the entire race -- they only need to know which leg they are running and where the baton is. The track (filesystem) and the scoreboard (progress file) carry the full history.


25.2 Forge Agent Declaration #

A forge agent is declared using the forge agent keyword followed by a name and a configuration block. The syntax is similar to a standard agent declaration but with additional fields specific to iterative build workflows.

Syntax Overview #

neam
forge agent <Name> {
  // Required fields
  provider: <string>
  model: <string>
  verify: <function_reference>

  // Optional fields
  system: <string>
  temperature: <float>
  skills: [<skill_list>]
  guards: [<guard_chain_list>]
  workspace: <string>

  // Loop configuration
  loop {
    max_iterations: <int>
    max_cost: <float>
    max_tokens: <int>
    prompt_file: <string>
    plan_file: <string>
    progress_file: <string>
    learnings_file: <string>
  }

  // Checkpoint strategy
  checkpoint: <string>
}

Required Fields #

| Field | Type | Description |
| --- | --- | --- |
| provider | string | LLM provider ("ollama", "openai", "anthropic", "gemini") |
| model | string | Model identifier (e.g., "gpt-4o", "claude-sonnet-4") |
| verify | function | Reference to a verify callback function |

The verify field is what makes a forge agent a forge agent. Without external verification, the agent would have no way to know whether its work succeeded. You will learn the full verify callback protocol in Section 25.4.

Optional Fields #

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| system | string | "" | System prompt defining agent behavior |
| temperature | float | 0.7 | Sampling temperature (0.0 to 2.0) |
| skills | list | [] | Tools available to the agent |
| guards | list | [] | Guard chains for safety |
| workspace | string | "." | Working directory for file operations |
| loop | block | defaults | Loop configuration (see Section 25.6) |
| checkpoint | string | "none" | Checkpoint strategy: "git", "snapshot", "none" |

Minimal Forge Agent #

Here is the simplest possible forge agent. It uses all defaults for the loop configuration and has no checkpoint strategy:

neam
fun check_output(ctx) {
  let file = file_read("output.txt");
  if (file != nil) {
    return VerifyResult.Done("Output file created successfully.");
  }
  return VerifyResult.Retry("output.txt does not exist yet. Please create it.");
}

forge agent Builder {
  provider: "openai"
  model: "gpt-4o"
  verify: check_output
  system: "You are a code builder. Complete the current task."
  skills: [write_file, read_file]
}

{
  let outcome = Builder.run();
  emit f"Result: {outcome.message}";
}

This is 15 lines of agent declaration (including the verify function). The forge agent will iterate up to 25 times (the default), calling the LLM with the system prompt, then checking check_output after each iteration.

Fully Configured Forge Agent #

Here is a forge agent that uses every available configuration option:

neam
fun verify_tests(ctx) {
  let result = exec("npm test 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done(f"Task {ctx.current_task} passed all tests.");
  }
  if (ctx.iteration > 3) {
    return VerifyResult.Abort(f"Still failing at iteration {ctx.iteration}. Manual review needed.");
  }
  return VerifyResult.Retry(f"Tests failed:\n{result.stdout}\nPlease fix the errors.");
}

forge agent TDDBuilder {
  provider: "anthropic"
  model: "claude-sonnet-4"
  verify: verify_tests
  system: """
    You are a senior software engineer practicing test-driven development.
    Read the current task from the plan. Write code that makes the tests pass.
    Do not modify test files. Write clean, documented code.
  """
  temperature: 0.3
  skills: [workspace_write, workspace_read, exec_command]
  guards: [SecurityChain]
  workspace: "./project"

  loop {
    max_iterations: 30
    max_cost: 15.0
    max_tokens: 750000
    prompt_file: "prompt.md"
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
    learnings_file: "learnings.jsonl"
  }

  checkpoint: "git"
}

This declaration is approximately 40 lines and covers:

  1. A verify callback that retries with test output and aborts after repeated failures
  2. A multi-line system prompt encoding TDD constraints
  3. A low temperature for more deterministic code generation
  4. Skills, guard chains, and a dedicated workspace directory
  5. A full loop block with budget limits and all four file paths
  6. Git checkpointing after each verified task

💡 Tip

Start with a minimal forge agent and add configuration as you need it. The defaults are sensible for most use cases. You can always add a loop block or change the checkpoint strategy later.


25.3 The .run() Pipeline #

When you call .run() on a forge agent, the VM executes a multi-stage pipeline. This section walks through every step in detail.

Pipeline Overview #

FORGE AGENT .run() PIPELINE

SETUP
  1. Run security checks (guards, policy)
  2. Create LLM provider connection
  3. Load plan_file → task list
  4. Load progress_file → completed tasks
  5. Load prompt_file → base prompt
  6. Load learnings_file → accumulated learnings

MAIN LOOP (iteration = 1 .. max_iterations)
  a. BUDGET CHECK: cost or token limit exceeded? → BudgetExhausted
  b. BUILD MESSAGES: fresh context from plan, progress, learnings
  c. LLM + TOOL LOOP: call model, execute tool calls
  d. VERIFY: call verify_fn(ctx)
       Done(msg)  → checkpoint → mark task done → next task
       Retry(fb)  → feed back message → next iteration
       Abort(rsn) → stop loop immediately
  All tasks done? → return Completed

RETURN LoopOutcome
  Completed | MaxIterations | Aborted | BudgetExhausted

Stage 1: Setup #

Before the main loop begins, the VM performs initialization:

  1. Security checks. If the forge agent has guards or a policy, the VM validates them at startup. Any compile-time policy violation halts execution immediately.

  2. Provider creation. The VM establishes a connection to the LLM provider, resolving the API key from the environment and validating the model identifier.

  3. Plan file loading. If plan_file is configured, the VM reads it and parses each line as a separate task. Empty lines and lines starting with # are ignored.

  4. Progress file loading. If progress_file is configured, the VM reads the JSONL file and identifies which tasks have already been completed. This enables resumability -- if the program crashes and restarts, it picks up from the last completed task.

  5. Prompt file loading. If prompt_file is configured, the VM reads it as a Markdown or plain text file. Its contents become part of every iteration's context.

  6. Learnings file loading. If learnings_file is configured, the VM reads accumulated learnings from previous runs and includes them in the context.

Stage 2: Main Loop #

The main loop runs from iteration = 1 to max_iterations. Each iteration follows four sub-stages:

a. Budget check. The VM compares total_cost against max_cost and total_tokens against max_tokens. If either limit is exceeded, the loop exits immediately with a BudgetExhausted outcome.

b. Build messages. This is where the fresh context model is implemented. The VM constructs a new message array from scratch:

text
messages = [
  { role: "system",    content: <system_prompt> },
  { role: "user",      content: <prompt_file_contents> },
  { role: "user",      content: "Current task: <task_description>" },
  { role: "user",      content: "Completed tasks: <progress_summary>" },
  { role: "user",      content: "Learnings: <learnings_summary>" },
  { role: "assistant", content: <feedback_from_previous_verify> }  // if retry
]

Notice that this array is built fresh every iteration. No conversation history from previous iterations leaks in.

c. LLM + tool loop. The VM sends the messages to the LLM. If the LLM responds with a tool call, the VM executes the tool, appends the result to the messages, and sends the updated messages back to the LLM. This inner loop can repeat up to 25 times per iteration, allowing the agent to chain multiple tool calls (read a file, write a file, run a command) within a single iteration.

d. Verify. After the LLM produces a final text response (no more tool calls), the VM calls the verify callback function. The callback receives a context object and returns a VerifyResult value that determines what happens next.
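The inner tool loop of sub-stage (c) can be sketched as follows (Python as illustrative pseudocode for the VM's behavior; the reply shapes and the `MAX_TOOL_ROUNDS` name are assumptions, though the 25-round cap is stated above):

```python
MAX_TOOL_ROUNDS = 25  # per-iteration cap on chained tool calls

def run_inner_loop(llm, tools, messages):
    """Keep executing tool calls until the LLM returns plain text
    (which is then handed to the verify callback) or the cap is hit."""
    for _ in range(MAX_TOOL_ROUNDS):
        reply = llm(messages)  # either {"tool": ..., "args": ...} or {"text": ...}
        if "tool" not in reply:
            return reply["text"]          # final text response → go to verify
        result = tools[reply["tool"]](**reply["args"])
        # Append the call and its result, then ask the model again
        messages = messages + [
            {"role": "assistant", "content": f"tool call: {reply['tool']}"},
            {"role": "tool", "content": str(result)},
        ]
    return None  # cap reached without a final answer
```

Note that this appending happens only *within* one iteration; the array is discarded before the next iteration begins.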

Stage 3: Return #

When the loop exits -- whether by completing all tasks, hitting the iteration limit, being aborted, or exhausting the budget -- the VM returns a LoopOutcome value to the calling code.

Code Example #

neam
fun verify_build(ctx) {
  let result = exec("cargo build 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done("Build succeeded.");
  }
  return VerifyResult.Retry(f"Build failed:\n{result.stdout}");
}

forge agent RustBuilder {
  provider: "openai"
  model: "gpt-4o"
  verify: verify_build
  system: "You are a Rust developer. Implement the current task."
  skills: [workspace_write, workspace_read, exec_command]

  loop {
    max_iterations: 20
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
  }

  checkpoint: "git"
}

{
  let outcome = RustBuilder.run();

  match outcome {
    Completed(msg) => emit f"All tasks completed: {msg}",
    MaxIterations   => emit "Reached iteration limit. Some tasks remain.",
    Aborted(reason) => emit f"Aborted: {reason}",
    BudgetExhausted => emit "Budget exhausted before completion."
  }
}
📝 Note

The .run() method takes no arguments. All configuration comes from the forge agent declaration. The prompt, plan, and context are loaded from files specified in the loop block. This design ensures that the same forge agent can be run repeatedly with different plan files by updating the files on disk before calling .run().


25.4 The Verify Callback #

The verify callback is the quality gate that separates forge agents from simple loops. After each iteration, the VM calls the verify function to determine whether the agent's work meets the acceptance criteria.

Function Signature #

neam
fun verify_name(ctx) {
  // ctx.iteration      → int: current iteration number (1-based)
  // ctx.current_task   → string: the current task description
  // ctx.feedback       → string | nil: feedback from previous verify (if retry)
  // ctx.total_cost     → float: cumulative cost in USD so far
  // ctx.total_tokens   → int: cumulative token usage so far
  // ctx.workspace      → string: path to the workspace directory
  //
  // Must return a VerifyResult value
}

The context object (ctx) provides five fields that give the verify function full visibility into the loop's state. This allows sophisticated verification logic that considers cost, iteration count, and previous feedback.

VerifyResult Sealed Type #

The verify function must return a value of the VerifyResult sealed type:

neam
sealed VerifyResult {
  Done(message: string),
  Retry(feedback: string),
  Abort(reason: string)
}

Each variant triggers a different behavior in the main loop:

| Return Value | Effect |
| --- | --- |
| VerifyResult.Done(message) | Checkpoint the current state. Mark the current task as completed in the progress file. Advance to the next task. If no tasks remain, exit the loop with Completed. |
| VerifyResult.Retry(feedback) | Feed the feedback string back to the agent in the next iteration as additional context. The agent will see what went wrong and can attempt to fix it. |
| VerifyResult.Abort(reason) | Stop the loop immediately. Return an Aborted outcome with the given reason. Use this for unrecoverable failures. |
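The dispatch table above can be mirrored in plain Python to make the control flow explicit (a sketch of the VM's behavior, with variants modeled as simple `(kind, payload)` tuples and a hypothetical `state` dict):

```python
def handle_verify(result, state):
    """Dispatch on a VerifyResult-like value.
    state holds: tasks (list), index (int), completed (list), feedback."""
    kind, payload = result                 # e.g. ("done", "All tests pass.")
    if kind == "done":
        # checkpoint + mark done + advance (checkpointing elided here)
        state["completed"].append(state["tasks"][state["index"]])
        state["index"] += 1
        state["feedback"] = None
        if state["index"] >= len(state["tasks"]):
            return "completed"             # no tasks remain → exit loop
        return "next_task"
    if kind == "retry":
        state["feedback"] = payload        # injected into the next fresh context
        return "next_iteration"
    if kind == "abort":
        state["abort_reason"] = payload
        return "aborted"                   # loop stops immediately
    raise ValueError(f"unknown VerifyResult kind: {kind}")
```

Only the `done` branch touches the progress state; `retry` changes nothing on disk, which is why an unverified iteration leaves no trace beyond its feedback string.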

Simple Verify Example: File Existence Check #

The simplest verify function checks whether a file exists:

neam
fun check_file_exists(ctx) {
  let content = file_read("output.txt");
  if (content != nil and len(content) > 0) {
    return VerifyResult.Done("File output.txt created with content.");
  }
  return VerifyResult.Retry("File output.txt is missing or empty. Please create it.");
}

This is useful for tasks where the acceptance criterion is simply "produce a file."

Complex Verify Example: Run Tests and Check Coverage #

A production verify function often runs external commands and parses their output:

neam
fun verify_tdd(ctx) {
  // Run the test suite
  let test_result = exec("npm test 2>&1");

  if (test_result.exit_code != 0) {
    // Tests failed -- provide the error output as feedback
    let feedback = f"Tests failed on iteration {ctx.iteration}.\n";
    feedback = feedback + f"Error output:\n{test_result.stdout}\n";
    feedback = feedback + "Please read the error messages and fix the code.";

    // Abort once the loop has run more than 5 iterations without passing
    // (ctx.iteration counts all loop iterations, not attempts on this task)
    if (ctx.iteration > 5) {
      return VerifyResult.Abort(
        f"Task '{ctx.current_task}' still failing at iteration {ctx.iteration}. " +
        "Manual intervention required."
      );
    }

    return VerifyResult.Retry(feedback);
  }

  // Tests passed -- check coverage
  let cov_result = exec("npm run coverage -- --json 2>&1");
  if (cov_result.exit_code == 0) {
    let cov_data = json_parse(cov_result.stdout);
    let coverage_pct = cov_data["total"]["lines"]["pct"];
    if (coverage_pct < 80.0) {
      return VerifyResult.Retry(
        f"Tests pass but coverage is {coverage_pct}%. Need at least 80%."
      );
    }
  }

  return VerifyResult.Done(
    f"Task '{ctx.current_task}' verified: tests pass, coverage adequate."
  );
}

This verify function:

  1. Runs the test suite and checks for failures
  2. If tests fail, provides detailed error output as feedback
  3. If too many failures accumulate, aborts the entire loop
  4. If tests pass, checks code coverage
  5. If coverage is below threshold, retries with a coverage-specific message
  6. Only returns Done when both tests and coverage pass
⚠️ Common Mistake: Verifying Inside the Agent Prompt

A frequent error is to include verification instructions in the system prompt: "After writing the code, run the tests yourself and make sure they pass." This is unreliable for two reasons. First, the LLM may hallucinate test results instead of actually running tests. Second, the agent's assessment of its own work is biased -- it generated the code, so it is inclined to believe it works. External verification through the verify callback is objective and deterministic. The verify function runs real commands, checks real files, and returns real results. Never delegate verification to the agent itself.


25.5 LoopOutcome #

When the .run() pipeline exits, it returns a LoopOutcome value. This is a sealed type that encodes exactly how and why the loop terminated.

Sealed Type Definition #

neam
sealed LoopOutcome {
  Completed(message: string),
  MaxIterations,
  Aborted(reason: string),
  BudgetExhausted
}

Outcome Details #

| Variant | When It Occurs | Fields |
| --- | --- | --- |
| Completed(message) | All tasks in the plan file were verified successfully | message: final status message |
| MaxIterations | The loop reached max_iterations without completing all tasks | (none) |
| Aborted(reason) | The verify callback returned VerifyResult.Abort(reason) | reason: why the loop was aborted |
| BudgetExhausted | total_cost >= max_cost or total_tokens >= max_tokens | (none) |

Return Value Properties #

In addition to the variant, the LoopOutcome value carries metadata accessible through properties:

| Property | Type | Description |
| --- | --- | --- |
| outcome.iterations | int | Total number of iterations executed |
| outcome.total_cost | float | Total cost in USD across all iterations |
| outcome.total_tokens | int | Total tokens consumed across all iterations |
| outcome.message | string | Human-readable summary of the outcome |
| outcome.tasks_completed | int | Number of tasks that were verified |
| outcome.tasks_total | int | Total number of tasks in the plan |

Code Example: Checking the Outcome #

neam
{
  let outcome = MyForgeAgent.run();

  match outcome {
    Completed(msg) => {
      emit "All tasks completed successfully.";
      emit f"Message: {msg}";
      emit f"Iterations used: {outcome.iterations}";
      emit f"Total cost: ${outcome.total_cost}";
    },
    MaxIterations => {
      emit f"Reached {outcome.iterations} iterations.";
      emit f"Tasks completed: {outcome.tasks_completed}/{outcome.tasks_total}";
      emit "Consider increasing max_iterations or simplifying tasks.";
    },
    Aborted(reason) => {
      emit f"Loop aborted: {reason}";
      emit f"Completed {outcome.tasks_completed} of {outcome.tasks_total} tasks.";
      emit "Review the progress file for details.";
    },
    BudgetExhausted => {
      emit "Budget exhausted.";
      emit f"Spent: ${outcome.total_cost} / tokens: {outcome.total_tokens}";
      emit f"Tasks completed: {outcome.tasks_completed}/{outcome.tasks_total}";
    }
  }
}

The match expression ensures you handle every possible outcome. If you add logging, alerting, or retry logic, you can do so per-variant. For instance, you might automatically retry with a higher budget when BudgetExhausted occurs, or send a notification to a Slack channel when Aborted fires.

💡 Tip

Always log outcome.total_cost in production. Forge agents that run many iterations against powerful models can accumulate significant costs. Tracking cost per run helps you identify inefficient verify functions or overly broad tasks that cause excessive retries.


25.6 Loop Configuration #

The loop block inside a forge agent declaration controls how the main loop behaves. Every field has a sensible default, so you only need to specify the ones you want to override.

Numeric Limits #

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_iterations | int | 25 | Maximum number of iterations before the loop exits with MaxIterations |
| max_cost | float | 10.0 | Maximum cumulative cost in USD before BudgetExhausted |
| max_tokens | int | 500000 | Maximum cumulative tokens before BudgetExhausted |
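Per the table in Section 25.5, the budget check reduces to two comparisons against cumulative totals, performed before each iteration. A sketch (Python for illustration; the defaults match the table above):

```python
DEFAULT_MAX_COST = 10.0        # USD
DEFAULT_MAX_TOKENS = 500_000

def budget_exhausted(total_cost, total_tokens,
                     max_cost=DEFAULT_MAX_COST,
                     max_tokens=DEFAULT_MAX_TOKENS):
    """True once either cumulative limit is reached; the loop then
    exits with a BudgetExhausted outcome before calling the LLM again."""
    return total_cost >= max_cost or total_tokens >= max_tokens
```

Because the check runs *before* each LLM call, an in-flight iteration is never cut off mid-call; the loop simply declines to start the next one.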

File Configuration #

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prompt_file | string | nil | Path to a Markdown or text file with the base prompt |
| plan_file | string | nil | Path to a text file with one task per line |
| progress_file | string | nil | Path to a JSONL file tracking completed tasks |
| learnings_file | string | nil | Path to a JSONL file storing cross-iteration learnings |

All file paths are relative to the workspace directory.

Plan File Format #

The plan file is a plain text file with one task per line. Empty lines and lines starting with # are ignored:

text
# plan.txt -- TDD Code Builder Plan
Implement the User model with name, email, and password_hash fields
Write the UserRepository with create, find_by_id, and find_by_email methods
Implement the AuthService with register and login methods
Add input validation to the registration endpoint
Write integration tests for the full auth flow

Each line becomes a separate task. The forge agent processes tasks in order, advancing to the next task only when the verify callback returns Done for the current one.
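The parsing rule stated above -- one task per line, blank lines and `#` comments skipped -- is small enough to sketch directly (Python for illustration):

```python
def parse_plan(text):
    """Parse a plan file: one task per line; empty lines and lines
    starting with '#' are ignored."""
    tasks = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tasks.append(line)
    return tasks
```

Tasks keep their file order, which is the order the forge agent works through them.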

💡 Tip

Keep plan tasks small and focused. A task like "Build the entire authentication system" is too broad -- the agent may struggle, and verification becomes ambiguous. Break it into discrete, verifiable steps. Each task should have a clear pass/fail criterion that your verify function can check.

Progress File Format (JSONL) #

The progress file uses JSON Lines format. Each line is a JSON object recording the completion of one task:

jsonl
{"task": "Implement the User model with name, email, and password_hash fields", "status": "done", "iteration": 2, "timestamp": "2026-02-18T10:23:45Z", "cost": 0.42, "tokens": 12840}
{"task": "Write the UserRepository with create, find_by_id, and find_by_email methods", "status": "done", "iteration": 5, "timestamp": "2026-02-18T10:31:12Z", "cost": 0.87, "tokens": 28350}

The VM writes to this file automatically when the verify callback returns Done. You do not need to manage this file manually. However, you can read it in your verify function or in post-run analysis to understand how many iterations each task required and how much each task cost.
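The resumability mechanism is worth seeing concretely. A sketch of how completed tasks can be recovered from progress.jsonl so a restart skips them (Python for illustration; the record fields match the examples above):

```python
import json

def completed_tasks(progress_text):
    """Collect the set of tasks already recorded as done in a
    progress.jsonl file's contents."""
    done = set()
    for line in progress_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("status") == "done":
            done.add(entry["task"])
    return done

def next_task(tasks, done):
    """First plan task not yet verified, or None if all are done."""
    for t in tasks:
        if t not in done:
            return t
    return None
```

This is why a crashed run can be restarted by simply calling .run() again: the plan file and progress file together determine the current task.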

Learnings File Format (JSONL) #

The learnings file records insights that the agent discovers during execution. These are extracted automatically by the VM when the agent's output contains structured learning markers, or you can write to it from your verify function:

jsonl
{"learning": "The project uses ESM modules -- use import/export, not require()", "iteration": 1, "task": "Implement the User model"}
{"learning": "Tests expect snake_case method names, not camelCase", "iteration": 3, "task": "Write the UserRepository"}
{"learning": "The database connection is configured in config/db.js", "iteration": 4, "task": "Write the UserRepository"}

On each iteration, the VM loads all learnings and includes them in the context. This gives the fresh-context agent access to hard-won knowledge from previous iterations without carrying the full conversation history.
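A sketch of that loading step (Python for illustration; the rendering format and the cap on entries are assumptions, not documented behavior):

```python
import json

def load_learnings(text, limit=20):
    """Collect learning strings from a learnings.jsonl file's contents
    and render them as a compact context section. The limit keeps the
    fresh context bounded even after many runs (hypothetical cap)."""
    items = [json.loads(ln)["learning"]
             for ln in text.splitlines() if ln.strip()]
    if not items:
        return ""
    bullets = "\n".join(f"- {l}" for l in items[-limit:])
    return f"Learnings:\n{bullets}"
```

A handful of one-line learnings costs far fewer tokens than replaying the conversations that produced them, which is the whole trade the forge agent makes.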

File Relationships Diagram #

plan.txt                  prompt.md                 progress.jsonl + learnings.jsonl
────────                  ─────────                 ────────────────────────────────
Task 1                    You are building a        {task1: done}
Task 2                    Node.js REST API.         {task2: done}
Task 3 ←(current)         The project uses          {learning: "use ESM..."}
Task 4                    Express, Jest for         {learning: "snake_case..."}
Task 5                    testing...
         └───────────────────────┬───────────────────────┘
               loaded into each iteration's fresh context

📝 Note

If you omit plan_file, the forge agent runs as a single-task loop. It will iterate up to max_iterations times, calling verify after each iteration, until verify returns Done or the loop exits. This is useful for tasks where the goal is a single deliverable rather than a list of tasks.


25.7 Checkpoint Strategies #

After each verified task, the forge agent can optionally create a checkpoint -- a snapshot of the workspace state that you can roll back to if something goes wrong later.

Available Strategies #

| Strategy | Value | What It Does |
| --- | --- | --- |
| Git | "git" | Runs git add -A && git commit -m "forge: <task>" in the workspace directory |
| Snapshot | "snapshot" | Creates a filesystem copy of the workspace to .neam/snapshots/<iteration>/ |
| None | "none" | No checkpointing (default) |

When to Use Each Strategy #

Use "git" when you are building code in a git repository. This is the most common choice for TDD workflows. Each verified task becomes a separate commit, giving you a clean history of the agent's work. You can git log to see what was done, git diff between commits to see what changed, and git revert to undo a specific task.

Use "snapshot" when you are working outside a git repository or when the workspace contains binary files that git does not handle well. Snapshots are filesystem copies, so they preserve everything regardless of file type. The tradeoff is disk space -- each snapshot is a full copy.

Use "none" during development and debugging, when you want maximum speed and do not need rollback capability. Also use "none" when the workspace is ephemeral (for example, a temporary directory that will be deleted after the run).
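The three strategies can be sketched as a small dispatcher (Python for illustration of what the VM does, not the actual implementation; the git command and snapshot path follow the table above):

```python
import shutil
import subprocess
from pathlib import Path

def checkpoint(strategy, workspace, task, iteration):
    """Checkpoint the workspace after a verified task."""
    if strategy == "git":
        # Requires that `git init` was already run in the workspace
        subprocess.run(["git", "add", "-A"], cwd=workspace, check=True)
        subprocess.run(["git", "commit", "-m", f"forge: {task}"],
                       cwd=workspace, check=True)
    elif strategy == "snapshot":
        dest = Path(workspace) / ".neam" / "snapshots" / str(iteration)
        # Skip .neam itself so snapshots do not nest recursively
        shutil.copytree(workspace, dest,
                        ignore=shutil.ignore_patterns(".neam"))
    elif strategy != "none":
        raise ValueError(f"unknown checkpoint strategy: {strategy}")
```

The snapshot branch makes the disk-space tradeoff visible: every call to it duplicates the whole workspace (minus prior snapshots).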

Git Checkpoint Workflow #

Iteration 1          Iteration 2          Iteration 3          Iteration 4
LLM call + tools     LLM call + tools     LLM call + tools     LLM call + tools
Tool results         Tool results         Tool results         Tool results
Verify: Done!        Verify: Done!        Verify: Retry!       Verify: Done!
      │                    │                                         │
  git commit           git commit          (no commit)          git commit

Notice that iteration 3 (where verify returned Retry) does not produce a commit. Only verified tasks are checkpointed. This keeps the git history clean -- every commit represents a verified, working state.

⚠️ Common Mistake: Forgetting to Initialize Git

If you set checkpoint: "git" but the workspace is not a git repository, the checkpoint will fail. Always ensure that git init has been run in the workspace directory before starting a forge agent with git checkpointing. The forge agent does not initialize git for you -- this is by design, to prevent accidentally creating a repository in the wrong directory.


25.8 Fresh Context Model #

The fresh context model is the defining characteristic of forge agents. This section explains why it works, how the VM implements it, and how it compares to the accumulated context model used by claw agents.

Why Fresh Context Prevents Drift #

Large language models have a finite context window. When a claw agent accumulates dozens of messages -- system prompt, user messages, assistant responses, tool call results -- the context window fills up. The model must compress or truncate older messages, leading to information loss. Worse, the system prompt (which defines the agent's behavior) gets pushed further from the model's primary attention zone.

Fresh context eliminates this entirely. Each iteration's context window looks like this:

┌─────────────────────────────────────────────────────────────┐
│  [1] System prompt                    (always first)        │
│  [2] Prompt file contents             (project context)     │
│  [3] Current task                     (single task focus)   │
│  [4] Progress summary                 (2-3 sentences)       │
│  [5] Learnings from previous runs     (key insights only)   │
│  [6] Feedback from last verify        (if retry)            │
│                                                             │
│  Total: ~2,000 - 5,000 tokens (predictable, bounded)       │
└─────────────────────────────────────────────────────────────┘

Compare this to a claw agent after 20 iterations:

┌─────────────────────────────────────────────────────────────┐
│  [1]  System prompt                                         │
│  [2]  User message 1                                        │
│  [3]  Assistant response 1 (with tool calls)                │
│  [4]  Tool result 1a                                        │
│  [5]  Tool result 1b                                        │
│  [6]  Assistant continuation 1                              │
│  [7]  User message 2                                        │
│  ...                                                        │
│  [87] Tool result 20c                                       │
│  [88] Assistant response 20                                 │
│                                                             │
│  Total: ~50,000 - 120,000 tokens (growing, unpredictable)  │
└─────────────────────────────────────────────────────────────┘

How the VM Handles Per-Iteration Context #

On each iteration, the VM performs the following steps:

  1. Discard all messages from the previous iteration. Nothing carries over.
  2. Read the system prompt from the agent declaration.
  3. Read the prompt file from disk (it may have changed since the last iteration).
  4. Read the plan file and identify the current task.
  5. Read the progress file and generate a concise summary of completed work.
  6. Read the learnings file and include relevant learnings.
  7. Assemble the message array and send it to the LLM.

The key insight is that steps 3 through 6 read from the filesystem. If the agent wrote files during the previous iteration, those files are now part of the world state. The next iteration can read them using its skills (e.g., workspace_read). The filesystem is the continuity mechanism -- not the context window.
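Steps 2 through 7 can be sketched end to end (Python for illustration; the file names follow the examples used in this chapter, and the real VM would also consult the progress file when picking the current task):

```python
from pathlib import Path

def assemble_context(workspace, system_prompt):
    """Rebuild the per-iteration message array entirely from disk.
    Every input is re-read each call, so files written by earlier
    iterations are visible immediately."""
    ws = Path(workspace)

    def read(name):
        p = ws / name
        return p.read_text() if p.exists() else ""

    prompt = read("prompt.md")
    tasks = [ln.strip() for ln in read("plan.txt").splitlines()
             if ln.strip() and not ln.strip().startswith("#")]
    progress = read("progress.jsonl")
    learnings = read("learnings.jsonl")
    # Simplified: take the first plan task as current
    current = tasks[0] if tasks else "(no plan file: single-task loop)"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
        {"role": "user", "content": f"Current task: {current}"},
        {"role": "user", "content": f"Completed tasks: {progress}"},
        {"role": "user", "content": f"Learnings: {learnings}"},
    ]
```

Nothing in this function takes the previous iteration's messages as input; that absence *is* the fresh context model.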

Comparison Table #

| Aspect | Claw Agent | Forge Agent |
| --- | --- | --- |
| Context growth | Linear with iterations | Constant |
| System prompt attention | Degrades over time | Always maximum |
| Memory mechanism | Conversation history | Filesystem + progress files |
| Drift risk | High after many turns | Near zero |
| Best for | Interactive conversations | Build/transform tasks |
| Resumability | Requires session persistence | Built-in via progress file |
| Context window usage | Grows to fill limit | Bounded and predictable |

🌎 Real-World Analogy

Imagine two approaches to writing a novel. The claw agent approach is to keep every draft, note, and revision in a single document that grows to hundreds of pages. By the time you reach Chapter 10, you are scrolling through thousands of lines of context to find what you need. The forge agent approach is to keep the manuscript in a separate file, read only the current chapter outline and style guide each morning, write one chapter, have an editor review it, and start fresh the next day. The manuscript (filesystem) grows, but your daily working context stays focused.


25.9 Forbidden Fields #

Forge agents are intentionally restricted from using certain fields that belong to claw agents. These restrictions are enforced at compile time -- if you use a forbidden field, the compiler will reject the program with a clear error message.

Forbidden Field Table #

| Forbidden Field | Compile Error Message | Why It Is Forbidden |
|---|---|---|
| `session` | forge agent cannot use 'session': forge agents have no persistent session; use a claw agent for session-based workflows | Sessions imply accumulated conversation history, which contradicts the fresh context model |
| `channels` | forge agent cannot use 'channels': multi-channel I/O is a claw agent feature; forge agents use workspace files for I/O | Channels are for real-time interactive I/O, which is incompatible with iterative build loops |
| `lanes` | forge agent cannot use 'lanes': parallel conversation lanes require session state; forge agents are single-task-per-iteration | Lanes require persistent state across turns, which forge agents do not maintain |

Example: Compile Error #

neam
// This will NOT compile
forge agent BadAgent {
  provider: "openai"
  model: "gpt-4o"
  verify: my_verify
  session: { history: 50 }    // COMPILE ERROR
}

The compiler outputs:

text
error[E0451]: forge agent cannot use 'session'
  --> bad_agent.neam:6:3
   |
 6 |   session: { history: 50 }
   |   ^^^^^^^ forge agents have no persistent session;
   |           use a claw agent for session-based workflows

Type Safety Rationale #

These compile-time restrictions exist to prevent logical errors. A forge agent with a session field would create a confusing hybrid that neither maintains proper fresh context nor provides proper session persistence. By making the restriction a compile error rather than a runtime warning, Neam ensures that developers choose the right agent type for their use case: a standard agent for one-shot calls, a claw agent for persistent conversations, or a forge agent for iterative builds.

The three agent types are distinct by design. Mixing their features would undermine the guarantees that each type provides.
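As a contrast to the BadAgent example above, a corrected declaration simply drops the forbidden field. The sketch below assumes my_verify is defined elsewhere, as in the original example; the loop settings are placeholders.

```neam
// The BadAgent declaration from above, corrected: no 'session' field.
forge agent GoodAgent {
  provider: "openai"
  model: "gpt-4o"
  verify: my_verify
  // State that a session would have carried lives in workspace files
  // instead, configured through the loop block.
  loop {
    max_iterations: 10
    progress_file: "progress.jsonl"
  }
}
```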

💡 Tip

If you find yourself wanting a session field on a forge agent, reconsider your architecture. You probably need either a claw agent that runs a long conversation, or a forge agent that stores state in files. The forge agent's filesystem-as-memory pattern can accomplish most things that sessions provide, but with better resumability and no drift risk.
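As an example of the filesystem-as-memory pattern, the agent can be given skills that read and write a state file in the workspace instead of relying on a session. This is a hedged sketch: save_state, load_state, and the state.md path are illustrative names, built only from the workspace_write and workspace_read builtins used later in this chapter.

```neam
// Illustrative skills for persisting state in the workspace
// instead of a session. Names are placeholders.
skill save_state {
  description: "Save a note for future iterations to a state file"
  params: { note: string }
  impl(note) {
    return workspace_write("state.md", note);
  }
}

skill load_state {
  description: "Read notes left by previous iterations"
  params: {}
  impl() {
    return workspace_read("state.md");
  }
}
```

Because the state file survives on disk, a fresh-context iteration (or a rerun days later) can recover everything a session would have held, without any drift risk.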


25.10 Real-World Example: TDD Code Builder #

This section presents a complete, working forge agent that implements a test-driven development workflow. The agent reads a plan file, implements each task, runs tests to verify, and commits verified work to git.

Skills #

First, define the skills the agent needs:

neam
skill write_file {
  description: "Write content to a file in the workspace"
  params: { path: string, content: string }
  impl(path, content) {
    return workspace_write(path, content);
  }
}

skill read_file {
  description: "Read the contents of a file in the workspace"
  params: { path: string }
  impl(path) {
    return workspace_read(path);
  }
}

skill run_command {
  description: "Execute a shell command in the workspace directory"
  params: { command: string }
  impl(command) {
    return exec(command);
  }
}

skill list_files {
  description: "List files in a directory within the workspace"
  params: { directory: string }
  impl(directory) {
    return workspace_list(directory);
  }
}

Verify Function #

The verify function runs the test suite and checks for pass/fail:

neam
fun verify_tests(ctx) {
  // Run the test suite
  let result = exec("npm test 2>&1");

  if (result.exit_code == 0) {
    // All tests pass -- record a learning if this was a retry
    if (ctx.feedback != nil) {
      emit f"[verify] Task succeeded after retry on iteration {ctx.iteration}.";
    }
    return VerifyResult.Done(
      f"Task '{ctx.current_task}' verified: all tests pass."
    );
  }

  // Tests failed
  let error_output = result.stdout;

  // Abort if we have spent too much on this task
  if (ctx.total_cost > 8.0) {
    return VerifyResult.Abort(
      f"Cost limit approaching (${ctx.total_cost}). " +
      f"Task '{ctx.current_task}' is too expensive to retry."
    );
  }

  // Provide detailed feedback for retry
  let feedback = f"Verification failed on iteration {ctx.iteration}.\n";
  feedback = feedback + f"Task: {ctx.current_task}\n";
  feedback = feedback + f"Test output:\n{error_output}\n\n";
  feedback = feedback + "Read the error messages carefully. ";
  feedback = feedback + "Check the file you wrote and fix any issues.";

  return VerifyResult.Retry(feedback);
}

Forge Agent Declaration #

neam
forge agent TDDCodeBuilder {
  provider: "anthropic"
  model: "claude-sonnet-4"
  verify: verify_tests
  system: """
    You are a senior software engineer building a Node.js project using
    test-driven development. Your workflow for each task:

    1. Read the current task description carefully.
    2. Use read_file to examine any existing code and test files.
    3. Use list_files to understand the project structure.
    4. Write implementation code using write_file.
    5. Do NOT modify test files -- tests are pre-written.
    6. Do NOT run tests yourself -- the verification system handles that.

    Write clean, well-documented code. Use meaningful variable names.
    Follow the existing code style you observe in the project.
  """
  temperature: 0.2
  skills: [write_file, read_file, run_command, list_files]
  workspace: "./my-project"

  loop {
    max_iterations: 30
    max_cost: 12.0
    max_tokens: 600000
    prompt_file: "prompt.md"
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
    learnings_file: "learnings.jsonl"
  }

  checkpoint: "git"
}

Plan File #

Create a file at ./my-project/plan.txt:

text
Implement src/models/user.js with User class (name, email, password_hash fields)
Implement src/repositories/user-repository.js with create, findById, findByEmail methods
Implement src/services/auth-service.js with register and login methods
Add input validation to the register method (email format, password length)
Implement src/routes/auth-routes.js with POST /register and POST /login endpoints

Prompt File #

Create a file at ./my-project/prompt.md:

text
## Project Context

You are building a Node.js REST API for user authentication.

### Technology Stack
- Runtime: Node.js with ES modules
- Framework: Express
- Testing: Jest
- Database: In-memory Map (for simplicity)

### Project Structure

my-project/
  src/
    models/        -- Data models
    repositories/  -- Data access layer
    services/      -- Business logic
    routes/        -- HTTP route handlers
  tests/           -- Pre-written test files
  package.json

### Conventions
- Use ES module syntax (import/export)
- Use async/await for asynchronous operations
- All public functions should have JSDoc comments
- Method names use camelCase
- File names use kebab-case

Usage #

neam
{
  emit "=== TDD Code Builder ===";
  emit "";

  let outcome = TDDCodeBuilder.run();

  emit "";
  emit "=== Build Complete ===";
  emit f"Iterations used: {outcome.iterations}";
  emit f"Tasks completed: {outcome.tasks_completed}/{outcome.tasks_total}";
  emit f"Total cost: ${outcome.total_cost}";
  emit f"Total tokens: {outcome.total_tokens}";
  emit "";

  match outcome {
    Completed(msg) => {
      emit "SUCCESS: All tasks completed and verified.";
      emit f"Final message: {msg}";
    },
    MaxIterations => {
      emit "WARNING: Reached iteration limit.";
      emit "Some tasks may not be complete. Review progress.jsonl.";
    },
    Aborted(reason) => {
      emit f"ABORTED: {reason}";
      emit "Manual intervention required.";
    },
    BudgetExhausted => {
      emit "BUDGET EXHAUSTED: The run exceeded cost or token limits.";
      emit "Review progress.jsonl and consider increasing limits.";
    }
  }
}

Walkthrough #

Let us trace through what happens when this program runs.

Setup. The VM reads plan.txt and finds five tasks. It checks progress.jsonl -- if the file does not exist or is empty, all five tasks are pending. It reads prompt.md to get the project context. It connects to the Anthropic API.

Iteration 1. The VM builds a fresh context: system prompt + prompt.md contents + "Current task: Implement src/models/user.js..." + "No tasks completed yet." It sends this to the LLM. The agent uses read_file to examine the test file, then uses write_file to create src/models/user.js. The VM calls verify_tests, which runs npm test. If tests pass, the VM writes a line to progress.jsonl, runs git add -A && git commit -m "forge: Implement src/models/user.js...", and advances to task 2.

Iteration 2. The VM builds a new fresh context: system prompt + prompt.md + "Current task: Implement src/repositories/user-repository.js..." + "Completed: User model (iteration 1)." The agent has no memory of iteration 1's tool calls, but it can use read_file to see the code it wrote in the previous iteration (because it is on disk). It implements the repository, and the verify function runs tests again.

Iteration 3 (hypothetical retry). Suppose the tests fail because the agent used require() instead of import. The verify function returns VerifyResult.Retry("Tests failed: SyntaxError: Cannot use require in ES module"). The VM does not checkpoint. It starts iteration 4 with the feedback included in the context. The agent reads the feedback, recognizes the error, rewrites the file with import syntax, and this time the tests pass.

Completion. After all five tasks are verified, the VM returns LoopOutcome.Completed("All 5 tasks verified successfully."). The git log shows five clean commits, one per task.
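If the run completes as described, the git history might look something like the following. The hashes are illustrative, and the commit messages follow the forge: prefix convention shown in the walkthrough, with one commit per plan task, newest first.

```text
$ git log --oneline
e5f6a7b forge: Implement src/routes/auth-routes.js with POST /register and POST /login endpoints
d4e5f6a forge: Add input validation to the register method
c3d4e5f forge: Implement src/services/auth-service.js with register and login methods
b2c3d4e forge: Implement src/repositories/user-repository.js with create, findById, findByEmail methods
a1b2c3d forge: Implement src/models/user.js with User class
```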

🎯 Try It Yourself

Create a simple Node.js project with pre-written test files. Write a plan file with three tasks. Create a minimal forge agent that implements those tasks. Start with checkpoint: "none" for faster iteration during development, then switch to checkpoint: "git" when you are satisfied with the workflow.
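A minimal starting point for this exercise might look like the following sketch. The workspace path, test command, and verify details are assumptions you will need to adapt to your own project; the skills referenced are the ones defined in Section 25.10.

```neam
// Minimal sketch for the Try It Yourself exercise.
// Paths, model, and the test command are placeholders.
fun verify_simple(ctx) {
  let result = exec("npm test 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done(f"Task '{ctx.current_task}' verified.");
  }
  return VerifyResult.Retry(f"Tests failed:\n{result.stdout}");
}

forge agent MiniBuilder {
  provider: "anthropic"
  model: "claude-sonnet-4"
  verify: verify_simple
  system: "Implement the current task. Do not modify test files."
  skills: [write_file, read_file, list_files]
  workspace: "./mini-project"

  loop {
    max_iterations: 10
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
  }

  checkpoint: "none"   // switch to "git" once the workflow is stable
}
```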


Summary #

In this chapter, you learned:

- Forge agents start every iteration with a fresh context window; the filesystem, not the conversation history, carries state between iterations.
- The loop block configures iteration, cost, and token limits along with the prompt, plan, progress, and learnings files.
- The verify callback gates each task, returning VerifyResult.Done, VerifyResult.Retry with feedback, or VerifyResult.Abort.
- checkpoint: "git" commits each verified task, producing one clean commit per task and making runs resumable.
- Forge agents cannot use claw-agent fields such as session, channels, or lanes; the compiler rejects them at compile time.
- A forge agent run ends with a LoopOutcome: Completed, MaxIterations, Aborted, or BudgetExhausted.

Forge agents are ideal for any task that can be decomposed into a plan of verifiable steps. In the next chapter, you will learn about semantic memory and workspace I/O -- the mechanisms that let agents persist and retrieve knowledge across runs.


Exercises #

Exercise 25.1: Minimal Forge Agent Write a forge agent that creates a file called greeting.txt containing "Hello, Neam!" in the workspace directory. The verify function should check that the file exists and contains the expected text. Use checkpoint: "none" and max_iterations: 5. Run it and observe the outcome.

Exercise 25.2: Multi-Task Plan Create a plan file with three tasks: (1) create a README.md with a project title, (2) create a LICENSE file with the MIT license text, (3) create a .gitignore file that excludes node_modules/ and .env. Write a forge agent that processes this plan. The verify function should check that each file exists and is not empty. Use checkpoint: "none".

Exercise 25.3: Verify with Feedback Write a verify function that checks whether a generated Python file passes flake8 linting. If linting fails, return VerifyResult.Retry with the linting errors as feedback. If linting passes, return VerifyResult.Done. Test this with a forge agent that writes a simple Python function.

Exercise 25.4: Abort on Cost Modify the verify function from Exercise 25.3 to return VerifyResult.Abort if ctx.total_cost exceeds 5.0 USD. Write a program that handles the Aborted outcome with a match expression and emits the abort reason.

Exercise 25.5: Git Checkpointing Initialize a git repository in a workspace directory. Write a forge agent with checkpoint: "git" and a plan file containing three tasks. After the forge agent completes, use run_command to execute git log --oneline and emit the result. Verify that you see one commit per completed task.

Exercise 25.6: Progress File Analysis After running a forge agent with a progress_file, write a separate Neam program that reads the progress JSONL file, parses each line, and emits a summary report showing: the total number of tasks, the total number of iterations across all tasks, the average iterations per task, and the total cost.

Exercise 25.7: Learnings File Write a forge agent with a learnings_file. In your verify function, write a learning entry to the learnings file whenever the agent needed a retry (that is, when ctx.feedback != nil and the verify now returns Done). After the run, read the learnings file and emit all accumulated learnings.

Exercise 25.8: Forge vs. Claw Comparison Write two programs that accomplish the same goal -- generating three Markdown files based on a plan. Program A uses a claw agent with accumulated context. Program B uses a forge agent with fresh context. Run both and compare: (a) total tokens consumed, (b) total cost, (c) whether the output quality degrades on the third file. Write a brief analysis in comments explaining what you observed.
