# Chapter 25: Forge Agents -- Iterative Build Agents

> "The smith does not carry yesterday's heat into today's strike. Each blow begins fresh, guided only by the shape of what remains." -- Workshop proverb
In Chapter 24, you learned about claw agents -- persistent session agents that maintain conversation history across turns. Claw agents keep themselves alive, accumulating context as a conversation unfolds. This chapter introduces their counterpart: forge agents. Where a claw agent says "keep the agent alive, let context grow," a forge agent says "keep the world alive, move the agent through it." Each iteration starts with a clean context window. The filesystem is the memory. The plan file is the roadmap. The verify callback is the quality gate. The result is an agent architecture purpose-built for long-horizon build tasks -- TDD workflows, code generation pipelines, document assembly, and any problem where accumulated context would eventually cause drift.
By the end of this chapter, you will be able to:
- Explain the forge agent philosophy and how it differs from claw agents
- Declare a forge agent with `forge agent` syntax
- Configure the `.run()` pipeline and understand each stage
- Write verify callbacks that return `Done`, `Retry`, or `Abort`
- Inspect `LoopOutcome` values with `match` expressions
- Configure loop parameters: iterations, cost limits, token budgets
- Use plan files, progress files, and learnings files
- Choose a checkpoint strategy (`"git"`, `"snapshot"`, `"none"`)
- Understand why fresh context prevents drift
- Recognize forbidden fields that produce compile errors
- Build a complete TDD code builder with forge agents
Long-running AI tasks fail in predictable ways. After dozens of tool calls, the agent's context window fills with stale information. It forgets early instructions. It repeats mistakes. It drifts from the plan. Human developers solve this problem naturally: they close the editor, take a break, reopen the project, read the code on disk, and pick up where they left off. The forge agent encodes this pattern into a language construct. Each iteration is a fresh developer sitting down at the workstation, reading the project files, doing one task, verifying the result, and committing. The world (filesystem, git history, plan file) carries all the state. The agent carries none.
## 25.1 The Forge Agent Philosophy
A forge agent operates on a fundamentally different principle from the agents you have used so far.
### Keep the World Alive, Move the Agent Through It
In a standard agent or claw agent, the agent is the center of gravity. It accumulates messages, tool results, and conversation history in its context window. The longer it runs, the more context it carries. This works well for short interactions, but for long-horizon tasks -- building a project file by file, refactoring a codebase, running a multi-step test-driven development cycle -- the accumulated context becomes a liability.
A forge agent inverts this relationship. The world (filesystem, git repository, plan files, progress logs) is the persistent state. The agent is ephemeral. Each iteration creates a fresh agent with a clean context window, loads only the information it needs from the world, performs one task, verifies the result, and exits. The next iteration starts clean again.
### Fresh Context Model
Each forge iteration starts with a blank slate. The VM constructs a new message array containing only:
- The system prompt
- The contents of the prompt file (loaded from disk)
- The current task from the plan file
- A summary of completed tasks from the progress file
- Any learnings from previous iterations
This means the context window size is bounded and predictable. Whether the agent is on iteration 1 or iteration 25, it receives approximately the same amount of context. No stale tool results. No forgotten instructions buried under pages of conversation history.
### Drift Prevention by Design
Context drift is the gradual degradation of agent behavior as accumulated messages push the system prompt further from the attention window. In transformer-based models, information at the beginning and end of the context receives the most attention (the "primacy" and "recency" effects). As a claw agent accumulates history, the system prompt -- which defines the agent's personality and constraints -- gets pushed into a low-attention zone.
Forge agents eliminate this problem structurally. The system prompt is always at the top of a fresh, short context. There is no history to push it away. The agent's behavior on iteration 25 is as faithful to the system prompt as on iteration 1.
### Comparison with the Ralph Loop Pattern
Before forge agents existed as a language construct, developers built iterative loops manually using what was known as the "Ralph Loop" pattern -- an external script that:
- Reads a plan file
- Calls an LLM with the current task
- Checks if the task passed verification
- Updates a progress file
- Commits to git
- Repeats
This pattern worked, but it required boilerplate orchestration code outside the Neam
program. The forge agent keyword turns this external pattern into a first-class
language construct with compile-time safety, built-in budget tracking, and standardized
file formats.
Think of a forge agent like a relay race. Each runner (iteration) sprints one leg of the race, then hands the baton (the filesystem state) to the next runner. No single runner needs to remember the entire race -- they only need to know which leg they are running and where the baton is. The track (filesystem) and the scoreboard (progress file) carry the full history.
## 25.2 Forge Agent Declaration
A forge agent is declared using the forge agent keyword followed by a name and a
configuration block. The syntax is similar to a standard agent declaration but with
additional fields specific to iterative build workflows.
### Syntax Overview

```
forge agent <Name> {
  // Required fields
  provider: <string>
  model: <string>
  verify: <function_reference>

  // Optional fields
  system: <string>
  temperature: <float>
  skills: [<skill_list>]
  guards: [<guard_chain_list>]
  workspace: <string>

  // Loop configuration
  loop {
    max_iterations: <int>
    max_cost: <float>
    max_tokens: <int>
    prompt_file: <string>
    plan_file: <string>
    progress_file: <string>
    learnings_file: <string>
  }

  // Checkpoint strategy
  checkpoint: <string>
}
```
### Required Fields

| Field | Type | Description |
|---|---|---|
| `provider` | string | LLM provider (`"ollama"`, `"openai"`, `"anthropic"`, `"gemini"`) |
| `model` | string | Model identifier (e.g., `"gpt-4o"`, `"claude-sonnet-4"`) |
| `verify` | function | Reference to a verify callback function |

The `verify` field is what makes a forge agent a forge agent. Without external verification, the agent would have no way to know whether its work succeeded. You will learn the full verify callback protocol in Section 25.4.
### Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `system` | string | `""` | System prompt defining agent behavior |
| `temperature` | float | `0.7` | Sampling temperature (0.0 to 2.0) |
| `skills` | list | `[]` | Tools available to the agent |
| `guards` | list | `[]` | Guard chains for safety |
| `workspace` | string | `"."` | Working directory for file operations |
| `loop` | block | defaults | Loop configuration (see Section 25.6) |
| `checkpoint` | string | `"none"` | Checkpoint strategy: `"git"`, `"snapshot"`, `"none"` |
### Minimal Forge Agent

Here is the simplest possible forge agent. It uses all defaults for the loop configuration and has no checkpoint strategy:

```
fun check_output(ctx) {
  let file = file_read("output.txt");
  if (file != nil) {
    return VerifyResult.Done("Output file created successfully.");
  }
  return VerifyResult.Retry("output.txt does not exist yet. Please create it.");
}

forge agent Builder {
  provider: "openai"
  model: "gpt-4o"
  verify: check_output
  system: "You are a code builder. Complete the current task."
  skills: [write_file, read_file]
}

{
  let outcome = Builder.run();
  emit f"Result: {outcome.message}";
}
```
This is about 15 lines, including the verify function. The forge agent will iterate up to 25 times (the default), calling the LLM with the system prompt and then checking `check_output` after each iteration.
### Full-Featured Forge Agent

Here is a forge agent that uses every available configuration option:

```
fun verify_tests(ctx) {
  let result = exec("npm test 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done(f"Task {ctx.current_task} passed all tests.");
  }
  if (ctx.iteration >= 3) {
    return VerifyResult.Abort("Failed after 3 attempts. Manual review needed.");
  }
  return VerifyResult.Retry(f"Tests failed:\n{result.stdout}\nPlease fix the errors.");
}

forge agent TDDBuilder {
  provider: "anthropic"
  model: "claude-sonnet-4"
  verify: verify_tests
  system: """
    You are a senior software engineer practicing test-driven development.
    Read the current task from the plan. Write code that makes the tests pass.
    Do not modify test files. Write clean, documented code.
  """
  temperature: 0.3
  skills: [workspace_write, workspace_read, exec_command]
  guards: [SecurityChain]
  workspace: "./project"

  loop {
    max_iterations: 30
    max_cost: 15.0
    max_tokens: 750000
    prompt_file: "prompt.md"
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
    learnings_file: "learnings.jsonl"
  }

  checkpoint: "git"
}
```
This declaration is approximately 40 lines and covers:

- A verify function that runs `npm test` and checks the exit code
- Anthropic as the provider with a specific model
- A detailed system prompt using multi-line string syntax
- Low temperature for deterministic code generation
- Three skills for file I/O and command execution
- A security guard chain
- A workspace directory
- Full loop configuration with all file paths
- Git checkpointing after each verified task
Start with a minimal forge agent and add configuration as you need it. The defaults
are sensible for most use cases. You can always add a loop block or change the
checkpoint strategy later.
## 25.3 The `.run()` Pipeline
When you call .run() on a forge agent, the VM executes a multi-stage pipeline. This
section walks through every step in detail.
### Pipeline Overview

The pipeline runs in three stages: a one-time setup phase, the main iteration loop, and a final return of the outcome.
### Stage 1: Setup

Before the main loop begins, the VM performs initialization:

1. **Security checks.** If the forge agent has `guards` or a `policy`, the VM validates them at startup. Any compile-time policy violation halts execution immediately.
2. **Provider creation.** The VM establishes a connection to the LLM provider, resolving the API key from the environment and validating the model identifier.
3. **Plan file loading.** If `plan_file` is configured, the VM reads it and parses each line as a separate task. Empty lines and lines starting with `#` are ignored.
4. **Progress file loading.** If `progress_file` is configured, the VM reads the JSONL file and identifies which tasks have already been completed. This enables resumability -- if the program crashes and restarts, it picks up from the last completed task.
5. **Prompt file loading.** If `prompt_file` is configured, the VM reads it as a Markdown or plain text file. Its contents become part of every iteration's context.
6. **Learnings file loading.** If `learnings_file` is configured, the VM reads accumulated learnings from previous runs and includes them in the context.
### Stage 2: Main Loop

The main loop runs from `iteration = 1` to `max_iterations`. Each iteration follows four sub-stages:

**a. Budget check.** The VM compares `total_cost` against `max_cost` and `total_tokens` against `max_tokens`. If either limit is exceeded, the loop exits immediately with a `BudgetExhausted` outcome.

**b. Build messages.** This is where the fresh context model is implemented. The VM constructs a new message array from scratch:

```
messages = [
  { role: "system", content: <system_prompt> },
  { role: "user", content: <prompt_file_contents> },
  { role: "user", content: "Current task: <task_description>" },
  { role: "user", content: "Completed tasks: <progress_summary>" },
  { role: "user", content: "Learnings: <learnings_summary>" },
  { role: "assistant", content: <feedback_from_previous_verify> } // if retry
]
```

Notice that this array is built fresh every iteration. No conversation history from previous iterations leaks in.

**c. LLM + tool loop.** The VM sends the messages to the LLM. If the LLM responds with a tool call, the VM executes the tool, appends the result to the messages, and sends the updated messages back to the LLM. This inner loop can repeat up to 25 times per iteration, allowing the agent to chain multiple tool calls (read a file, write a file, run a command) within a single iteration.

**d. Verify.** After the LLM produces a final text response (no more tool calls), the VM calls the verify callback function. The callback receives a context object and returns a `VerifyResult` value that determines what happens next.
### Stage 3: Return

When the loop exits -- whether by completing all tasks, hitting the iteration limit, being aborted, or exhausting the budget -- the VM returns a `LoopOutcome` value to the calling code.
### Code Example

```
fun verify_build(ctx) {
  let result = exec("cargo build 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done("Build succeeded.");
  }
  return VerifyResult.Retry(f"Build failed:\n{result.stdout}");
}

forge agent RustBuilder {
  provider: "openai"
  model: "gpt-4o"
  verify: verify_build
  system: "You are a Rust developer. Implement the current task."
  skills: [workspace_write, workspace_read, exec_command]

  loop {
    max_iterations: 20
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
  }

  checkpoint: "git"
}

{
  let outcome = RustBuilder.run();
  match outcome {
    Completed(msg) => emit f"All tasks completed: {msg}",
    MaxIterations => emit "Reached iteration limit. Some tasks remain.",
    Aborted(reason) => emit f"Aborted: {reason}",
    BudgetExhausted => emit "Budget exhausted before completion."
  }
}
```
The .run() method takes no arguments. All configuration comes from the forge agent
declaration. The prompt, plan, and context are loaded from files specified in the
loop block. This design ensures that the same forge agent can be run repeatedly
with different plan files by updating the files on disk before calling .run().
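To make that concrete, here is a sketch of rerunning the `RustBuilder` agent from above against two different plans. It assumes that `workspace_write` (shown later in this chapter inside skill bodies) is also callable from top-level program code; adapt the call to whatever file-writing builtin your stdlib provides.

```
{
  // First run: backend tasks
  workspace_write("plan.txt",
    "Implement the User model\nWrite the UserRepository");
  let backend = RustBuilder.run();
  emit f"Backend run: {backend.message}";

  // Second run: same agent declaration, new plan written to the same path
  workspace_write("plan.txt",
    "Add input validation\nWrite integration tests");
  let api = RustBuilder.run();
  emit f"API run: {api.message}";
}
```

Because completed tasks are recorded in the progress file, rerunning with an unchanged plan simply resumes; rewriting the plan gives the same agent new work.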
## 25.4 The Verify Callback
The verify callback is the quality gate that separates forge agents from simple loops. After each iteration, the VM calls the verify function to determine whether the agent's work meets the acceptance criteria.
### Function Signature

```
fun verify_name(ctx) {
  // ctx.iteration    → int: current iteration number (1-based)
  // ctx.current_task → string: the current task description
  // ctx.feedback     → string | nil: feedback from previous verify (if retry)
  // ctx.total_cost   → float: cumulative cost in USD so far
  // ctx.total_tokens → int: cumulative token usage so far
  // ctx.workspace    → string: path to the workspace directory
  //
  // Must return a VerifyResult value
}
```
The context object (`ctx`) provides six fields that give the verify function full visibility into the loop's state. This allows sophisticated verification logic that considers cost, iteration count, and previous feedback.
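As an illustration, here is a sketch of a budget-aware verify callback that uses several `ctx` fields together. The `make check` command and the `8.0` threshold are arbitrary examples; `exec` is the builtin used throughout this chapter.

```
fun verify_with_budget(ctx) {
  // Stop early if this run is burning budget without finishing
  if (ctx.total_cost > 8.0) {
    return VerifyResult.Abort(
      f"Spent ${ctx.total_cost} by iteration {ctx.iteration}; stopping for review."
    );
  }

  let result = exec("make check 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done(f"Task '{ctx.current_task}' verified.");
  }

  // ctx.feedback is non-nil on a retry, so we can note repeated failures
  let note = "";
  if (ctx.feedback != nil) {
    note = "This task also failed on a previous attempt.\n";
  }
  return VerifyResult.Retry(f"{note}Check failed:\n{result.stdout}");
}
```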
### VerifyResult Sealed Type

The verify function must return a value of the `VerifyResult` sealed type:

```
sealed VerifyResult {
  Done(message: string),
  Retry(feedback: string),
  Abort(reason: string)
}
```
Each variant triggers a different behavior in the main loop:

| Return Value | Effect |
|---|---|
| `VerifyResult.Done(message)` | Checkpoint the current state. Mark the current task as completed in the progress file. Advance to the next task. If no tasks remain, exit the loop with `Completed`. |
| `VerifyResult.Retry(feedback)` | Feed the feedback string back to the agent in the next iteration as additional context. The agent will see what went wrong and can attempt to fix it. |
| `VerifyResult.Abort(reason)` | Stop the loop immediately. Return an `Aborted` outcome with the given reason. Use this for unrecoverable failures. |
### Simple Verify Example: File Existence Check

The simplest verify function checks whether a file exists:

```
fun check_file_exists(ctx) {
  let content = file_read("output.txt");
  if (content != nil and len(content) > 0) {
    return VerifyResult.Done("File output.txt created with content.");
  }
  return VerifyResult.Retry("File output.txt is missing or empty. Please create it.");
}
```
This is useful for tasks where the acceptance criterion is simply "produce a file."
### Complex Verify Example: Run Tests and Check Coverage

A production verify function often runs external commands and parses their output:

```
fun verify_tdd(ctx) {
  // Run the test suite
  let test_result = exec("npm test 2>&1");
  if (test_result.exit_code != 0) {
    // Tests failed -- provide the error output as feedback
    let feedback = f"Tests failed on iteration {ctx.iteration}.\n";
    feedback = feedback + f"Error output:\n{test_result.stdout}\n";
    feedback = feedback + "Please read the error messages and fix the code.";

    // After 5 failed iterations, abort
    if (ctx.iteration >= 5) {
      return VerifyResult.Abort(
        f"Task '{ctx.current_task}' failed after 5 attempts. " +
        "Manual intervention required."
      );
    }
    return VerifyResult.Retry(feedback);
  }

  // Tests passed -- check coverage
  let cov_result = exec("npm run coverage -- --json 2>&1");
  if (cov_result.exit_code == 0) {
    let cov_data = json_parse(cov_result.stdout);
    let coverage_pct = cov_data["total"]["lines"]["pct"];
    if (coverage_pct < 80.0) {
      return VerifyResult.Retry(
        f"Tests pass but coverage is {coverage_pct}%. Need at least 80%."
      );
    }
  }

  return VerifyResult.Done(
    f"Task '{ctx.current_task}' verified: tests pass, coverage adequate."
  );
}
```
This verify function:

- Runs the test suite and checks for failures
- If tests fail, provides detailed error output as feedback
- If too many failures accumulate, aborts the entire loop
- If tests pass, checks code coverage
- If coverage is below threshold, retries with a coverage-specific message
- Only returns `Done` when both tests and coverage pass
> **Verifying Inside the Agent Prompt**
>
> A frequent error is to include verification instructions in the system prompt: "After writing the code, run the tests yourself and make sure they pass." This is unreliable for two reasons. First, the LLM may hallucinate test results instead of actually running tests. Second, the agent's assessment of its own work is biased -- it generated the code, so it is inclined to believe it works. External verification through the verify callback is objective and deterministic. The verify function runs real commands, checks real files, and returns real results. Never delegate verification to the agent itself.
## 25.5 LoopOutcome
When the .run() pipeline exits, it returns a LoopOutcome value. This is a sealed
type that encodes exactly how and why the loop terminated.
### Sealed Type Definition

```
sealed LoopOutcome {
  Completed(message: string),
  MaxIterations,
  Aborted(reason: string),
  BudgetExhausted
}
```
### Outcome Details

| Variant | When It Occurs | Fields |
|---|---|---|
| `Completed(message)` | All tasks in the plan file were verified successfully | `message`: final status message |
| `MaxIterations` | The loop reached `max_iterations` without completing all tasks | (none) |
| `Aborted(reason)` | The verify callback returned `VerifyResult.Abort(reason)` | `reason`: why the loop was aborted |
| `BudgetExhausted` | `total_cost >= max_cost` or `total_tokens >= max_tokens` | (none) |
### Return Value Properties

In addition to the variant, the `LoopOutcome` value carries metadata accessible through properties:

| Property | Type | Description |
|---|---|---|
| `outcome.iterations` | int | Total number of iterations executed |
| `outcome.total_cost` | float | Total cost in USD across all iterations |
| `outcome.total_tokens` | int | Total tokens consumed across all iterations |
| `outcome.message` | string | Human-readable summary of the outcome |
| `outcome.tasks_completed` | int | Number of tasks that were verified |
| `outcome.tasks_total` | int | Total number of tasks in the plan |
### Code Example: Checking the Outcome

```
{
  let outcome = MyForgeAgent.run();
  match outcome {
    Completed(msg) => {
      emit "All tasks completed successfully.";
      emit f"Message: {msg}";
      emit f"Iterations used: {outcome.iterations}";
      emit f"Total cost: ${outcome.total_cost}";
    },
    MaxIterations => {
      emit f"Reached {outcome.iterations} iterations.";
      emit f"Tasks completed: {outcome.tasks_completed}/{outcome.tasks_total}";
      emit "Consider increasing max_iterations or simplifying tasks.";
    },
    Aborted(reason) => {
      emit f"Loop aborted: {reason}";
      emit f"Completed {outcome.tasks_completed} of {outcome.tasks_total} tasks.";
      emit "Review the progress file for details.";
    },
    BudgetExhausted => {
      emit "Budget exhausted.";
      emit f"Spent: ${outcome.total_cost} / tokens: {outcome.total_tokens}";
      emit f"Tasks completed: {outcome.tasks_completed}/{outcome.tasks_total}";
    }
  }
}
```
The match expression ensures you handle every possible outcome. If you add logging,
alerting, or retry logic, you can do so per-variant. For instance, you might
automatically retry with a higher budget when BudgetExhausted occurs, or send a
notification to a Slack channel when Aborted fires.
Always log outcome.total_cost in production. Forge agents that run many iterations
against powerful models can accumulate significant costs. Tracking cost per run helps
you identify inefficient verify functions or overly broad tasks that cause excessive
retries.
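One lightweight way to do this is to append a one-line summary after every run. The sketch below reuses the `TDDBuilder` agent from earlier in this chapter and assumes a hypothetical `file_append` builtin; substitute whatever append-to-file operation your stdlib offers.

```
{
  let outcome = TDDBuilder.run();
  // One line per run makes cost trends easy to grep or chart later
  file_append("runs.log",   // file_append is a hypothetical builtin
    f"iterations={outcome.iterations} cost={outcome.total_cost} " +
    f"tokens={outcome.total_tokens} " +
    f"tasks={outcome.tasks_completed}/{outcome.tasks_total}\n");
}
```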
## 25.6 Loop Configuration
The loop block inside a forge agent declaration controls how the main loop behaves.
Every field has a sensible default, so you only need to specify the ones you want to
override.
### Numeric Limits

| Field | Type | Default | Description |
|---|---|---|---|
| `max_iterations` | int | `25` | Maximum number of iterations before the loop exits with `MaxIterations` |
| `max_cost` | float | `10.0` | Maximum cumulative cost in USD before `BudgetExhausted` |
| `max_tokens` | int | `500000` | Maximum cumulative tokens before `BudgetExhausted` |
### File Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt_file` | string | `nil` | Path to a Markdown or text file with the base prompt |
| `plan_file` | string | `nil` | Path to a text file with one task per line |
| `progress_file` | string | `nil` | Path to a JSONL file tracking completed tasks |
| `learnings_file` | string | `nil` | Path to a JSONL file storing cross-iteration learnings |
All file paths are relative to the workspace directory.
### Plan File Format

The plan file is a plain text file with one task per line. Empty lines and lines starting with `#` are ignored:

```
# plan.txt -- TDD Code Builder Plan
Implement the User model with name, email, and password_hash fields
Write the UserRepository with create, find_by_id, and find_by_email methods
Implement the AuthService with register and login methods
Add input validation to the registration endpoint
Write integration tests for the full auth flow
```
Each line becomes a separate task. The forge agent processes tasks in order, advancing
to the next task only when the verify callback returns Done for the current one.
Keep plan tasks small and focused. A task like "Build the entire authentication system" is too broad -- the agent may struggle, and verification becomes ambiguous. Break it into discrete, verifiable steps. Each task should have a clear pass/fail criterion that your verify function can check.
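For example, a broad authentication task could be decomposed into verifiable lines in `plan.txt`. The task wording below is illustrative:

```
# Too broad -- verification is ambiguous:
# Build the entire authentication system

# Better -- each line has a clear pass/fail criterion:
Implement password hashing in AuthService.hash_password
Implement AuthService.register, storing a new user with a hashed password
Implement AuthService.login, returning a session token on valid credentials
Reject login attempts with unknown emails or wrong passwords
```

Each of the rewritten lines can be checked by a test suite, so the verify callback has something concrete to pass or fail.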
### Progress File Format (JSONL)

The progress file uses JSON Lines format. Each line is a JSON object recording the completion of one task:

```
{"task": "Implement the User model with name, email, and password_hash fields", "status": "done", "iteration": 2, "timestamp": "2026-02-18T10:23:45Z", "cost": 0.42, "tokens": 12840}
{"task": "Write the UserRepository with create, find_by_id, and find_by_email methods", "status": "done", "iteration": 5, "timestamp": "2026-02-18T10:31:12Z", "cost": 0.87, "tokens": 28350}
```
The VM writes to this file automatically when the verify callback returns Done. You
do not need to manage this file manually. However, you can read it in your verify
function or in post-run analysis to understand how many iterations each task required
and how much each task cost.
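A post-run analysis might look like the following sketch. It reuses the `file_read` and `json_parse` builtins seen earlier in this chapter; the `split()` string function and the `for ... in` loop form are assumptions about the stdlib and syntax.

```
fun summarize_progress() {
  let raw = file_read("progress.jsonl");
  if (raw == nil) { return; }

  let total_cost = 0.0;
  // split() and for-in iteration are assumed here
  for (line in split(raw, "\n")) {
    if (len(line) == 0) { continue; }
    let entry = json_parse(line);
    let task = entry["task"];
    let iter = entry["iteration"];
    let cost = entry["cost"];
    total_cost = total_cost + cost;
    emit f"{task}: done on iteration {iter}, cost ${cost}";
  }
  emit f"Total recorded cost: ${total_cost}";
}
```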
### Learnings File Format (JSONL)

The learnings file records insights that the agent discovers during execution. These are extracted automatically by the VM when the agent's output contains structured learning markers, or you can write to it from your verify function:

```
{"learning": "The project uses ESM modules -- use import/export, not require()", "iteration": 1, "task": "Implement the User model"}
{"learning": "Tests expect snake_case method names, not camelCase", "iteration": 3, "task": "Write the UserRepository"}
{"learning": "The database connection is configured in config/db.js", "iteration": 4, "task": "Write the UserRepository"}
```
On each iteration, the VM loads all learnings and includes them in the context. This gives the fresh-context agent access to hard-won knowledge from previous iterations without carrying the full conversation history.
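Here is a sketch of writing a learning from a verify callback. Both `contains()` (substring test) and `file_append` are hypothetical helpers; the `exec` builtin is the one used throughout this chapter.

```
fun verify_and_learn(ctx) {
  let result = exec("npm test 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done("Tests pass.");
  }

  // Record a durable insight so future fresh-context iterations see it.
  // contains() and file_append are hypothetical stdlib helpers.
  if (contains(result.stdout, "require is not defined")) {
    file_append("learnings.jsonl",
      "{\"learning\": \"Project is ESM-only; use import, not require()\"}\n");
  }
  return VerifyResult.Retry(f"Tests failed:\n{result.stdout}");
}
```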
### File Relationships

All four loop files live in the workspace: the prompt file supplies stable project context, the plan file supplies the current task, and the progress and learnings files carry verified state and insights forward into each fresh iteration.

If you omit `plan_file`, the forge agent runs as a single-task loop. It will iterate up to `max_iterations` times, calling verify after each iteration, until verify returns `Done` or the loop exits. This is useful for tasks where the goal is a single deliverable rather than a list of tasks.
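A single-deliverable setup might look like this sketch, which reuses builtins and skills shown elsewhere in the chapter; the agent name, prompt, and thresholds are illustrative:

```
fun check_report(ctx) {
  let report = file_read("report.md");
  if (report != nil and len(report) > 500) {
    return VerifyResult.Done("Report generated.");
  }
  return VerifyResult.Retry("report.md is missing or too short; expand it.");
}

forge agent ReportWriter {
  provider: "openai"
  model: "gpt-4o"
  verify: check_report
  system: "Write a project status report to report.md."
  skills: [write_file, read_file]

  loop {
    max_iterations: 5
    prompt_file: "prompt.md"   // still loaded; only plan_file is omitted
  }
}
```

With no plan file, every iteration targets the same goal, and the loop ends as soon as `check_report` returns `Done`.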
## 25.7 Checkpoint Strategies
After each verified task, the forge agent can optionally create a checkpoint -- a snapshot of the workspace state that you can roll back to if something goes wrong later.
### Available Strategies

| Strategy | Value | What It Does |
|---|---|---|
| Git | `"git"` | Runs `git add -A && git commit -m "forge: <task>"` in the workspace directory |
| Snapshot | `"snapshot"` | Creates a filesystem copy of the workspace to `.neam/snapshots/<iteration>/` |
| None | `"none"` | No checkpointing (default) |
### When to Use Each Strategy

Use `"git"` when you are building code in a git repository. This is the most common choice for TDD workflows. Each verified task becomes a separate commit, giving you a clean history of the agent's work. You can `git log` to see what was done, `git diff` between commits to see what changed, and `git revert` to undo a specific task.

Use `"snapshot"` when you are working outside a git repository or when the workspace contains binary files that git does not handle well. Snapshots are filesystem copies, so they preserve everything regardless of file type. The tradeoff is disk space -- each snapshot is a full copy.

Use `"none"` during development and debugging, when you want maximum speed and do not need rollback capability. Also use `"none"` when the workspace is ephemeral (for example, a temporary directory that will be deleted after the run).
### Git Checkpoint Workflow

With `checkpoint: "git"`, a commit is created only when verify returns `Done`; an iteration that ends in `Retry` produces no commit. Only verified tasks are checkpointed. This keeps the git history clean -- every commit represents a verified, working state.
> **Forgetting to Initialize Git**
>
> If you set `checkpoint: "git"` but the workspace is not a git repository, the checkpoint will fail. Always ensure that `git init` has been run in the workspace directory before starting a forge agent with git checkpointing. The forge agent does not initialize git for you -- this is by design, to prevent accidentally creating a repository in the wrong directory.
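One defensive pattern is to probe for a repository before starting the run. This sketch uses the `exec` builtin with real git flags (`git -C <path> rev-parse --is-inside-work-tree`) and the `TDDBuilder` agent from earlier, whose workspace is `./project`:

```
{
  // Fail fast if the workspace is not a git repository
  let probe = exec("git -C ./project rev-parse --is-inside-work-tree 2>&1");
  if (probe.exit_code != 0) {
    emit "Workspace is not a git repository. Run `git init ./project` first.";
  } else {
    let outcome = TDDBuilder.run();
    emit f"Outcome: {outcome.message}";
  }
}
```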
## 25.8 Fresh Context Model
The fresh context model is the defining characteristic of forge agents. This section explains why it works, how the VM implements it, and how it compares to the accumulated context model used by claw agents.
### Why Fresh Context Prevents Drift
Large language models have a finite context window. When a claw agent accumulates dozens of messages -- system prompt, user messages, assistant responses, tool call results -- the context window fills up. The model must compress or truncate older messages, leading to information loss. Worse, the system prompt (which defines the agent's behavior) gets pushed further from the model's primary attention zone.
Fresh context eliminates this entirely. Each iteration's context window looks like this:

```
┌─────────────────────────────────────────────────────────────┐
│ [1] System prompt (always first)                            │
│ [2] Prompt file contents (project context)                  │
│ [3] Current task (single task focus)                        │
│ [4] Progress summary (2-3 sentences)                        │
│ [5] Learnings from previous runs (key insights only)        │
│ [6] Feedback from last verify (if retry)                    │
│                                                             │
│ Total: ~2,000 - 5,000 tokens (predictable, bounded)         │
└─────────────────────────────────────────────────────────────┘
```

Compare this to a claw agent after 20 iterations:

```
┌─────────────────────────────────────────────────────────────┐
│ [1] System prompt                                           │
│ [2] User message 1                                          │
│ [3] Assistant response 1 (with tool calls)                  │
│ [4] Tool result 1a                                          │
│ [5] Tool result 1b                                          │
│ [6] Assistant continuation 1                                │
│ [7] User message 2                                          │
│ ...                                                         │
│ [87] Tool result 20c                                        │
│ [88] Assistant response 20                                  │
│                                                             │
│ Total: ~50,000 - 120,000 tokens (growing, unpredictable)    │
└─────────────────────────────────────────────────────────────┘
```
### How the VM Handles Per-Iteration Context

On each iteration, the VM performs the following steps:

1. **Discard all messages** from the previous iteration. Nothing carries over.
2. Read the system prompt from the agent declaration.
3. Read the prompt file from disk (it may have changed since the last iteration).
4. Read the plan file and identify the current task.
5. Read the progress file and generate a concise summary of completed work.
6. Read the learnings file and include relevant learnings.
7. Assemble the message array and send it to the LLM.
The key insight is that steps 3 through 6 read from the filesystem. If the agent wrote
files during the previous iteration, those files are now part of the world state. The
next iteration can read them using its skills (e.g., workspace_read). The filesystem
is the continuity mechanism -- not the context window.
### Comparison Table
| Aspect | Claw Agent | Forge Agent |
|---|---|---|
| Context growth | Linear with iterations | Constant |
| System prompt attention | Degrades over time | Always maximum |
| Memory mechanism | Conversation history | Filesystem + progress files |
| Drift risk | High after many turns | Near zero |
| Best for | Interactive conversations | Build/transform tasks |
| Resumability | Requires session persistence | Built-in via progress file |
| Context window usage | Grows to fill limit | Bounded and predictable |
Imagine two approaches to writing a novel. The claw agent approach is to keep every draft, note, and revision in a single document that grows to hundreds of pages. By the time you reach Chapter 10, you are scrolling through thousands of lines of context to find what you need. The forge agent approach is to keep the manuscript in a separate file, read only the current chapter outline and style guide each morning, write one chapter, have an editor review it, and start fresh the next day. The manuscript (filesystem) grows, but your daily working context stays focused.
## 25.9 Forbidden Fields
Forge agents are intentionally restricted from using certain fields that belong to claw agents. These restrictions are enforced at compile time -- if you use a forbidden field, the compiler will reject the program with a clear error message.
### Forbidden Field Table

| Forbidden Field | Compile Error Message | Why It Is Forbidden |
|---|---|---|
| `session` | `forge agent cannot use 'session': forge agents have no persistent session; use a claw agent for session-based workflows` | Sessions imply accumulated conversation history, which contradicts the fresh context model |
| `channels` | `forge agent cannot use 'channels': multi-channel I/O is a claw agent feature; forge agents use workspace files for I/O` | Channels are for real-time interactive I/O, which is incompatible with iterative build loops |
| `lanes` | `forge agent cannot use 'lanes': parallel conversation lanes require session state; forge agents are single-task-per-iteration` | Lanes require persistent state across turns, which forge agents do not maintain |
### Example: Compile Error

```
// This will NOT compile
forge agent BadAgent {
  provider: "openai"
  model: "gpt-4o"
  verify: my_verify
  session: { history: 50 }   // COMPILE ERROR
}
```

The compiler outputs:

```
error[E0451]: forge agent cannot use 'session'
  --> bad_agent.neam:6:3
   |
 6 |   session: { history: 50 }
   |   ^^^^^^^ forge agents have no persistent session;
   |           use a claw agent for session-based workflows
```
Type Safety Rationale #
These compile-time restrictions exist to prevent logical errors. A forge agent with a
session field would create a confusing hybrid that neither maintains proper fresh
context nor provides proper session persistence. By making the restriction a compile
error rather than a runtime warning, Neam ensures that developers choose the right
agent type for their use case:
- Need persistent conversation history? Use a claw agent.
- Need iterative build loops with fresh context? Use a forge agent.
- Need single-turn question/answer? Use a standard agent.
The three agent types are distinct by design. Mixing their features would undermine the guarantees that each type provides.
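Concretely, the choice between the three types shows up in the declaration itself. The following abbreviated sketches are illustrations only -- the bodies are elided, the claw agent fields follow Chapter 24, and the standard agent form is assumed from earlier chapters:

```
// Persistent conversation: context accumulates across turns (claw agent)
claw agent SupportBot {
  provider: "anthropic"
  model: "claude-sonnet-4"
  session: { history: 50 }   // legal on a claw agent
}

// Iterative build loop: fresh context each iteration, gated by verify
forge agent DocsBuilder {
  provider: "anthropic"
  model: "claude-sonnet-4"
  verify: verify_docs        // hypothetical verify function
  loop { plan_file: "plan.txt" }
}

// Single-turn question/answer: no session, no loop (assumed form)
agent Summarizer {
  provider: "anthropic"
  model: "claude-sonnet-4"
}
```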
If you find yourself wanting a session field on a forge agent, reconsider your
architecture. You probably need either a claw agent that runs a long conversation, or
a forge agent that stores state in files. The forge agent's filesystem-as-memory
pattern can accomplish most things that sessions provide, but with better resumability
and no drift risk.
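As a sketch of that pattern, state a session would have carried can instead live in an ordinary workspace file that every fresh iteration reads back from disk. The `state.json` file and its schema here are illustrative assumptions, built on the `workspace_write` and `workspace_read` helpers used later in this chapter:

```
// Sketch: filesystem-as-memory in place of a session.
// "state.json" and its contents are hypothetical.
skill save_state {
  description: "Persist decisions so later fresh iterations can recover them"
  params: { state: string }
  impl(state) {
    return workspace_write("state.json", state);
  }
}

skill load_state {
  description: "Read the state left behind by earlier iterations"
  params: {}
  impl() {
    return workspace_read("state.json");
  }
}
```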
25.10 Real-World Example: TDD Code Builder #
This section presents a complete, working forge agent that implements a test-driven development workflow. The agent reads a plan file, implements each task, runs tests to verify, and commits verified work to git.
Skills #
First, define the skills the agent needs:
skill write_file {
  description: "Write content to a file in the workspace"
  params: { path: string, content: string }
  impl(path, content) {
    return workspace_write(path, content);
  }
}

skill read_file {
  description: "Read the contents of a file in the workspace"
  params: { path: string }
  impl(path) {
    return workspace_read(path);
  }
}

skill run_command {
  description: "Execute a shell command in the workspace directory"
  params: { command: string }
  impl(command) {
    return exec(command);
  }
}

skill list_files {
  description: "List files in a directory within the workspace"
  params: { directory: string }
  impl(directory) {
    return workspace_list(directory);
  }
}
Verify Function #
The verify function runs the test suite and checks for pass/fail:
fun verify_tests(ctx) {
  // Run the test suite
  let result = exec("npm test 2>&1");
  if (result.exit_code == 0) {
    // All tests pass -- note when success followed a retry
    if (ctx.feedback != nil) {
      emit f"[verify] Task succeeded after retry on iteration {ctx.iteration}.";
    }
    return VerifyResult.Done(
      f"Task '{ctx.current_task}' verified: all tests pass."
    );
  }

  // Tests failed (stderr is merged into stdout by the 2>&1 redirect)
  let error_output = result.stdout;

  // Abort if we have spent too much on this task
  if (ctx.total_cost > 8.0) {
    return VerifyResult.Abort(
      f"Cost limit approaching (${ctx.total_cost}). " +
      f"Task '{ctx.current_task}' is too expensive to retry."
    );
  }

  // Provide detailed feedback for retry
  let feedback = f"Verification failed on iteration {ctx.iteration}.\n";
  feedback = feedback + f"Task: {ctx.current_task}\n";
  feedback = feedback + f"Test output:\n{error_output}\n\n";
  feedback = feedback + "Read the error messages carefully. ";
  feedback = feedback + "Check the file you wrote and fix any issues.";
  return VerifyResult.Retry(feedback);
}
Forge Agent Declaration #
forge agent TDDCodeBuilder {
  provider: "anthropic"
  model: "claude-sonnet-4"
  verify: verify_tests
  system: """
    You are a senior software engineer building a Node.js project using
    test-driven development. Your workflow for each task:

    1. Read the current task description carefully.
    2. Use read_file to examine any existing code and test files.
    3. Use list_files to understand the project structure.
    4. Write implementation code using write_file.
    5. Do NOT modify test files -- tests are pre-written.
    6. Do NOT run tests yourself -- the verification system handles that.

    Write clean, well-documented code. Use meaningful variable names.
    Follow the existing code style you observe in the project.
  """
  temperature: 0.2
  skills: [write_file, read_file, run_command, list_files]
  workspace: "./my-project"

  loop {
    max_iterations: 30
    max_cost: 12.0
    max_tokens: 600000
    prompt_file: "prompt.md"
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
    learnings_file: "learnings.jsonl"
  }

  checkpoint: "git"
}
Plan File #
Create a file at ./my-project/plan.txt:
Implement src/models/user.js with User class (name, email, password_hash fields)
Implement src/repositories/user-repository.js with create, findById, findByEmail methods
Implement src/services/auth-service.js with register and login methods
Add input validation to the register method (email format, password length)
Implement src/routes/auth-routes.js with POST /register and POST /login endpoints
Prompt File #
Create a file at ./my-project/prompt.md:
## Project Context
You are building a Node.js REST API for user authentication.
### Technology Stack
- Runtime: Node.js with ES modules
- Framework: Express
- Testing: Jest
- Database: In-memory Map (for simplicity)
### Project Structure
my-project/
  src/
    models/        -- Data models
    repositories/  -- Data access layer
    services/      -- Business logic
    routes/        -- HTTP route handlers
  tests/           -- Pre-written test files
  package.json
### Conventions
- Use ES module syntax (import/export)
- Use async/await for asynchronous operations
- All public functions should have JSDoc comments
- Method names use camelCase
- File names use kebab-case
Usage #
{
  emit "=== TDD Code Builder ===";
  emit "";

  let outcome = TDDCodeBuilder.run();

  emit "";
  emit "=== Build Complete ===";
  emit f"Iterations used: {outcome.iterations}";
  emit f"Tasks completed: {outcome.tasks_completed}/{outcome.tasks_total}";
  emit f"Total cost: ${outcome.total_cost}";
  emit f"Total tokens: {outcome.total_tokens}";
  emit "";

  match outcome {
    Completed(msg) => {
      emit "SUCCESS: All tasks completed and verified.";
      emit f"Final message: {msg}";
    },
    MaxIterations => {
      emit "WARNING: Reached iteration limit.";
      emit "Some tasks may not be complete. Review progress.jsonl.";
    },
    Aborted(reason) => {
      emit f"ABORTED: {reason}";
      emit "Manual intervention required.";
    },
    BudgetExhausted => {
      emit "BUDGET EXHAUSTED: The run exceeded cost or token limits.";
      emit "Review progress.jsonl and consider increasing limits.";
    }
  }
}
Walkthrough #
Let us trace through what happens when this program runs.
Setup. The VM reads plan.txt and finds five tasks. It checks progress.jsonl --
if the file does not exist or is empty, all five tasks are pending. It reads prompt.md
to get the project context. It connects to the Anthropic API.
Iteration 1. The VM builds a fresh context: system prompt + prompt.md contents +
"Current task: Implement src/models/user.js..." + "No tasks completed yet." It sends
this to the LLM. The agent uses read_file to examine the test file, then uses
write_file to create src/models/user.js. The VM calls verify_tests, which runs
npm test. If tests pass, the VM writes a line to progress.jsonl, runs
git add -A && git commit -m "forge: Implement src/models/user.js...", and advances to
task 2.
Iteration 2. The VM builds a new fresh context: system prompt + prompt.md +
"Current task: Implement src/repositories/user-repository.js..." + "Completed: User
model (iteration 1)." The agent has no memory of iteration 1's tool calls, but it can
use read_file to see the code it wrote in the previous iteration (because it is on
disk). It implements the repository, and the verify function runs tests again.
Iteration 3 (hypothetical retry). Suppose the tests fail because the agent used
require() instead of import. The verify function returns
VerifyResult.Retry("Tests failed: SyntaxError: Cannot use require in ES module").
The VM does not checkpoint. It starts iteration 4 with the feedback included in the
context. The agent reads the feedback, recognizes the error, rewrites the file with
import syntax, and this time the tests pass.
Completion. After all five tasks are verified, the VM returns
LoopOutcome.Completed("All 5 tasks verified successfully."). The git log shows five
clean commits, one per task.
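For reference, after a run like this the progress and learnings files might contain records of the following shape. The field names are assumptions -- the chapter specifies only that both files are JSONL, one record per line:

```
--- progress.jsonl (hypothetical schema) ---
{"task": "Implement src/models/user.js with User class (name, email, password_hash fields)", "iteration": 1, "cost": 0.42}
{"task": "Implement src/repositories/user-repository.js with create, findById, findByEmail methods", "iteration": 2, "cost": 0.57}

--- learnings.jsonl (hypothetical schema) ---
{"iteration": 4, "learning": "This project uses ES modules; use import, never require()."}
```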
Try it yourself: create a simple Node.js project with pre-written test files. Write a
plan file with three tasks. Create a minimal forge agent that implements those tasks.
Start with checkpoint: "none" for faster iteration during development, then switch to
checkpoint: "git" when you are satisfied with the workflow.
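A minimal starting point for that experiment might look like the following sketch. The file names, limits, and verify logic are placeholders to adapt, and it assumes the write_file, read_file, and list_files skills defined earlier in this chapter:

```
// Hypothetical minimal verify: pass/fail on the project's test command.
fun verify_min(ctx) {
  let result = exec("npm test 2>&1");
  if (result.exit_code == 0) {
    return VerifyResult.Done(f"Task '{ctx.current_task}' passed.");
  }
  return VerifyResult.Retry(f"Tests failed:\n{result.stdout}");
}

forge agent MiniBuilder {
  provider: "anthropic"
  model: "claude-sonnet-4"
  verify: verify_min
  skills: [write_file, read_file, list_files]
  workspace: "./mini-project"

  loop {
    max_iterations: 10
    plan_file: "plan.txt"
    progress_file: "progress.jsonl"
  }

  checkpoint: "none"   // switch to "git" once the workflow is stable
}
```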
Summary #
In this chapter, you learned:
- Forge agents invert the agent-world relationship: the world (filesystem, git, plan files) is persistent, and the agent starts fresh each iteration.
- The `forge agent` keyword declares an iterative build agent with a required `verify` callback that gates task completion.
- The fresh context model prevents drift by constructing a new, bounded message array on each iteration. The system prompt always receives maximum attention.
- The `.run()` pipeline has three stages: setup (load files, connect provider), main loop (budget check, build messages, LLM + tool loop, verify), and return (produce a `LoopOutcome`).
- The verify callback receives a context object with `iteration`, `current_task`, `feedback`, `total_cost`, and `total_tokens`. It returns a `VerifyResult` sealed type: `Done(message)`, `Retry(feedback)`, or `Abort(reason)`.
- `LoopOutcome` is a sealed type with four variants: `Completed`, `MaxIterations`, `Aborted`, and `BudgetExhausted`. Use `match` to handle each variant.
- Loop configuration controls `max_iterations` (default 25), `max_cost` (default 10.0), `max_tokens` (default 500000), and file paths for the prompt, plan, progress, and learnings files.
- Plan files contain one task per line. Progress files use JSONL to track completed tasks. Learnings files use JSONL to carry insights across iterations.
- Checkpoint strategies (`"git"`, `"snapshot"`, `"none"`) determine how verified work is preserved. Git checkpointing creates one commit per verified task.
- Forbidden fields (`session`, `channels`, `lanes`) produce compile errors on forge agents, enforcing a clean separation between forge and claw agent architectures.
- A real-world TDD code builder combines skills (file I/O, command execution), a verify function (run tests, check results), a plan file (task list), and git checkpointing into a complete iterative build pipeline.
Forge agents are ideal for any task that can be decomposed into a plan of verifiable steps. In the next chapter, you will learn about semantic memory and workspace I/O -- the mechanisms that let agents persist and retrieve knowledge across runs.
Exercises #
Exercise 25.1: Minimal Forge Agent
Write a forge agent that creates a file called greeting.txt containing "Hello, Neam!"
in the workspace directory. The verify function should check that the file exists and
contains the expected text. Use checkpoint: "none" and max_iterations: 5. Run it
and observe the outcome.
Exercise 25.2: Multi-Task Plan
Create a plan file with three tasks: (1) create a README.md with a project title,
(2) create a LICENSE file with the MIT license text, (3) create a .gitignore file
that excludes node_modules/ and .env. Write a forge agent that processes this plan.
The verify function should check that each file exists and is not empty. Use
checkpoint: "none".
Exercise 25.3: Verify with Feedback
Write a verify function that checks whether a generated Python file passes flake8
linting. If linting fails, return VerifyResult.Retry with the linting errors as
feedback. If linting passes, return VerifyResult.Done. Test this with a forge agent
that writes a simple Python function.
Exercise 25.4: Abort on Cost
Modify the verify function from Exercise 25.3 to return VerifyResult.Abort if
ctx.total_cost exceeds 5.0 USD. Write a program that handles the Aborted outcome
with a match expression and emits the abort reason.
Exercise 25.5: Git Checkpointing
Initialize a git repository in a workspace directory. Write a forge agent with
checkpoint: "git" and a plan file containing three tasks. After the forge agent
completes, use run_command to execute git log --oneline and emit the result.
Verify that you see one commit per completed task.
Exercise 25.6: Progress File Analysis
After running a forge agent with a progress_file, write a separate Neam program that
reads the progress JSONL file, parses each line, and emits a summary report showing:
the total number of tasks, the total number of iterations across all tasks, the average
iterations per task, and the total cost.
Exercise 25.7: Learnings File
Write a forge agent with a learnings_file. In your verify function, write a learning
entry to the learnings file whenever the agent needed a retry (that is, when
ctx.feedback != nil and the verify now returns Done). After the run, read the
learnings file and emit all accumulated learnings.
Exercise 25.8: Forge vs. Claw Comparison
Write two programs that accomplish the same goal -- generating three Markdown files based on a plan. Program A uses a claw agent with accumulated context. Program B uses a forge agent with fresh context. Run both and compare: (a) total tokens consumed, (b) total cost, (c) whether the output quality degrades on the third file. Write a brief analysis in comments explaining what you observed.