Chapter 17: Cognitive Features #
"An agent that cannot reflect on its own output is merely an API wrapper."
Up to this point in the book, every agent we have built has been stateless and reactive: it receives a prompt, calls an LLM, and returns the response. It does not evaluate the quality of its own output, learn from past interactions, improve its prompts over time, or take action on its own initiative. In production, this is a serious limitation.
Neam v0.5.0 introduces a cognitive architecture -- a set of six opt-in capabilities that transform reactive agents into reflective, learning, evolving, autonomous systems. These capabilities build on each other in a natural progression:
- Reasoning -- structured thinking strategies before answering.
- Reflection -- self-evaluation of output quality.
- Learning -- recording and reviewing interaction history.
- Prompt Evolution -- automatic refinement of system prompts.
- Autonomy -- goal-driven behavior on schedules with budgets.
- Embedded Inference -- local model execution (ONNX).
Each feature is fully opt-in. An agent without cognitive properties behaves identically to v0.4.x. You can adopt these capabilities incrementally, starting with reasoning and adding more as your requirements grow.
Dependency Chain #
The cognitive features form a natural progression:
Reasoning (standalone -- no prerequisites)
|
v
Reflection (uses reasoning output as input for evaluation)
|
v
Learning (records reflection scores to SQLite)
|
v
Evolution (uses learning reviews to propose prompt changes)
Autonomy (standalone -- benefits from all above, but works independently)
Each layer can operate independently, but they compound when combined. A learning agent without reflection still records interactions, but the learning reviews are less informative without quality scores to analyze.
17.1 Reasoning Strategies #
Reasoning strategies add structured thinking to an agent's response process. Instead of generating a single-shot answer, the agent is instructed to think before answering -- breaking problems into steps, exploring multiple paths, or generating multiple independent answers.
Chain of Thought #
The simplest reasoning mode. The VM prepends chain-of-thought instructions to the system prompt, producing step-by-step reasoning before the final answer.
agent Analyst {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are a data analyst. Break down complex questions step by step."
reasoning: chain_of_thought
}
{
let answer = Analyst.ask("Why did revenue drop 15% in Q3?");
emit answer;
}
What happens at runtime:
- The VM prepends "Think step-by-step before answering..." to the system prompt.
- The agent produces numbered reasoning steps followed by a conclusion.
- The full response (with reasoning) is returned.
Cost: 1 LLM call (same as no reasoning, but the response is longer).
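The VM's internals are not shown in this book, but the mechanism can be pictured as simple prompt augmentation before the single LLM call. A minimal Python sketch (the function name and prefix text are illustrative, not the actual Neam implementation):

```python
# Illustrative model of chain_of_thought: the VM augments the system
# prompt, then makes one LLM call as usual.
COT_PREFIX = ("Think step-by-step before answering. Show your reasoning "
              "as numbered steps, then give a final conclusion.\n\n")

def build_system_prompt(base_prompt, reasoning):
    """Return the system prompt the single LLM call will see."""
    if reasoning == "chain_of_thought":
        return COT_PREFIX + base_prompt
    return base_prompt  # no reasoning mode: prompt passes through unchanged

prompt = build_system_prompt("You are a data analyst.", "chain_of_thought")
```

The key property is that the call count does not change; only the prompt (and therefore the response length) does.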
Plan and Execute #
Generates a multi-step plan, executes each step as a separate LLM call, then synthesizes results. This is the most thorough reasoning mode for complex, multi-faceted questions.
agent ProjectPlanner {
provider: "openai"
model: "gpt-4o"
temperature: 0.4
system: "You are a project planning expert."
reasoning: plan_and_execute
}
{
let plan = ProjectPlanner.ask(
"Create a launch plan for a mobile app targeting college students."
);
emit plan;
}
What happens at runtime:
- LLM call #1: Generate a numbered plan (e.g., 5 steps).
- LLM calls #2 through #6: Execute each step individually.
- LLM call #7: Synthesize all step results into a final answer.
Cost: N + 2 LLM calls, where N is the number of plan steps.
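To make the N + 2 accounting concrete, here is a rough Python model of the plan/execute/synthesize loop with a stubbed LLM that counts calls. The function and stub are illustrative assumptions, not the VM's actual code:

```python
def plan_and_execute(question, llm, n_steps=5):
    """Model of the N+2 call pattern: plan, execute each step, synthesize."""
    plan = llm(f"Write a {n_steps}-step plan for: {question}")      # call 1
    results = [llm(f"Execute step: {step}") for step in plan]       # calls 2..N+1
    return llm("Synthesize: " + " | ".join(results))                # call N+2

calls = 0
def stub_llm(prompt):
    """Stand-in for the provider: returns a 5-step plan, else a string."""
    global calls
    calls += 1
    if prompt.startswith("Write"):
        return [f"step {i}" for i in range(1, 6)]  # always a 5-step plan
    return f"result of ({prompt[:20]}...)"

answer = plan_and_execute("Launch plan for a mobile app", stub_llm)
# With a 5-step plan the stub records 7 calls: N + 2 = 5 + 2
```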
Tree of Thought #
Explores multiple reasoning branches, scores each, and selects the best path. Ideal for decisions where multiple viable options exist.
agent Strategist {
provider: "openai"
model: "gpt-4o"
system: "You are a strategic advisor. Evaluate all options carefully."
reasoning: tree_of_thought
}
{
let strategy = Strategist.ask(
"Should we expand into the EU market or focus on APAC first?"
);
emit strategy;
// The VM generates 3 branches, scores each, and returns the best one
}
What happens at runtime:
- LLM call #1: Generate 3 distinct approaches to the problem.
- LLM call #2: Score each approach on feasibility, impact, and risk.
- Return the highest-scoring approach with its reasoning.
Cost: 2-3 LLM calls.
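The branch-score-select pattern can be modeled in a few lines of Python. This is a sketch of the idea described above, with hypothetical names and a stubbed LLM, not the VM's implementation:

```python
def tree_of_thought(question, llm):
    """Model of the 2-call pattern: generate branches, score, pick best."""
    branches = llm(f"Give 3 distinct approaches to: {question}")           # call 1
    scores = llm(f"Score each on feasibility, impact, risk: {branches}")   # call 2
    best = max(zip(branches, scores), key=lambda pair: pair[1])
    return best[0]

def stub_llm(prompt):
    """Stand-in provider: returns branches for call 1, scores for call 2."""
    if prompt.startswith("Give"):
        return ["expand EU", "focus APAC", "do both slowly"]
    return [0.6, 0.8, 0.4]

choice = tree_of_thought("EU or APAC?", stub_llm)  # highest-scoring branch wins
```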
Self Consistency #
Generates multiple independent answers and returns the majority consensus. This is particularly effective for math, logic, and factual questions where there is a single correct answer.
agent MathSolver {
provider: "openai"
model: "gpt-4o"
temperature: 0.7
system: "You solve math problems."
reasoning: self_consistency
}
{
let answer = MathSolver.ask("What is the integral of x^2 * e^x dx?");
emit answer;
// Generates 3 independent solutions and picks the consensus
}
What happens at runtime:
- Generate N independent answers (default N=3, with elevated temperature for diversity).
- LLM call #N+1: Analyze all answers and select the majority consensus.
- Return the consensus answer with confidence.
Cost: N + 1 LLM calls.
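The consensus step is easiest to see in miniature. In the real VM, call N+1 is an LLM that judges agreement; a plain majority vote over the sampled answers models the same idea. A Python sketch under those assumptions:

```python
from collections import Counter

def self_consistency(question, llm, n=3):
    """Model of N+1 calls: N diverse samples, then a consensus pick.
    A majority vote stands in for the final consensus LLM call."""
    answers = [llm(question, temperature=0.9) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n  # answer plus a simple confidence ratio

# Three pre-baked "samples"; two of them agree.
samples = iter(["x^2 e^x - 2x e^x + 2 e^x + C",
                "(x^2 - 2x + 2) e^x + C",
                "(x^2 - 2x + 2) e^x + C"])
answer, conf = self_consistency("integral of x^2 e^x",
                                lambda q, temperature: next(samples))
```

Note that the two agreeing samples are algebraically the same antiderivative; a real consensus judge would also catch equivalent-but-differently-written answers, which a literal string vote cannot.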
Reasoning Strategy Comparison #
| Mode | LLM Calls | Best For | Trade-off |
|---|---|---|---|
| `chain_of_thought` | 1 | General reasoning, step-by-step analysis | Minimal overhead |
| `plan_and_execute` | N+2 | Complex multi-step tasks, project planning | High cost, thorough |
| `tree_of_thought` | 2-3 | Decisions with multiple viable options | Moderate cost |
| `self_consistency` | N+1 | Math, logic, factual questions | High cost, high accuracy |
Reasoning Configuration #
For finer control, add a reasoning_config block alongside the reasoning strategy:
agent CarefulAnalyst {
provider: "openai"
model: "gpt-4o"
system: "You are a meticulous analyst."
reasoning: plan_and_execute
reasoning_config: {
max_steps: 5
show_thinking: false
verify_before_respond: true
}
}
| Field | Type | Default | Description |
|---|---|---|---|
| `max_steps` | int | 5 | Maximum reasoning steps (`plan_and_execute`, `tree_of_thought`) |
| `show_thinking` | bool | false | Include the reasoning trace in the returned output |
| `verify_before_respond` | bool | true | Add a final verification step before returning |
When show_thinking is true, the response includes the full reasoning trace (numbered
steps, branch scores, or consensus analysis) before the final answer. This is useful for
debugging or for applications that want to display the agent's thought process to users.
When verify_before_respond is true, the agent performs a final check after reasoning
to confirm the answer is consistent with the reasoning steps. This adds one extra LLM
call but catches contradictions between the reasoning trace and the final answer.
17.2 Reflection #
Reflection enables an agent to evaluate the quality of its own output. After generating a response, the agent calls the LLM again to score the response across configurable quality dimensions. If the score falls below a threshold, the agent can automatically revise, retry, escalate, or acknowledge the low confidence.
Basic Reflection #
agent QAAgent {
provider: "openai"
model: "gpt-4o"
temperature: 0.4
system: "You answer technical questions accurately and concisely."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, relevance, completeness]
min_confidence: 0.7
on_low_quality: {
strategy: "revise"
max_revisions: 2
}
}
}
{
let answer = QAAgent.ask("Explain the CAP theorem and its implications.");
emit answer;
// If self-evaluation scores below 0.7, the agent automatically revises
}
What happens at runtime:
- The agent generates a response (with reasoning if configured).
- The VM builds an evaluation prompt: "Rate this response on accuracy, relevance, completeness (0.0-1.0)."
- The LLM returns scores as JSON: {"accuracy": 0.85, "relevance": 0.9, "completeness": 0.6}
- The VM computes the average score (0.78 in this example).
- If the average is below min_confidence (0.7), the configured strategy triggers.
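The threshold arithmetic is simple enough to model directly. A Python sketch of the score-averaging step (illustrative names; not the VM's code):

```python
import json

def reflect(scores_json, min_confidence=0.7):
    """Average the per-dimension scores and decide whether the
    configured low-quality strategy should trigger."""
    scores = json.loads(scores_json)
    avg = sum(scores.values()) / len(scores)
    return avg, avg < min_confidence  # (average, trigger_low_quality)

avg, low = reflect('{"accuracy": 0.85, "relevance": 0.9, "completeness": 0.6}')
# avg is about 0.78, above the 0.7 threshold, so no revision is triggered
```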
Low-Quality Strategies #
| Strategy | Behavior |
|---|---|
| `"revise"` | Feed the scores back to the agent and ask it to improve (up to `max_revisions` times) |
| `"retry"` | Regenerate the response from scratch |
| `"escalate"` | Hand off to a more capable agent specified by `escalate_to` |
| `"acknowledge"` | Return the response with a confidence warning appended |
Escalation Example #
Route low-quality responses to a more capable agent:
agent JuniorAgent {
provider: "openai"
model: "gpt-4o-mini"
system: "You answer general questions."
reflect: {
after: each_response
evaluate: [accuracy, completeness]
min_confidence: 0.8
on_low_quality: {
strategy: "escalate"
escalate_to: "SeniorAgent"
}
}
}
agent SeniorAgent {
provider: "openai"
model: "gpt-4o"
system: "You are a senior expert. Provide thorough, detailed answers."
}
{
let answer = JuniorAgent.ask("Explain quantum entanglement.");
emit answer;
// If JuniorAgent scores below 0.8, SeniorAgent handles it automatically
}
On-Demand Reflection #
Trigger reflection manually from code using agent_reflect():
{
let answer = QAAgent.ask("What is RAFT consensus?");
emit answer;
// Manually trigger reflection
let scores = agent_reflect("QAAgent");
emit "Accuracy: " + str(scores["accuracy"]);
emit "Relevance: " + str(scores["relevance"]);
emit "Completeness: " + str(scores["completeness"]);
}
The function returns a map of dimension names to scores (0.0-1.0).
Explicit Feedback #
Provide external feedback scores (e.g., from a human reviewer or automated test) to influence the learning system:
{
let answer = QAAgent.ask("What is RAFT consensus?");
emit answer;
// Rate the response (0.0 to 1.0)
agent_rate("QAAgent", 0.9);
}
Reflection Configuration Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `after` | identifier | required | When to reflect: `each_response`, `every_n`, or `on_demand` |
| `evaluate` | list | required | Dimensions to score (free-form identifiers) |
| `min_confidence` | float | 0.7 | Minimum average score threshold (0.0-1.0) |
| `on_low_quality.strategy` | string | `"revise"` | `"revise"`, `"retry"`, `"escalate"`, `"acknowledge"` |
| `on_low_quality.max_revisions` | int | 2 | Maximum revision attempts |
| `on_low_quality.escalate_to` | string | -- | Agent name for escalation |
17.3 Learning Loop #
The learning loop enables agents to learn from their interaction history. The VM records every query/response pair to SQLite, periodically reviews accumulated interactions, and extracts patterns to improve future responses.
Enabling Learning #
agent Tutor {
provider: "openai"
model: "gpt-4o"
temperature: 0.5
system: "You are a patient programming tutor."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [clarity, accuracy]
min_confidence: 0.7
on_low_quality: { strategy: "revise", max_revisions: 1 }
}
learning: {
strategy: "experience_replay"
review_interval: 10
max_adaptations: 50
rollback_on_decline: true
}
memory: "tutor_memory"
}
What happens at runtime:
For every interaction (query + response):
- The VM records to SQLite: query, response, reflection scores, feedback score, token count, and timestamp.
- The interaction counter increments.
Every N interactions (where N = review_interval):
- The VM loads the last N interactions from SQLite.
- Builds a meta-prompt: "Review these interactions. Extract patterns, identify weaknesses, suggest improvements."
- The LLM returns a learning review with lessons and a prompt addendum.
- The review is stored in the learning_reviews table.
- The prompt addendum is appended to the agent's system prompt for future calls.
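The record-and-periodically-review cycle can be sketched as a counter that fires every `review_interval` interactions. This Python model is illustrative (the class and its fields are assumptions standing in for the SQLite tables and the review LLM call):

```python
class LearningLoop:
    """Model of the interaction counter and periodic review trigger."""
    def __init__(self, review_interval=10):
        self.review_interval = review_interval
        self.interactions = []  # stands in for the learning_interactions table
        self.reviews = 0        # stands in for rows in learning_reviews

    def record(self, query, response, reflection_score):
        """Record one interaction; trigger a review every N interactions."""
        self.interactions.append((query, response, reflection_score))
        if len(self.interactions) % self.review_interval == 0:
            self._review()

    def _review(self):
        # The real VM sends the last N interactions to the LLM here and
        # stores the resulting lessons and prompt addendum.
        self.reviews += 1

loop = LearningLoop(review_interval=10)
for i in range(25):
    loop.record(f"q{i}", f"a{i}", 0.8)
# 25 interactions with review_interval 10 -> reviews fire at 10 and 20
```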
Checking Learning Progress #
{
// Run several interactions
let a1 = Tutor.ask("Explain recursion.");
let a2 = Tutor.ask("What is a binary tree?");
let a3 = Tutor.ask("How does quicksort work?");
// Check learning statistics
let stats = agent_learning_stats("Tutor");
emit "Total interactions: " + str(stats["total_interactions"]);
emit "Average reflection score: " + str(stats["avg_reflection_score"]);
emit "Reviews completed: " + str(stats["reviews_completed"]);
}
Learning Strategies #
| Strategy | Description | Best For |
|---|---|---|
| `experience_replay` | Review recent interactions, extract lessons from successes and failures | General-purpose agents |
| `pattern_extraction` | Identify recurring patterns in queries and optimize response templates | FAQ/support agents |
| `prompt_evolution` | Gradually evolve the system prompt based on learning reviews | Long-running agents |
| `preference_learning` | Learn from explicit feedback scores to adjust behavior | User-facing agents |
Learning Configuration Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `strategy` | string | required | Learning strategy (see table above) |
| `review_interval` | int | 10 | Trigger review every N interactions |
| `max_adaptations` | int | 50 | Maximum prompt adjustments over lifetime |
| `rollback_on_decline` | bool | true | Revert if performance declines after adaptation |
Important Notes #
- Learning recording is asynchronous (non-blocking) -- it adds less than 1ms overhead to each interaction.
- A memory store must be configured for learning data to persist across restarts.
- Without memory, learning data is held in-memory only and lost on exit.
- The learning feature works without reflection, but reflection scores significantly improve learning quality.
17.4 Prompt Evolution #
Prompt evolution enables agents to rewrite their own system prompt over time based on accumulated learning data. The evolved prompt is validated against a declared core identity (ensuring the agent does not drift from its intended purpose) and supports version rollback.
Enabling Evolution #
agent SalesBot {
provider: "openai"
model: "gpt-4o"
temperature: 0.6
system: "You are a friendly sales assistant. Help customers find the right product."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [helpfulness, accuracy, tone]
min_confidence: 0.7
on_low_quality: { strategy: "revise", max_revisions: 2 }
}
learning: {
strategy: "prompt_evolution"
review_interval: 10
max_adaptations: 30
rollback_on_decline: true
}
evolve: {
mutable: [system_prompt, temperature]
review_after: 50
core_identity: "You are a friendly sales assistant."
allow_rollback: true
}
memory: "sales_memory"
}
What happens at runtime:
After review_after interactions (50 in this example):
- The VM loads all learning reviews from SQLite.
- Builds a meta-prompt: "Based on these reviews, propose an improved system prompt. The following text MUST appear verbatim: 'You are a friendly sales assistant.'"
- The LLM proposes a new prompt with reasoning.
- The VM validates that the core_identity text is preserved.
- The new prompt is stored in prompt_evolution with an incremented version number.
- All subsequent calls use the evolved prompt.
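The core-identity check is a verbatim substring test, which makes it easy to model. A Python sketch (the function name is illustrative; the acceptance rule matches the behavior described above):

```python
def validate_evolution(proposed_prompt, core_identity):
    """An evolved prompt is accepted only if the declared core identity
    appears verbatim inside it; otherwise the proposal is rejected."""
    return core_identity in proposed_prompt

core = "You are a friendly sales assistant."
ok = validate_evolution(
    "You are a friendly sales assistant. Open with a question about "
    "the customer's use case before recommending products.", core)
bad = validate_evolution("You are an aggressive closer.", core)  # rejected
```

Because the check is exact, even a benign paraphrase of the identity sentence would cause the proposal to be rejected; the evolution meta-prompt therefore instructs the LLM to include the text unchanged.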
Checking Evolution Status #
{
let status = agent_status("SalesBot");
emit "Evolution version: " + str(status["evolution_version"]);
emit "Current prompt: " + str(status["evolved_prompt"]);
emit "Reasoning mode: " + str(status["reasoning_mode"]);
emit "Learning count: " + str(status["learning_count"]);
}
Manual Evolution and Rollback #
{
// Force an evolution cycle now (regardless of review_after threshold)
agent_evolve("SalesBot");
// View all prompt versions
let history = agent_prompt_history("SalesBot");
emit "Total versions: " + str(len(history));
// Inspect each version
for (version, prompt) in enumerate(history) {
emit "Version " + str(version) + ": " + str(prompt);
}
// Rollback to version 1 if the latest evolution is not performing well
agent_rollback("SalesBot", 1);
emit "Rolled back to version 1.";
}
Evolution Configuration Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `mutable` | list | required | Fields that can evolve: `system_prompt`, `temperature` |
| `review_after` | int | 50 | Trigger evolution after N total interactions |
| `core_identity` | string | -- | Text that must always appear verbatim in the evolved prompt |
| `allow_rollback` | bool | true | Enable rollback to previous versions |
Important Notes #
- Evolution requires learning to be enabled (it uses learning review data as input).
- The core_identity string is enforced at evolution time -- the proposed prompt is rejected if it does not contain the string verbatim.
- Evolved prompts are persisted in SQLite and survive restarts.
- If rollback_on_decline is true in the learning config, the VM automatically reverts if average scores drop after evolution.
17.5 Autonomy and Goals #
Autonomy transforms an agent from a passive tool (responds only when called) into an active participant that pursues goals on a schedule, within defined resource budgets.
Basic Autonomous Agent #
agent Monitor {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.3
system: "You monitor system health. Report anomalies and suggest fixes."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, relevance]
min_confidence: 0.6
on_low_quality: { strategy: "acknowledge" }
}
learning: {
strategy: "experience_replay"
review_interval: 20
}
goals: [
"Check system metrics and identify anomalies",
"Generate daily health reports",
"Escalate critical issues immediately"
]
triggers: {
on_schedule: "every 5m"
}
initiative: true
budget: {
max_daily_calls: 100
max_daily_cost: 5.0
max_daily_tokens: 50000
}
memory: "monitor_memory"
}
{
emit "Monitor agent registered. Running every 5 minutes.";
// The agent starts executing autonomously in the background
}
What happens at runtime:
- On OP_DEFINE_AGENT, the VM registers the agent with the AutonomousExecutor.
- A background thread checks the schedule every second.
- When the schedule fires (every 5 minutes), the VM:
  - Checks daily budget limits.
  - Constructs a query from the agent's goals: "You have these goals: [goals]. Take autonomous action."
  - Calls the agent internally with the constructed query.
  - Logs the action and token usage to the autonomous_actions and autonomous_budgets tables.
- Budget counters reset daily at midnight.
Schedule Expressions #
| Expression | Interval |
|---|---|
| `"every 30s"` | Every 30 seconds |
| `"every 5m"` | Every 5 minutes |
| `"every 1h"` | Every 1 hour |
| `"every 1d"` | Every 1 day |
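The `"every <N><unit>"` grammar is small enough that a parser fits in a few lines. A Python sketch of how such expressions might be normalized to seconds (illustrative; not the VM's parser):

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_schedule(expr):
    """Parse an 'every <N><unit>' expression into a number of seconds."""
    match = re.fullmatch(r"every (\d+)([smhd])", expr)
    if not match:
        raise ValueError(f"bad schedule expression: {expr!r}")
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]

interval = parse_schedule("every 5m")  # 300 seconds
```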
Managing Goals at Runtime #
{
// Read current goals
let goals = agent_get_goals("Monitor");
emit "Current goals: " + str(goals);
// Update goals dynamically
agent_set_goals("Monitor", [
"Check system metrics and identify anomalies",
"Generate hourly health reports",
"Escalate critical issues to ops team",
"Track API response times"
]);
emit "Goals updated.";
}
Pause and Resume #
{
// Pause during a maintenance window
agent_pause("Monitor");
emit "Monitor paused.";
// ... perform maintenance ...
// Resume autonomous execution
agent_resume("Monitor");
emit "Monitor resumed.";
}
Budget Limits #
| Field | Type | Default | Description |
|---|---|---|---|
| `budget.max_daily_calls` | int | 100 | Maximum LLM calls per day |
| `budget.max_daily_cost` | float | 5.0 | Maximum cost in USD per day |
| `budget.max_daily_tokens` | int | 50000 | Maximum tokens consumed per day |
When any limit is reached, autonomous execution pauses until the next daily reset.
Interactive calls (.ask()) are not affected by autonomous budgets.
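The limit-and-reset behavior can be modeled as a small accounting class. This Python sketch is an assumption-laden illustration of the described semantics (hit any limit and the run is skipped; counters reset on a new date), not the `autonomous_budgets` implementation:

```python
import datetime

class DailyBudget:
    """Model of the autonomous budget check described above."""
    def __init__(self, max_calls=100, max_cost=5.0, max_tokens=50_000):
        self.limits = (max_calls, max_cost, max_tokens)
        self.reset(datetime.date(2025, 1, 1))

    def reset(self, date):
        self.date, self.calls, self.cost, self.tokens = date, 0, 0.0, 0

    def try_consume(self, today, cost, tokens):
        """Return True if an autonomous run fits in today's budget."""
        if today != self.date:              # midnight rollover
            self.reset(today)
        max_calls, max_cost, max_tokens = self.limits
        if (self.calls + 1 > max_calls or self.cost + cost > max_cost
                or self.tokens + tokens > max_tokens):
            return False                     # run is skipped until reset
        self.calls += 1
        self.cost += cost
        self.tokens += tokens
        return True

budget = DailyBudget(max_calls=2)
day1 = datetime.date(2025, 1, 1)
allowed = [budget.try_consume(day1, 0.01, 100) for _ in range(3)]
# third call on the same day exceeds max_daily_calls; next day resets
next_day_ok = budget.try_consume(datetime.date(2025, 1, 2), 0.01, 100)
```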
17.6 Embedded Inference #
Embedded inference enables agents to run models locally within the Neam process using ONNX runtime, without requiring an external service like Ollama.
agent LocalClassifier {
provider: "local"
model_path: "./models/classifier.onnx"
system: "Classify input text into categories."
}
This feature is gated behind a compile-time flag and requires building Neam with ONNX support. It is primarily useful for lightweight classification, embedding, or preprocessing tasks where the overhead of an external API call is undesirable.
Embedded inference is an advanced feature intended for edge deployment scenarios. For most use cases, Ollama provides a simpler path to local model execution.
17.7 Cognitive Native Functions Reference #
Neam provides eleven native functions for interacting with cognitive features programmatically:
| Function | Arity | Returns | Description |
|---|---|---|---|
| `agent_rate(agent, score)` | 2 | nil | Submit feedback score (0.0-1.0) for the last response |
| `agent_reflect(agent)` | 1 | map | Trigger on-demand reflection; returns `{"dimension": score}` |
| `agent_evolve(agent)` | 1 | nil | Trigger manual prompt evolution cycle |
| `agent_rollback(agent, version)` | 2 | nil | Rollback prompt to a specific version number |
| `agent_status(agent)` | 1 | map | Full cognitive state: reasoning_mode, learning_count, evolution_version, goals, evolved_prompt |
| `agent_learning_stats(agent)` | 1 | map | Learning statistics: total_interactions, avg_reflection_score, reviews_completed |
| `agent_prompt_history(agent)` | 1 | list | List of all evolved prompt strings (index = version) |
| `agent_get_goals(agent)` | 1 | list | Get current goals list |
| `agent_set_goals(agent, goals)` | 2 | nil | Update goals at runtime |
| `agent_pause(agent)` | 1 | nil | Pause autonomous execution |
| `agent_resume(agent)` | 1 | nil | Resume autonomous execution |
Usage Patterns #
{
// === Feedback Loop ===
let answer = MyAgent.ask("question");
agent_rate("MyAgent", 0.8);
// === Inspect State ===
let status = agent_status("MyAgent");
let stats = agent_learning_stats("MyAgent");
let history = agent_prompt_history("MyAgent");
let reflection = agent_reflect("MyAgent");
// === Control Evolution ===
agent_evolve("MyAgent");
agent_rollback("MyAgent", 0);
// === Control Autonomy ===
agent_set_goals("MyAgent", ["new goal"]);
agent_pause("MyAgent");
agent_resume("MyAgent");
}
17.8 SQLite Persistence #
All cognitive data is persisted to SQLite when a memory store is configured on the
agent. The following tables are created automatically:
| Table | Key Columns | Purpose |
|---|---|---|
| `learning_interactions` | agent_name, query, response, reflection_score, feedback_score, tokens_used, timestamp | Every query/response pair with scores |
| `learning_reviews` | agent_name, strategy, interactions_reviewed, avg_reflection_score, lessons_json, prompt_addendum, timestamp | Periodic review results |
| `prompt_evolution` | agent_name, version, original_prompt, evolved_prompt, reasoning, status, timestamp | Versioned prompt history |
| `autonomous_actions` | agent_name, trigger_type, action_taken, tokens_used, timestamp | Log of autonomous agent actions |
| `autonomous_budgets` | agent_name, date, calls_used, tokens_used, cost_used | Daily budget tracking |
Data Lifecycle #
Interaction --> learning_interactions (every call)
|
v (every review_interval calls)
learning_reviews
|
v (every review_after calls)
prompt_evolution
|
v
Agent uses evolved prompt for all future calls
Data survives across program restarts, enabling agents to resume learning from where they left off.
Querying Data Externally #
The SQLite database is a standard .db file. You can query it directly with the
sqlite3 command-line tool:
# View all evolved prompts
sqlite3 ~/.neam/memory.db \
"SELECT agent_name, version, evolved_prompt FROM prompt_evolution ORDER BY version;"
# View learning statistics
sqlite3 ~/.neam/memory.db \
"SELECT agent_name, COUNT(*) as interactions, AVG(reflection_score) as avg_score
FROM learning_interactions GROUP BY agent_name;"
# View budget usage
sqlite3 ~/.neam/memory.db \
"SELECT * FROM autonomous_budgets WHERE date = date('now');"
17.9 Memory Systems #
The cognitive architecture so far stores learning data in flat SQLite tables. For agents
that need richer memory capabilities, the standard library (std.agents.advanced.memory)
provides four specialized memory systems that mirror human cognitive memory types.
Semantic Memory #
Semantic memory stores factual knowledge as vector-embedded entries organized by category. It supports similarity-based retrieval, making it ideal for building agents that accumulate domain expertise over time.
import std::agents::advanced::memory::semantic;
let mem = semantic::create_memory({ "embedding_model": "nomic-embed-text" });
mem = semantic::store_fact(mem, "Neam compiles to bytecode", "language");
mem = semantic::store_fact(mem, "Agents connect to LLM providers", "architecture");
let related = semantic::retrieve_similar(mem, "How does Neam execute code?", 3);
let by_topic = semantic::retrieve_by_category(mem, "language");
Episodic Memory #
Episodic memory records events and sessions -- timestamped sequences of what happened during an agent's operation. It supports timeline queries and content-based search using embeddings.
import std::agents::advanced::memory::episodic;
let ep = episodic::create_memory();
let session_id = episodic::start_episode(ep, "support_ticket_42");
ep = episodic::add_event(ep, session_id, "user_greeting", {
"message": "Hello, I need help with billing."
});
ep = episodic::add_event(ep, session_id, "agent_response", {
"message": "I can help with that. What is your account number?"
});
ep = episodic::end_episode(ep, session_id);
// Search past episodes by content similarity
let similar = episodic::search_episodes(ep, "billing question", 5);
Episodic memory also supports consolidation -- a process that prunes old, low-importance episodes to keep memory usage bounded:
// Consolidate episodes older than 30 days with importance below 0.3
ep = episodic::consolidate(ep, { "max_age_days": 30, "min_importance": 0.3 });
Working Memory #
Working memory manages the current conversation context with a fixed capacity. It tracks what the agent is currently "focused on" and automatically evicts the least relevant items when capacity is reached.
import std::agents::advanced::memory::working;
let wm = working::create_memory({ "capacity": 10 });
wm = working::add_item(wm, "user_name", "Alice");
wm = working::add_item(wm, "current_topic", "billing dispute");
wm = working::set_focus(wm, "billing dispute");
// Get items most relevant to the current query
let context = working::get_relevant(wm, "What is Alice's refund status?", 5);
// Export working memory as a text block for prompt injection
let context_text = working::to_text(wm);
Knowledge Graph Memory #
The knowledge graph module stores entities and relationships, enabling agents to reason about connections between concepts:
import std::agents::advanced::memory::graph;
let kg = graph::create_graph();
kg = graph::add_node(kg, "Alice", "Customer", { "plan": "premium" });
kg = graph::add_node(kg, "Order-123", "Order", { "amount": 49.99 });
kg = graph::add_edge(kg, "Alice", "Order-123", "placed");
// Query: find all orders placed by Alice
let orders = graph::query(kg, { "from": "Alice", "relation": "placed" });
Combining Memory Systems #
A production agent might use all four memory types together:
| Memory Type | Stores | Retrieval | Persistence |
|---|---|---|---|
| Semantic | Facts, domain knowledge | Similarity search by embedding | Long-term (SQLite + vectors) |
| Episodic | Events, conversation sessions | Timeline or content search | Long-term (SQLite) |
| Working | Current context, active focus | Relevance to current query | Session-scoped (in-memory) |
| Knowledge graph | Entities, relationships | Graph traversal, pattern matching | Long-term (SQLite) |
The memory field on an agent declaration (e.g., memory: "tutor_memory")
configures the SQLite store used by the learning loop and prompt evolution. The
std.agents.advanced.memory modules provide additional memory capabilities that you
manage explicitly in your program logic.
17.10 Full Cognitive Agent Example #
This example combines all cognitive features into a single production-ready agent:
agent ResearchAssistant {
provider: "openai"
model: "gpt-4o"
temperature: 0.5
system: "You are an AI research assistant. You analyze papers, summarize findings,
and track research trends across multiple domains."
// Step 1: Structured thinking -- plans before answering
reasoning: plan_and_execute
// Step 2: Self-evaluation after every response
reflect: {
after: each_response
evaluate: [accuracy, completeness, clarity, relevance]
min_confidence: 0.75
on_low_quality: {
strategy: "revise"
max_revisions: 2
}
}
// Step 3: Learn from interaction history
learning: {
strategy: "experience_replay"
review_interval: 15
max_adaptations: 100
rollback_on_decline: true
}
// Step 4: Evolve prompt over time
evolve: {
mutable: [system_prompt]
review_after: 100
core_identity: "You are an AI research assistant."
allow_rollback: true
}
// Step 5: Autonomous daily research
goals: [
"Track new papers in AI safety and alignment",
"Produce weekly research digests"
]
triggers: {
on_schedule: "every 1d"
}
initiative: true
budget: {
max_daily_calls: 50
max_daily_cost: 10.0
}
memory: "research_memory"
}
{
// Interactive usage
let summary = ResearchAssistant.ask(
"Summarize the latest developments in RLHF and DPO for LLM alignment."
);
emit summary;
// The VM: plans -> executes steps -> reflects -> records interaction -> returns
// Provide feedback
agent_rate("ResearchAssistant", 0.95);
// Check cognitive status
let status = agent_status("ResearchAssistant");
emit "Reasoning mode: " + str(status["reasoning_mode"]);
emit "Learning count: " + str(status["learning_count"]);
emit "Evolution version: " + str(status["evolution_version"]);
emit "Goals: " + str(status["goals"]);
// Check learning statistics
let stats = agent_learning_stats("ResearchAssistant");
emit "Avg reflection score: " + str(stats["avg_reflection_score"]);
// View prompt evolution history
let history = agent_prompt_history("ResearchAssistant");
emit "Prompt versions: " + str(len(history));
}
Expected Output (After Several Interactions) #
[Plan and Execute: generating plan...]
[Step 1: Survey recent RLHF papers...]
[Step 2: Survey DPO developments...]
[Step 3: Compare approaches...]
[Synthesizing final answer...]
RLHF (Reinforcement Learning from Human Feedback) continues to evolve with...
[detailed multi-paragraph response]
Reasoning mode: plan_and_execute
Learning count: 47
Evolution version: 2
Goals: ["Track new papers in AI safety and alignment", "Produce weekly research digests"]
Avg reflection score: 0.82
Prompt versions: 2
17.11 Migration from v0.4.x #
Zero Breaking Changes #
v0.5.0 is fully backward compatible with v0.4.x:
- All new properties are optional.
- Agents without cognitive properties behave identically to v0.4.x.
- No changes to existing syntax, opcodes, or runtime behavior.
- Existing .neamb bytecode continues to work (new opcodes are additive).
Incremental Adoption Path #
Start with reasoning, the simplest and cheapest cognitive feature:
// v0.4.x agent -- still works unchanged
agent Helper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
}
// v0.5.0 -- add reasoning (one line change)
agent SmartHelper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
reasoning: chain_of_thought
}
Then add reflection:
agent ReflectiveHelper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, helpfulness]
min_confidence: 0.7
on_low_quality: { strategy: "revise" }
}
}
Then learning and evolution:
agent EvolvingHelper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, helpfulness]
min_confidence: 0.7
on_low_quality: { strategy: "revise" }
}
learning: {
strategy: "experience_replay"
review_interval: 10
}
evolve: {
mutable: [system_prompt]
review_after: 50
core_identity: "You are helpful."
allow_rollback: true
}
memory: "helper_memory"
}
Summary #
In this chapter you learned:
- Reasoning strategies: Four modes of structured thinking -- chain_of_thought, plan_and_execute, tree_of_thought, and self_consistency -- each with different cost/quality trade-offs.
- Reasoning configuration: Fine-tuning with max_steps, show_thinking, and verify_before_respond for controlling reasoning behavior.
- Reflection: Self-evaluation with configurable quality dimensions, thresholds, and strategies for handling low-quality output (revise, retry, escalate, acknowledge).
- Learning loops: Recording interaction history to SQLite, periodic learning reviews, and four learning strategies (experience replay, pattern extraction, prompt evolution, preference learning).
- Prompt evolution: Automatic system prompt refinement with core identity preservation and version rollback.
- Autonomy: Goal-driven agents that act on schedules with daily budget limits for calls, cost, and tokens.
- Embedded inference: Local ONNX model execution for edge deployment.
- Native functions: Eleven functions for programmatic control of cognitive features.
- SQLite persistence: How cognitive data is stored and how to query it externally.
- Memory systems: Four specialized memory types from the standard library -- semantic (factual knowledge with vector retrieval), episodic (event and session tracking), working (current context with capacity limits), and knowledge graph (entities and relationships).
- Incremental adoption: How to migrate from v0.4.x by adding one feature at a time.
Exercises #
Exercise 17.1: Reasoning Strategy Comparison #
Create four agents, each using a different reasoning strategy (chain_of_thought, plan_and_execute, tree_of_thought, self_consistency). Ask each agent the same complex question (e.g., "Should a startup prioritize growth or profitability in its first two years?"). Compare the responses in terms of quality, depth, and cost (number of LLM calls). Which strategy produced the best answer? Which was the most cost-effective?
Exercise 17.2: Quality Gate with Reflection #
Build an agent with reflection enabled. Set min_confidence to 0.9 (deliberately
high). Ask the agent a moderately difficult question and observe:
- What reflection scores does it produce?
- How many revision rounds does it go through?
- Is the final answer better than the first attempt?
Repeat with min_confidence set to 0.5 and compare the behavior.
Exercise 17.3: Learning Over Time #
Create a tutoring agent with learning enabled (review_interval: 5). Ask it 10
questions about a subject you know well. After each answer, use agent_rate() to
provide a feedback score. After the 10th question, check agent_learning_stats().
Has the average reflection score improved over the session? What lessons did the
learning review extract?
Exercise 17.4: Prompt Evolution Experiment #
Create an agent with all cognitive features enabled and review_after: 10 (low
threshold for testing). Run 15 interactions with consistent feedback. Then:
- Check the prompt history with agent_prompt_history().
- Compare the original and evolved prompts.
- Rollback to version 0 using agent_rollback().
- Verify the agent uses the original prompt.
Document how the evolved prompt differs from the original.
Exercise 17.5: Autonomous Monitor #
Build an autonomous agent with:
- Goals related to a task you define (e.g., "Check weather and report temperature").
- A schedule of "every 30s" (for quick testing).
- A budget of max_daily_calls: 5.
Run the program and observe:
1. How often does the agent execute?
2. What happens when the budget limit is reached?
3. Use agent_pause() and agent_resume() to control execution.
Exercise 17.6: Full Cognitive Pipeline #
Design a customer support agent that combines all six cognitive capabilities. The agent should:
- Use plan_and_execute reasoning for complex tickets.
- Reflect on every response with accuracy and tone evaluation.
- Learn from interaction history with experience_replay.
- Evolve its prompt, preserving the core identity: "You are a customer support agent."
- Run autonomously every hour to generate a summary of recent tickets.
- Operate within a daily budget of 200 calls and $20.
Write the complete Neam program, including at least 5 test interactions and cognitive status checks after each one.