Programming Neam

Chapter 17: Cognitive Features #

"An agent that cannot reflect on its own output is merely an API wrapper."

Up to this point in the book, every agent we have built has been stateless and reactive: it receives a prompt, calls an LLM, and returns the response. It does not evaluate the quality of its own output, learn from past interactions, improve its prompts over time, or take action on its own initiative. In production, this is a serious limitation.

Neam v0.5.0 introduces a cognitive architecture -- a set of six opt-in capabilities that transform reactive agents into reflective, learning, evolving, autonomous systems. These capabilities build on each other in a natural progression:

  1. Reasoning -- structured thinking strategies before answering.
  2. Reflection -- self-evaluation of output quality.
  3. Learning -- recording and reviewing interaction history.
  4. Prompt Evolution -- automatic refinement of system prompts.
  5. Autonomy -- goal-driven behavior on schedules with budgets.
  6. Embedded Inference -- local model execution (ONNX).

Each feature is fully opt-in. An agent without cognitive properties behaves identically to v0.4.x. You can adopt these capabilities incrementally, starting with reasoning and adding more as your requirements grow.

text
Reasoning Strategy | Reflection Engine | Learning Loop | Evolve Engine
                          |
                          v
            LLM Provider (OpenAI / Ollama / ...)
                          |
                          v
              SQLite Persistence Layer
   learning_interactions | learning_reviews | prompt_evolution |
   autonomous_actions | budgets

Dependency Chain #

The cognitive features form a natural progression:

text
Reasoning (standalone -- no prerequisites)
    |
    v
Reflection (uses reasoning output as input for evaluation)
    |
    v
Learning (records reflection scores to SQLite)
    |
    v
Evolution (uses learning reviews to propose prompt changes)

Autonomy (standalone -- benefits from all above, but works independently)

Each layer can operate independently, but they compound when combined. A learning agent without reflection still records interactions, but the learning reviews are less informative without quality scores to analyze.


17.1 Reasoning Strategies #

Reasoning strategies add structured thinking to an agent's response process. Instead of generating a single-shot answer, the agent is instructed to think before answering -- breaking problems into steps, exploring multiple paths, or generating multiple independent answers.

Chain of Thought #

The simplest reasoning mode. The VM prepends chain-of-thought instructions to the system prompt, producing step-by-step reasoning before the final answer.

neam
agent Analyst {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.3
  system: "You are a data analyst. Break down complex questions step by step."
  reasoning: chain_of_thought
}

{
  let answer = Analyst.ask("Why did revenue drop 15% in Q3?");
  emit answer;
}

What happens at runtime:

  1. The VM prepends "Think step-by-step before answering..." to the system prompt.
  2. The agent produces numbered reasoning steps followed by a conclusion.
  3. The full response (with reasoning) is returned.

Cost: 1 LLM call (same as no reasoning, but the response is longer).
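
What the transformed prompt and response look like, sketched with illustrative wording (the exact instruction text is internal to the VM):

text
Effective system prompt:
  You are a data analyst. Break down complex questions step by step.
  Think step-by-step before answering. Number your reasoning steps,
  then state your conclusion.

Response shape:
  1. Compare Q3 revenue against Q2 by segment...
  2. Isolate the segments with the largest declines...
  3. Check for pricing or churn changes in those segments...
  Conclusion: ...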

Plan and Execute #

Generates a multi-step plan, executes each step as a separate LLM call, then synthesizes results. This is the most thorough reasoning mode for complex, multi-faceted questions.

neam
agent ProjectPlanner {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.4
  system: "You are a project planning expert."
  reasoning: plan_and_execute
}

{
  let plan = ProjectPlanner.ask(
    "Create a launch plan for a mobile app targeting college students."
  );
  emit plan;
}

What happens at runtime:

  1. LLM call #1: Generate a numbered plan (e.g., 5 steps).
  2. LLM calls #2 through #6: Execute each step individually.
  3. LLM call #7: Synthesize all step results into a final answer.

Cost: N + 2 LLM calls, where N is the number of plan steps.

Tree of Thought #

Explores multiple reasoning branches, scores each, and selects the best path. Ideal for decisions where multiple viable options exist.

neam
agent Strategist {
  provider: "openai"
  model: "gpt-4o"
  system: "You are a strategic advisor. Evaluate all options carefully."
  reasoning: tree_of_thought
}

{
  let strategy = Strategist.ask(
    "Should we expand into the EU market or focus on APAC first?"
  );
  emit strategy;
  // The VM generates 3 branches, scores each, and returns the best one
}

What happens at runtime:

  1. LLM call #1: Generate 3 distinct approaches to the problem.
  2. LLM call #2: Score each approach on feasibility, impact, and risk.
  3. Return the highest-scoring approach with its reasoning.

Cost: 2-3 LLM calls.

Self Consistency #

Generates multiple independent answers and returns the majority consensus. This is particularly effective for math, logic, and factual questions where there is a single correct answer.

neam
agent MathSolver {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.7
  system: "You solve math problems."
  reasoning: self_consistency
}

{
  let answer = MathSolver.ask("What is the integral of x^2 * e^x dx?");
  emit answer;
  // Generates 3 independent solutions and picks the consensus
}

What happens at runtime:

  1. Generate N independent answers (default N=3, with elevated temperature for diversity).
  2. LLM call #N+1: Analyze all answers and select the majority consensus.
  3. Return the consensus answer with confidence.

Cost: N + 1 LLM calls.

Reasoning Strategy Comparison #

| Mode | LLM Calls | Best For | Trade-off |
|---|---|---|---|
| chain_of_thought | 1 | General reasoning, step-by-step analysis | Minimal overhead |
| plan_and_execute | N+2 | Complex multi-step tasks, project planning | High cost, thorough |
| tree_of_thought | 2-3 | Decisions with multiple viable options | Moderate cost |
| self_consistency | N+1 | Math, logic, factual questions | High cost, high accuracy |

Reasoning Configuration #

For finer control, add a reasoning_config block alongside the reasoning strategy:

neam
agent CarefulAnalyst {
  provider: "openai"
  model: "gpt-4o"
  system: "You are a meticulous analyst."
  reasoning: plan_and_execute

  reasoning_config: {
    max_steps: 5
    show_thinking: false
    verify_before_respond: true
  }
}

| Field | Type | Default | Description |
|---|---|---|---|
| max_steps | int | 5 | Maximum reasoning steps (plan_and_execute, tree_of_thought) |
| show_thinking | bool | false | Include the reasoning trace in the returned output |
| verify_before_respond | bool | true | Add a final verification step before returning |

When show_thinking is true, the response includes the full reasoning trace (numbered steps, branch scores, or consensus analysis) before the final answer. This is useful for debugging or for applications that want to display the agent's thought process to users.

When verify_before_respond is true, the agent performs a final check after reasoning to confirm the answer is consistent with the reasoning steps. This adds one extra LLM call but catches contradictions between the reasoning trace and the final answer.
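
Both flags can be combined for debugging. A hedged sketch (the fields are those documented above; the comments describe intended behavior):

neam
agent DebugStrategist {
  provider: "openai"
  model: "gpt-4o"
  system: "You are a strategic advisor."
  reasoning: tree_of_thought

  reasoning_config: {
    max_steps: 3             // cap the reasoning steps
    show_thinking: true      // surface branch scores in the output
    verify_before_respond: true
  }
}

{
  let answer = DebugStrategist.ask("Should we build or buy our billing system?");
  emit answer;  // the reasoning trace appears before the final answer
}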


17.2 Reflection #

Reflection enables an agent to evaluate the quality of its own output. After generating a response, the agent calls the LLM again to score the response across configurable quality dimensions. If the score falls below a threshold, the agent can automatically revise, retry, escalate, or acknowledge the low confidence.

Basic Reflection #

neam
agent QAAgent {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.4
  system: "You answer technical questions accurately and concisely."
  reasoning: chain_of_thought

  reflect: {
    after: each_response
    evaluate: [accuracy, relevance, completeness]
    min_confidence: 0.7
    on_low_quality: {
      strategy: "revise"
      max_revisions: 2
    }
  }
}

{
  let answer = QAAgent.ask("Explain the CAP theorem and its implications.");
  emit answer;
  // If self-evaluation scores below 0.7, the agent automatically revises
}

What happens at runtime:

  1. The agent generates a response (with reasoning if configured).
  2. The VM builds an evaluation prompt: "Rate this response on accuracy, relevance, completeness (0.0-1.0)."
  3. The LLM returns scores as JSON: {"accuracy": 0.85, "relevance": 0.9, "completeness": 0.6}
  4. The VM computes the average score (0.78 in this example).
  5. If the average is below min_confidence (0.7), the configured strategy triggers.

Low-Quality Strategies #

| Strategy | Behavior |
|---|---|
| "revise" | Feed the scores back to the agent and ask it to improve (up to max_revisions times) |
| "retry" | Regenerate the response from scratch |
| "escalate" | Hand off to a more capable agent specified by escalate_to |
| "acknowledge" | Return the response with a confidence warning appended |

Escalation Example #

Route low-quality responses to a more capable agent:

neam
agent JuniorAgent {
  provider: "openai"
  model: "gpt-4o-mini"
  system: "You answer general questions."

  reflect: {
    after: each_response
    evaluate: [accuracy, completeness]
    min_confidence: 0.8
    on_low_quality: {
      strategy: "escalate"
      escalate_to: "SeniorAgent"
    }
  }
}

agent SeniorAgent {
  provider: "openai"
  model: "gpt-4o"
  system: "You are a senior expert. Provide thorough, detailed answers."
}

{
  let answer = JuniorAgent.ask("Explain quantum entanglement.");
  emit answer;
  // If JuniorAgent scores below 0.8, SeniorAgent handles it automatically
}

On-Demand Reflection #

Trigger reflection manually from code using agent_reflect():

neam
{
  let answer = QAAgent.ask("What is RAFT consensus?");
  emit answer;

  // Manually trigger reflection
  let scores = agent_reflect("QAAgent");
  emit "Accuracy:     " + str(scores["accuracy"]);
  emit "Relevance:    " + str(scores["relevance"]);
  emit "Completeness: " + str(scores["completeness"]);
}

The function returns a map of dimension names to scores (0.0-1.0).

Explicit Feedback #

Provide external feedback scores (e.g., from a human reviewer or automated test) to influence the learning system:

neam
{
  let answer = QAAgent.ask("What is RAFT consensus?");
  emit answer;

  // Rate the response (0.0 to 1.0)
  agent_rate("QAAgent", 0.9);
}

Reflection Configuration Reference #

| Field | Type | Default | Description |
|---|---|---|---|
| after | identifier | required | When to reflect: each_response, every_n, or on_demand |
| evaluate | list | required | Dimensions to score (free-form identifiers) |
| min_confidence | float | 0.7 | Minimum average score threshold (0.0-1.0) |
| on_low_quality.strategy | string | "revise" | One of "revise", "retry", "escalate", "acknowledge" |
| on_low_quality.max_revisions | int | 2 | Maximum revision attempts |
| on_low_quality.escalate_to | string | -- | Agent name for escalation |

17.3 Learning Loop #

The learning loop enables agents to learn from their interaction history. The VM records every query/response pair to SQLite, periodically reviews accumulated interactions, and extracts patterns to improve future responses.


Enabling Learning #

neam
agent Tutor {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.5
  system: "You are a patient programming tutor."
  reasoning: chain_of_thought

  reflect: {
    after: each_response
    evaluate: [clarity, accuracy]
    min_confidence: 0.7
    on_low_quality: { strategy: "revise", max_revisions: 1 }
  }

  learning: {
    strategy: "experience_replay"
    review_interval: 10
    max_adaptations: 50
    rollback_on_decline: true
  }

  memory: "tutor_memory"
}

What happens at runtime:

For every interaction (query + response):

  1. The VM records to SQLite: query, response, reflection scores, feedback score, token count, and timestamp.
  2. The interaction counter increments.

Every N interactions (where N = review_interval):

  1. The VM loads the last N interactions from SQLite.
  2. Builds a meta-prompt: "Review these interactions. Extract patterns, identify weaknesses, suggest improvements."
  3. The LLM returns a learning review with lessons and a prompt addendum.
  4. The review is stored in the learning_reviews table.
  5. The prompt addendum is appended to the agent's system prompt for future calls.
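
A stored review row in learning_reviews might look roughly like this (illustrative values only):

text
strategy:              experience_replay
interactions_reviewed: 10
avg_reflection_score:  0.74
lessons_json:          ["Explanations score higher on clarity when they
                        include a short worked example", ...]
prompt_addendum:       "When explaining a concept, give a brief worked
                        example before the formal definition."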

Checking Learning Progress #

neam
{
  // Run several interactions
  let a1 = Tutor.ask("Explain recursion.");
  let a2 = Tutor.ask("What is a binary tree?");
  let a3 = Tutor.ask("How does quicksort work?");

  // Check learning statistics
  let stats = agent_learning_stats("Tutor");
  emit "Total interactions:       " + str(stats["total_interactions"]);
  emit "Average reflection score: " + str(stats["avg_reflection_score"]);
  emit "Reviews completed:        " + str(stats["reviews_completed"]);
}

Learning Strategies #

| Strategy | Description | Best For |
|---|---|---|
| experience_replay | Review recent interactions, extract lessons from successes and failures | General-purpose agents |
| pattern_extraction | Identify recurring patterns in queries and optimize response templates | FAQ/support agents |
| prompt_evolution | Gradually evolve the system prompt based on learning reviews | Long-running agents |
| preference_learning | Learn from explicit feedback scores to adjust behavior | User-facing agents |
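
For instance, preference_learning pairs naturally with explicit agent_rate() feedback. A hedged sketch (the agent name and prompts are invented for illustration):

neam
agent Concierge {
  provider: "openai"
  model: "gpt-4o"
  system: "You recommend restaurants."

  learning: {
    strategy: "preference_learning"
    review_interval: 10
  }

  memory: "concierge_memory"
}

{
  let rec = Concierge.ask("Find a quiet place for a business dinner.");
  emit rec;

  // Explicit scores are what preference_learning reviews analyze
  agent_rate("Concierge", 0.9);
}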

Learning Configuration Reference #

| Field | Type | Default | Description |
|---|---|---|---|
| strategy | string | required | Learning strategy (see table above) |
| review_interval | int | 10 | Trigger review every N interactions |
| max_adaptations | int | 50 | Maximum prompt adjustments over lifetime |
| rollback_on_decline | bool | true | Revert if performance declines after adaptation |



17.4 Prompt Evolution #

Prompt evolution enables agents to rewrite their own system prompt over time based on accumulated learning data. The evolved prompt is validated against a declared core identity (ensuring the agent does not drift from its intended purpose) and supports version rollback.

Enabling Evolution #

neam
agent SalesBot {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.6
  system: "You are a friendly sales assistant. Help customers find the right product."

  reasoning: chain_of_thought

  reflect: {
    after: each_response
    evaluate: [helpfulness, accuracy, tone]
    min_confidence: 0.7
    on_low_quality: { strategy: "revise", max_revisions: 2 }
  }

  learning: {
    strategy: "prompt_evolution"
    review_interval: 10
    max_adaptations: 30
    rollback_on_decline: true
  }

  evolve: {
    mutable: [system_prompt, temperature]
    review_after: 50
    core_identity: "You are a friendly sales assistant."
    allow_rollback: true
  }

  memory: "sales_memory"
}

What happens at runtime:

After review_after interactions (50 in this example):

  1. The VM loads all learning reviews from SQLite.
  2. Builds a meta-prompt: "Based on these reviews, propose an improved system prompt. The following text MUST appear verbatim: 'You are a friendly sales assistant.'"
  3. The LLM proposes a new prompt with reasoning.
  4. The VM validates that the core_identity text is preserved.
  5. The new prompt is stored in prompt_evolution with an incremented version number.
  6. All subsequent calls use the evolved prompt.
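
Two adjacent versions in prompt_evolution might look like this (illustrative prompts; note the core identity survives verbatim):

text
version 1: "You are a friendly sales assistant. Help customers find the
            right product."
version 2: "You are a friendly sales assistant. Help customers find the
            right product. Ask one clarifying question about budget
            before recommending, and present options as a short list."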

Checking Evolution Status #

neam
{
  let status = agent_status("SalesBot");
  emit "Evolution version:   " + str(status["evolution_version"]);
  emit "Current prompt:      " + str(status["evolved_prompt"]);
  emit "Reasoning mode:      " + str(status["reasoning_mode"]);
  emit "Learning count:      " + str(status["learning_count"]);
}

Manual Evolution and Rollback #

neam
{
  // Force an evolution cycle now (regardless of review_after threshold)
  agent_evolve("SalesBot");

  // View all prompt versions
  let history = agent_prompt_history("SalesBot");
  emit "Total versions: " + str(len(history));

  // Inspect each version
  for (version, prompt) in enumerate(history) {
    emit "Version " + str(version) + ": " + str(prompt);
  }

  // Rollback to version 1 if the latest evolution is not performing well
  agent_rollback("SalesBot", 1);
  emit "Rolled back to version 1.";
}

Evolution Configuration Reference #

| Field | Type | Default | Description |
|---|---|---|---|
| mutable | list | required | Fields that can evolve: system_prompt, temperature |
| review_after | int | 50 | Trigger evolution after N total interactions |
| core_identity | string | -- | Text that must always appear verbatim in the evolved prompt |
| allow_rollback | bool | true | Enable rollback to previous versions |



17.5 Autonomy and Goals #

Autonomy transforms an agent from a passive tool (responds only when called) into an active participant that pursues goals on a schedule, within defined resource budgets.

Basic Autonomous Agent #

neam
agent Monitor {
  provider: "openai"
  model: "gpt-4o-mini"
  temperature: 0.3
  system: "You monitor system health. Report anomalies and suggest fixes."

  reasoning: chain_of_thought

  reflect: {
    after: each_response
    evaluate: [accuracy, relevance]
    min_confidence: 0.6
    on_low_quality: { strategy: "acknowledge" }
  }

  learning: {
    strategy: "experience_replay"
    review_interval: 20
  }

  goals: [
    "Check system metrics and identify anomalies",
    "Generate daily health reports",
    "Escalate critical issues immediately"
  ]

  triggers: {
    on_schedule: "every 5m"
  }

  initiative: true

  budget: {
    max_daily_calls: 100
    max_daily_cost: 5.0
    max_daily_tokens: 50000
  }

  memory: "monitor_memory"
}

{
  emit "Monitor agent registered. Running every 5 minutes.";
  // The agent starts executing autonomously in the background
}

What happens at runtime:

  1. On OP_DEFINE_AGENT, the VM registers the agent with the AutonomousExecutor.
  2. A background thread checks the schedule every second.
  3. When the schedule fires (every 5 minutes in this example), the VM:
     - checks the daily budget limits;
     - constructs a query from the agent's goals: "You have these goals: [goals]. Take autonomous action.";
     - calls the agent internally with the constructed query;
     - logs the action and token usage to the autonomous_actions and autonomous_budgets tables.
  4. Budget counters reset daily at midnight.

Schedule Expressions #

| Expression | Interval |
|---|---|
| "every 30s" | Every 30 seconds |
| "every 5m" | Every 5 minutes |
| "every 1h" | Every 1 hour |
| "every 1d" | Every 1 day |

Managing Goals at Runtime #

neam
{
  // Read current goals
  let goals = agent_get_goals("Monitor");
  emit "Current goals: " + str(goals);

  // Update goals dynamically
  agent_set_goals("Monitor", [
    "Check system metrics and identify anomalies",
    "Generate hourly health reports",
    "Escalate critical issues to ops team",
    "Track API response times"
  ]);
  emit "Goals updated.";
}

Pause and Resume #

neam
{
  // Pause during a maintenance window
  agent_pause("Monitor");
  emit "Monitor paused.";

  // ... perform maintenance ...

  // Resume autonomous execution
  agent_resume("Monitor");
  emit "Monitor resumed.";
}

Budget Limits #

| Field | Type | Default | Description |
|---|---|---|---|
| budget.max_daily_calls | int | 100 | Maximum LLM calls per day |
| budget.max_daily_cost | float | 5.0 | Maximum cost in USD per day |
| budget.max_daily_tokens | int | 50000 | Maximum tokens consumed per day |

When any limit is reached, autonomous execution pauses until the next daily reset. Interactive calls (.ask()) are not affected by autonomous budgets.


17.6 Embedded Inference #

Embedded inference enables agents to run models locally within the Neam process using ONNX runtime, without requiring an external service like Ollama.

neam
agent LocalClassifier {
  provider: "local"
  model_path: "./models/classifier.onnx"
  system: "Classify input text into categories."
}

This feature is gated behind a compile-time flag and requires building Neam with ONNX support. It is primarily useful for lightweight classification, embedding, or preprocessing tasks where the overhead of an external API call is undesirable.
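
Invocation is the same as for any other provider. A hedged sketch (the input text and the returned category are illustrative):

neam
{
  let label = LocalClassifier.ask("The checkout page keeps timing out.");
  emit label;  // a category name produced by the local model
}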

📝 Note

Embedded inference is an advanced feature intended for edge deployment scenarios. For most use cases, Ollama provides a simpler path to local model execution.


17.7 Cognitive Native Functions Reference #

Neam provides eleven native functions for interacting with cognitive features programmatically:

| Function | Arity | Returns | Description |
|---|---|---|---|
| agent_rate(agent, score) | 2 | nil | Submit feedback score (0.0-1.0) for the last response |
| agent_reflect(agent) | 1 | map | Trigger on-demand reflection; returns {"dimension": score} |
| agent_evolve(agent) | 1 | nil | Trigger manual prompt evolution cycle |
| agent_rollback(agent, version) | 2 | nil | Rollback prompt to a specific version number |
| agent_status(agent) | 1 | map | Full cognitive state: reasoning_mode, learning_count, evolution_version, goals, evolved_prompt |
| agent_learning_stats(agent) | 1 | map | Learning statistics: total_interactions, avg_reflection_score, reviews_completed |
| agent_prompt_history(agent) | 1 | list | List of all evolved prompt strings (index = version) |
| agent_get_goals(agent) | 1 | list | Get current goals list |
| agent_set_goals(agent, goals) | 2 | nil | Update goals at runtime |
| agent_pause(agent) | 1 | nil | Pause autonomous execution |
| agent_resume(agent) | 1 | nil | Resume autonomous execution |

Usage Patterns #

neam
{
  // === Feedback Loop ===
  let answer = MyAgent.ask("question");
  agent_rate("MyAgent", 0.8);

  // === Inspect State ===
  let status = agent_status("MyAgent");
  let stats = agent_learning_stats("MyAgent");
  let history = agent_prompt_history("MyAgent");
  let reflection = agent_reflect("MyAgent");

  // === Control Evolution ===
  agent_evolve("MyAgent");
  agent_rollback("MyAgent", 0);

  // === Control Autonomy ===
  agent_set_goals("MyAgent", ["new goal"]);
  agent_pause("MyAgent");
  agent_resume("MyAgent");
}

17.8 SQLite Persistence #

All cognitive data is persisted to SQLite when a memory store is configured on the agent. The following tables are created automatically:

| Table | Key Columns | Purpose |
|---|---|---|
| learning_interactions | agent_name, query, response, reflection_score, feedback_score, tokens_used, timestamp | Every query/response pair with scores |
| learning_reviews | agent_name, strategy, interactions_reviewed, avg_reflection_score, lessons_json, prompt_addendum, timestamp | Periodic review results |
| prompt_evolution | agent_name, version, original_prompt, evolved_prompt, reasoning, status, timestamp | Versioned prompt history |
| autonomous_actions | agent_name, trigger_type, action_taken, tokens_used, timestamp | Log of autonomous agent actions |
| autonomous_budgets | agent_name, date, calls_used, tokens_used, cost_used | Daily budget tracking |

Data Lifecycle #

text
Interaction --> learning_interactions (every call)
                      |
                      v (every review_interval calls)
                learning_reviews
                      |
                      v (every review_after calls)
                prompt_evolution
                      |
                      v
                Agent uses evolved prompt for all future calls

Data survives across program restarts, enabling agents to resume learning from where they left off.

Querying Data Externally #

The SQLite database is a standard .db file. You can query it directly with the sqlite3 command-line tool:

bash
# View all evolved prompts
sqlite3 ~/.neam/memory.db \
  "SELECT agent_name, version, evolved_prompt FROM prompt_evolution ORDER BY version;"

# View learning statistics
sqlite3 ~/.neam/memory.db \
  "SELECT agent_name, COUNT(*) as interactions, AVG(reflection_score) as avg_score
   FROM learning_interactions GROUP BY agent_name;"

# View budget usage
sqlite3 ~/.neam/memory.db \
  "SELECT * FROM autonomous_budgets WHERE date = date('now');"

17.9 Memory Systems #

The cognitive architecture so far stores learning data in flat SQLite tables. For agents that need richer memory capabilities, the standard library (std::agents::advanced::memory) provides four specialized memory systems that mirror human cognitive memory types.

Semantic Memory #

Semantic memory stores factual knowledge as vector-embedded entries organized by category. It supports similarity-based retrieval, making it ideal for building agents that accumulate domain expertise over time.

neam
import std::agents::advanced::memory::semantic;

let mem = semantic::create_memory({ "embedding_model": "nomic-embed-text" });
mem = semantic::store_fact(mem, "Neam compiles to bytecode", "language");
mem = semantic::store_fact(mem, "Agents connect to LLM providers", "architecture");

let related = semantic::retrieve_similar(mem, "How does Neam execute code?", 3);
let by_topic = semantic::retrieve_by_category(mem, "language");

Episodic Memory #

Episodic memory records events and sessions -- timestamped sequences of what happened during an agent's operation. It supports timeline queries and content-based search using embeddings.

neam
import std::agents::advanced::memory::episodic;

let ep = episodic::create_memory();
let session_id = episodic::start_episode(ep, "support_ticket_42");

ep = episodic::add_event(ep, session_id, "user_greeting", {
  "message": "Hello, I need help with billing."
});
ep = episodic::add_event(ep, session_id, "agent_response", {
  "message": "I can help with that. What is your account number?"
});
ep = episodic::end_episode(ep, session_id);

// Search past episodes by content similarity
let similar = episodic::search_episodes(ep, "billing question", 5);

Episodic memory also supports consolidation -- a process that prunes old, low-importance episodes to keep memory usage bounded:

neam
// Consolidate episodes older than 30 days with importance below 0.3
ep = episodic::consolidate(ep, { "max_age_days": 30, "min_importance": 0.3 });

Working Memory #

Working memory manages the current conversation context with a fixed capacity. It tracks what the agent is currently "focused on" and automatically evicts the least relevant items when capacity is reached.

neam
import std::agents::advanced::memory::working;

let wm = working::create_memory({ "capacity": 10 });
wm = working::add_item(wm, "user_name", "Alice");
wm = working::add_item(wm, "current_topic", "billing dispute");
wm = working::set_focus(wm, "billing dispute");

// Get items most relevant to the current query
let context = working::get_relevant(wm, "What is Alice's refund status?", 5);

// Export working memory as a text block for prompt injection
let context_text = working::to_text(wm);

Knowledge Graph Memory #

The knowledge graph module stores entities and relationships, enabling agents to reason about connections between concepts:

neam
import std::agents::advanced::memory::graph;

let kg = graph::create_graph();
kg = graph::add_node(kg, "Alice", "Customer", { "plan": "premium" });
kg = graph::add_node(kg, "Order-123", "Order", { "amount": 49.99 });
kg = graph::add_edge(kg, "Alice", "Order-123", "placed");

// Query: find all orders placed by Alice
let orders = graph::query(kg, { "from": "Alice", "relation": "placed" });

Combining Memory Systems #

A production agent might use all four memory types together:

| Memory Type | Stores | Retrieval | Persistence |
|---|---|---|---|
| Semantic | Facts, domain knowledge | Similarity search by embedding | Long-term (SQLite + vectors) |
| Episodic | Events, conversation sessions | Timeline or content search | Long-term (SQLite) |
| Working | Current context, active focus | Relevance to current query | Session-scoped (in-memory) |
| Knowledge graph | Entities, relationships | Graph traversal, pattern matching | Long-term (SQLite) |

📝 Note

The memory field on an agent declaration (e.g., memory: "tutor_memory") configures the SQLite store used by the learning loop and prompt evolution. The std::agents::advanced::memory modules provide additional memory capabilities that you manage explicitly in your program logic.
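
A common pattern is therefore to splice module-managed memory into an ordinary agent call yourself. A sketch using the working-memory functions shown above (the agent and keys are invented for illustration):

neam
import std::agents::advanced::memory::working;

agent SupportAgent {
  provider: "openai"
  model: "gpt-4o"
  system: "You are a customer support agent."
  memory: "support_memory"   // SQLite store for learning/evolution
}

{
  let wm = working::create_memory({ "capacity": 10 });
  wm = working::add_item(wm, "user_name", "Alice");
  wm = working::set_focus(wm, "refund request");

  // Explicitly inject working-memory context into the prompt
  let question = "What is the status of Alice's refund?";
  let answer = SupportAgent.ask(working::to_text(wm) + "\n\n" + question);
  emit answer;
}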


17.10 Full Cognitive Agent Example #

This example combines all cognitive features into a single production-ready agent:

neam
agent ResearchAssistant {
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.5
  system: "You are an AI research assistant. You analyze papers, summarize findings,
           and track research trends across multiple domains."

  // Step 1: Structured thinking -- plans before answering
  reasoning: plan_and_execute

  // Step 2: Self-evaluation after every response
  reflect: {
    after: each_response
    evaluate: [accuracy, completeness, clarity, relevance]
    min_confidence: 0.75
    on_low_quality: {
      strategy: "revise"
      max_revisions: 2
    }
  }

  // Step 3: Learn from interaction history
  learning: {
    strategy: "experience_replay"
    review_interval: 15
    max_adaptations: 100
    rollback_on_decline: true
  }

  // Step 4: Evolve prompt over time
  evolve: {
    mutable: [system_prompt]
    review_after: 100
    core_identity: "You are an AI research assistant."
    allow_rollback: true
  }

  // Step 5: Autonomous daily research
  goals: [
    "Track new papers in AI safety and alignment",
    "Produce weekly research digests"
  ]
  triggers: {
    on_schedule: "every 1d"
  }
  initiative: true
  budget: {
    max_daily_calls: 50
    max_daily_cost: 10.0
  }

  memory: "research_memory"
}

{
  // Interactive usage
  let summary = ResearchAssistant.ask(
    "Summarize the latest developments in RLHF and DPO for LLM alignment."
  );
  emit summary;
  // The VM: plans -> executes steps -> reflects -> records interaction -> returns

  // Provide feedback
  agent_rate("ResearchAssistant", 0.95);

  // Check cognitive status
  let status = agent_status("ResearchAssistant");
  emit "Reasoning mode:      " + str(status["reasoning_mode"]);
  emit "Learning count:      " + str(status["learning_count"]);
  emit "Evolution version:   " + str(status["evolution_version"]);
  emit "Goals:               " + str(status["goals"]);

  // Check learning statistics
  let stats = agent_learning_stats("ResearchAssistant");
  emit "Avg reflection score: " + str(stats["avg_reflection_score"]);

  // View prompt evolution history
  let history = agent_prompt_history("ResearchAssistant");
  emit "Prompt versions: " + str(len(history));
}

Expected Output (After Several Interactions) #

text
[Plan and Execute: generating plan...]
[Step 1: Survey recent RLHF papers...]
[Step 2: Survey DPO developments...]
[Step 3: Compare approaches...]
[Synthesizing final answer...]

RLHF (Reinforcement Learning from Human Feedback) continues to evolve with...
[detailed multi-paragraph response]

Reasoning mode:      plan_and_execute
Learning count:      47
Evolution version:   2
Goals:               ["Track new papers in AI safety and alignment", "Produce weekly research digests"]
Avg reflection score: 0.82
Prompt versions: 2

17.11 Migration from v0.4.x #

Zero Breaking Changes #

v0.5.0 is fully backward compatible with v0.4.x:

  - Agents that declare no cognitive properties behave identically to v0.4.x.
  - Every cognitive capability (reasoning, reflect, learning, evolve, goals, triggers, budget) is an opt-in agent property.
  - Existing programs run unchanged; no syntax or native functions were removed.

Incremental Adoption Path #

Start with reasoning, the simplest and cheapest cognitive feature:

neam
// v0.4.x agent -- still works unchanged
agent Helper {
  provider: "openai"
  model: "gpt-4o"
  system: "You are helpful."
}

// v0.5.0 -- add reasoning (one line change)
agent SmartHelper {
  provider: "openai"
  model: "gpt-4o"
  system: "You are helpful."
  reasoning: chain_of_thought
}

Then add reflection:

neam
agent ReflectiveHelper {
  provider: "openai"
  model: "gpt-4o"
  system: "You are helpful."
  reasoning: chain_of_thought
  reflect: {
    after: each_response
    evaluate: [accuracy, helpfulness]
    min_confidence: 0.7
    on_low_quality: { strategy: "revise" }
  }
}

Then learning and evolution:

neam
agent EvolvingHelper {
  provider: "openai"
  model: "gpt-4o"
  system: "You are helpful."
  reasoning: chain_of_thought
  reflect: {
    after: each_response
    evaluate: [accuracy, helpfulness]
    min_confidence: 0.7
    on_low_quality: { strategy: "revise" }
  }
  learning: {
    strategy: "experience_replay"
    review_interval: 10
  }
  evolve: {
    mutable: [system_prompt]
    review_after: 50
    core_identity: "You are helpful."
    allow_rollback: true
  }
  memory: "helper_memory"
}

Summary #

In this chapter you learned:

  - How the four reasoning strategies (chain_of_thought, plan_and_execute, tree_of_thought, self_consistency) add structured thinking, and what each costs in LLM calls.
  - How reflection scores each response against configurable dimensions and triggers revise, retry, escalate, or acknowledge when quality falls below min_confidence.
  - How the learning loop records every interaction to SQLite and periodically distills reviews into prompt addenda.
  - How prompt evolution rewrites the system prompt while preserving a declared core identity, with versioning and rollback.
  - How autonomous agents pursue declared goals on schedules within daily budgets.
  - How embedded inference (ONNX) and the four specialized memory systems extend agents beyond the core cognitive loop.


Exercises #

Exercise 17.1: Reasoning Strategy Comparison #

Create four agents, each using a different reasoning strategy (chain_of_thought, plan_and_execute, tree_of_thought, self_consistency). Ask each agent the same complex question (e.g., "Should a startup prioritize growth or profitability in its first two years?"). Compare the responses in terms of quality, depth, and cost (number of LLM calls). Which strategy produced the best answer? Which was the most cost-effective?

Exercise 17.2: Quality Gate with Reflection #

Build an agent with reflection enabled. Set min_confidence to 0.9 (deliberately high). Ask the agent a moderately difficult question and observe:

  1. What reflection scores does it produce?
  2. How many revision rounds does it go through?
  3. Is the final answer better than the first attempt?

Repeat with min_confidence set to 0.5 and compare the behavior.

Exercise 17.3: Learning Over Time #

Create a tutoring agent with learning enabled (review_interval: 5). Ask it 10 questions about a subject you know well. After each answer, use agent_rate() to provide a feedback score. After the 10th question, check agent_learning_stats(). Has the average reflection score improved over the session? What lessons did the learning review extract?

Exercise 17.4: Prompt Evolution Experiment #

Create an agent with all cognitive features enabled and review_after: 10 (low threshold for testing). Run 15 interactions with consistent feedback. Then:

  1. Check the prompt history with agent_prompt_history().
  2. Compare the original and evolved prompts.
  3. Rollback to version 0 using agent_rollback().
  4. Verify the agent uses the original prompt.

Document how the evolved prompt differs from the original.

Exercise 17.5: Autonomous Monitor #

Build an autonomous agent with: - Goals related to a task you define (e.g., "Check weather and report temperature"). - A schedule of "every 30s" (for quick testing). - A budget of max_daily_calls: 5.

Run the program and observe: 1. How often does the agent execute? 2. What happens when the budget limit is reached? 3. Use agent_pause() and agent_resume() to control execution.

Exercise 17.6: Full Cognitive Pipeline #

Design a customer support agent that combines all six cognitive capabilities. The agent should:

  1. Use plan_and_execute reasoning for complex tickets.
  2. Reflect on every response with accuracy and tone evaluation.
  3. Learn from interaction history with experience_replay.
  4. Evolve its prompt, preserving the core identity: "You are a customer support agent."
  5. Run autonomously every hour to generate a summary of recent tickets.
  6. Operate within a daily budget of 200 calls and $20.

Write the complete Neam program, including at least 5 test interactions and cognitive status checks after each one.
