Chapter 17: Cognitive Features #
"An agent that cannot reflect on its own output is merely an API wrapper."
Up to this point in the book, every agent we have built has been stateless and reactive: it receives a prompt, calls an LLM, and returns the response. It does not evaluate the quality of its own output, learn from past interactions, improve its prompts over time, or take action on its own initiative. In production, this is a serious limitation.
Neam v0.5.0 introduces a cognitive architecture -- a set of six opt-in capabilities that transform reactive agents into reflective, learning, evolving, autonomous systems. These capabilities build on each other in a natural progression:
- Reasoning -- structured thinking strategies before answering.
- Reflection -- self-evaluation of output quality.
- Learning -- recording and reviewing interaction history.
- Prompt Evolution -- automatic refinement of system prompts.
- Autonomy -- goal-driven behavior on schedules with budgets.
- Embedded Inference -- local model execution (ONNX).
Each feature is fully opt-in. An agent without cognitive properties behaves identically to v0.4.x. You can adopt these capabilities incrementally, starting with reasoning and adding more as your requirements grow.
Dependency Chain #
The cognitive features form a natural progression:
Reasoning (standalone -- no prerequisites)
|
v
Reflection (uses reasoning output as input for evaluation)
|
v
Learning (records reflection scores to SQLite)
|
v
Evolution (uses learning reviews to propose prompt changes)
Autonomy (standalone -- benefits from all above, but works independently)
Each layer can operate independently, but they compound when combined. A learning agent without reflection still records interactions, but the learning reviews are less informative without quality scores to analyze.
17.1 Reasoning Strategies #
Reasoning strategies add structured thinking to an agent's response process. Instead of generating a single-shot answer, the agent is instructed to think before answering -- breaking problems into steps, exploring multiple paths, or generating multiple independent answers.
Chain of Thought #
The simplest reasoning mode. The VM prepends chain-of-thought instructions to the system prompt, producing step-by-step reasoning before the final answer.
agent Analyst {
provider: "openai"
model: "gpt-4o"
temperature: 0.3
system: "You are a data analyst. Break down complex questions step by step."
reasoning: chain_of_thought
}
{
let answer = Analyst.ask("Why did revenue drop 15% in Q3?");
emit answer;
}
What happens at runtime:
- The VM prepends "Think step-by-step before answering..." to the system prompt.
- The agent produces numbered reasoning steps followed by a conclusion.
- The full response (with reasoning) is returned.
Cost: 1 LLM call (same as no reasoning, but the response is longer).
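The VM's internals are not shown in this book, but the mechanism can be pictured as simple prompt augmentation before the single LLM call. A minimal Python sketch (the function name and prefix text are illustrative, not the actual Neam implementation):

```python
# Illustrative model of chain_of_thought: the VM augments the system
# prompt, then makes one LLM call as usual.
COT_PREFIX = ("Think step-by-step before answering. Show your reasoning "
              "as numbered steps, then give a final conclusion.\n\n")

def build_system_prompt(base_prompt, reasoning):
    """Return the system prompt the single LLM call will see."""
    if reasoning == "chain_of_thought":
        return COT_PREFIX + base_prompt
    return base_prompt  # no reasoning mode: prompt passes through unchanged

prompt = build_system_prompt("You are a data analyst.", "chain_of_thought")
```

The key property is that the call count does not change; only the prompt (and therefore the response length) does.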
Plan and Execute #
Generates a multi-step plan, executes each step as a separate LLM call, then synthesizes results. This is the most thorough reasoning mode for complex, multi-faceted questions.
agent ProjectPlanner {
provider: "openai"
model: "gpt-4o"
temperature: 0.4
system: "You are a project planning expert."
reasoning: plan_and_execute
}
{
let plan = ProjectPlanner.ask(
"Create a launch plan for a mobile app targeting college students."
);
emit plan;
}
What happens at runtime:
- LLM call #1: Generate a numbered plan (e.g., 5 steps).
- LLM calls #2 through #6: Execute each step individually.
- LLM call #7: Synthesize all step results into a final answer.
Cost: N + 2 LLM calls, where N is the number of plan steps.
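To make the N + 2 accounting concrete, here is a rough Python model of the plan/execute/synthesize loop with a stubbed LLM that counts calls. The function and stub are illustrative assumptions, not the VM's actual code:

```python
def plan_and_execute(question, llm, n_steps=5):
    """Model of the N+2 call pattern: plan, execute each step, synthesize."""
    plan = llm(f"Write a {n_steps}-step plan for: {question}")      # call 1
    results = [llm(f"Execute step: {step}") for step in plan]       # calls 2..N+1
    return llm("Synthesize: " + " | ".join(results))                # call N+2

calls = 0
def stub_llm(prompt):
    """Stand-in for the provider: returns a 5-step plan, else a string."""
    global calls
    calls += 1
    if prompt.startswith("Write"):
        return [f"step {i}" for i in range(1, 6)]  # always a 5-step plan
    return f"result of ({prompt[:20]}...)"

answer = plan_and_execute("Launch plan for a mobile app", stub_llm)
# With a 5-step plan the stub records 7 calls: N + 2 = 5 + 2
```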
Tree of Thought #
Explores multiple reasoning branches, scores each, and selects the best path. Ideal for decisions where multiple viable options exist.
agent Strategist {
provider: "openai"
model: "gpt-4o"
system: "You are a strategic advisor. Evaluate all options carefully."
reasoning: tree_of_thought
}
{
let strategy = Strategist.ask(
"Should we expand into the EU market or focus on APAC first?"
);
emit strategy;
// The VM generates 3 branches, scores each, and returns the best one
}
What happens at runtime:
- LLM call #1: Generate 3 distinct approaches to the problem.
- LLM call #2: Score each approach on feasibility, impact, and risk.
- Return the highest-scoring approach with its reasoning.
Cost: 2-3 LLM calls.
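The branch-score-select pattern can be modeled in a few lines of Python. This is a sketch of the idea described above, with hypothetical names and a stubbed LLM, not the VM's implementation:

```python
def tree_of_thought(question, llm):
    """Model of the 2-call pattern: generate branches, score, pick best."""
    branches = llm(f"Give 3 distinct approaches to: {question}")           # call 1
    scores = llm(f"Score each on feasibility, impact, risk: {branches}")   # call 2
    best = max(zip(branches, scores), key=lambda pair: pair[1])
    return best[0]

def stub_llm(prompt):
    """Stand-in provider: returns branches for call 1, scores for call 2."""
    if prompt.startswith("Give"):
        return ["expand EU", "focus APAC", "do both slowly"]
    return [0.6, 0.8, 0.4]

choice = tree_of_thought("EU or APAC?", stub_llm)  # highest-scoring branch wins
```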
Self Consistency #
Generates multiple independent answers and returns the majority consensus. This is particularly effective for math, logic, and factual questions where there is a single correct answer.
agent MathSolver {
provider: "openai"
model: "gpt-4o"
temperature: 0.7
system: "You solve math problems."
reasoning: self_consistency
}
{
let answer = MathSolver.ask("What is the integral of x^2 * e^x dx?");
emit answer;
// Generates 3 independent solutions and picks the consensus
}
What happens at runtime:
- Generate N independent answers (default N=3, with elevated temperature for diversity).
- LLM call #N+1: Analyze all answers and select the majority consensus.
- Return the consensus answer with confidence.
Cost: N + 1 LLM calls.
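The consensus step is easiest to see in miniature. In the real VM, call N+1 is an LLM that judges agreement; a plain majority vote over the sampled answers models the same idea. A Python sketch under those assumptions:

```python
from collections import Counter

def self_consistency(question, llm, n=3):
    """Model of N+1 calls: N diverse samples, then a consensus pick.
    A majority vote stands in for the final consensus LLM call."""
    answers = [llm(question, temperature=0.9) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n  # answer plus a simple confidence ratio

# Three pre-baked "samples"; two of them agree.
samples = iter(["x^2 e^x - 2x e^x + 2 e^x + C",
                "(x^2 - 2x + 2) e^x + C",
                "(x^2 - 2x + 2) e^x + C"])
answer, conf = self_consistency("integral of x^2 e^x",
                                lambda q, temperature: next(samples))
```

Note that the two agreeing samples are algebraically the same antiderivative; a real consensus judge would also catch equivalent-but-differently-written answers, which a literal string vote cannot.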
Reasoning Strategy Comparison #
| Mode | LLM Calls | Best For | Trade-off |
|---|---|---|---|
| `chain_of_thought` | 1 | General reasoning, step-by-step analysis | Minimal overhead |
| `plan_and_execute` | N+2 | Complex multi-step tasks, project planning | High cost, thorough |
| `tree_of_thought` | 2-3 | Decisions with multiple viable options | Moderate cost |
| `self_consistency` | N+1 | Math, logic, factual questions | High cost, high accuracy |
Reasoning Configuration #
For finer control, add a reasoning_config block alongside the reasoning strategy:
agent CarefulAnalyst {
provider: "openai"
model: "gpt-4o"
system: "You are a meticulous analyst."
reasoning: plan_and_execute
reasoning_config: {
max_steps: 5
show_thinking: false
verify_before_respond: true
}
}
| Field | Type | Default | Description |
|---|---|---|---|
| `max_steps` | int | 5 | Maximum reasoning steps (`plan_and_execute`, `tree_of_thought`) |
| `show_thinking` | bool | false | Include the reasoning trace in the returned output |
| `verify_before_respond` | bool | true | Add a final verification step before returning |
When show_thinking is true, the response includes the full reasoning trace (numbered
steps, branch scores, or consensus analysis) before the final answer. This is useful for
debugging or for applications that want to display the agent's thought process to users.
When verify_before_respond is true, the agent performs a final check after reasoning
to confirm the answer is consistent with the reasoning steps. This adds one extra LLM
call but catches contradictions between the reasoning trace and the final answer.
17.2 Reflection #
Reflection enables an agent to evaluate the quality of its own output. After generating a response, the agent calls the LLM again to score the response across configurable quality dimensions. If the score falls below a threshold, the agent can automatically revise, retry, escalate, or acknowledge the low confidence.
Basic Reflection #
agent QAAgent {
provider: "openai"
model: "gpt-4o"
temperature: 0.4
system: "You answer technical questions accurately and concisely."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, relevance, completeness]
min_confidence: 0.7
on_low_quality: {
strategy: "revise"
max_revisions: 2
}
}
}
{
let answer = QAAgent.ask("Explain the CAP theorem and its implications.");
emit answer;
// If self-evaluation scores below 0.7, the agent automatically revises
}
What happens at runtime:
- The agent generates a response (with reasoning if configured).
- The VM builds an evaluation prompt: "Rate this response on accuracy, relevance, completeness (0.0-1.0)."
- The LLM returns scores as JSON: {"accuracy": 0.85, "relevance": 0.9, "completeness": 0.6}
- The VM computes the average score (0.78 in this example).
- If the average is below min_confidence (0.7), the configured strategy triggers.
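The threshold arithmetic is simple enough to model directly. A Python sketch of the score-averaging step (illustrative names; not the VM's code):

```python
import json

def reflect(scores_json, min_confidence=0.7):
    """Average the per-dimension scores and decide whether the
    configured low-quality strategy should trigger."""
    scores = json.loads(scores_json)
    avg = sum(scores.values()) / len(scores)
    return avg, avg < min_confidence  # (average, trigger_low_quality)

avg, low = reflect('{"accuracy": 0.85, "relevance": 0.9, "completeness": 0.6}')
# avg is about 0.78, above the 0.7 threshold, so no revision is triggered
```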
Low-Quality Strategies #
| Strategy | Behavior |
|---|---|
| `"revise"` | Feed the scores back to the agent and ask it to improve (up to `max_revisions` times) |
| `"retry"` | Regenerate the response from scratch |
| `"escalate"` | Hand off to a more capable agent specified by `escalate_to` |
| `"acknowledge"` | Return the response with a confidence warning appended |
Escalation Example #
Route low-quality responses to a more capable agent:
agent JuniorAgent {
provider: "openai"
model: "gpt-4o-mini"
system: "You answer general questions."
reflect: {
after: each_response
evaluate: [accuracy, completeness]
min_confidence: 0.8
on_low_quality: {
strategy: "escalate"
escalate_to: "SeniorAgent"
}
}
}
agent SeniorAgent {
provider: "openai"
model: "gpt-4o"
system: "You are a senior expert. Provide thorough, detailed answers."
}
{
let answer = JuniorAgent.ask("Explain quantum entanglement.");
emit answer;
// If JuniorAgent scores below 0.8, SeniorAgent handles it automatically
}
On-Demand Reflection #
Trigger reflection manually from code using agent_reflect():
{
let answer = QAAgent.ask("What is RAFT consensus?");
emit answer;
// Manually trigger reflection
let scores = agent_reflect("QAAgent");
emit "Accuracy: " + str(scores["accuracy"]);
emit "Relevance: " + str(scores["relevance"]);
emit "Completeness: " + str(scores["completeness"]);
}
The function returns a map of dimension names to scores (0.0-1.0).
Explicit Feedback #
Provide external feedback scores (e.g., from a human reviewer or automated test) to influence the learning system:
{
let answer = QAAgent.ask("What is RAFT consensus?");
emit answer;
// Rate the response (0.0 to 1.0)
agent_rate("QAAgent", 0.9);
}
Reflection Configuration Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `after` | identifier | required | When to reflect: `each_response`, `every_n`, or `on_demand` |
| `evaluate` | list | required | Dimensions to score (free-form identifiers) |
| `min_confidence` | float | 0.7 | Minimum average score threshold (0.0-1.0) |
| `on_low_quality.strategy` | string | `"revise"` | `"revise"`, `"retry"`, `"escalate"`, `"acknowledge"` |
| `on_low_quality.max_revisions` | int | 2 | Maximum revision attempts |
| `on_low_quality.escalate_to` | string | -- | Agent name for escalation |
17.3 Learning Loop #
The learning loop enables agents to learn from their interaction history. The VM records every query/response pair to SQLite, periodically reviews accumulated interactions, and extracts patterns to improve future responses.
Enabling Learning #
agent Tutor {
provider: "openai"
model: "gpt-4o"
temperature: 0.5
system: "You are a patient programming tutor."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [clarity, accuracy]
min_confidence: 0.7
on_low_quality: { strategy: "revise", max_revisions: 1 }
}
learning: {
strategy: "experience_replay"
review_interval: 10
max_adaptations: 50
rollback_on_decline: true
}
memory: "tutor_memory"
}
What happens at runtime:
For every interaction (query + response):
- The VM records to SQLite: query, response, reflection scores, feedback score, token count, and timestamp.
- The interaction counter increments.
Every N interactions (where N = review_interval):
- The VM loads the last N interactions from SQLite.
- Builds a meta-prompt: "Review these interactions. Extract patterns, identify weaknesses, suggest improvements."
- The LLM returns a learning review with lessons and a prompt addendum.
- The review is stored in the learning_reviews table.
- The prompt addendum is appended to the agent's system prompt for future calls.
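The record-and-periodically-review cycle can be sketched as a counter that fires every `review_interval` interactions. This Python model is illustrative (the class and its fields are assumptions standing in for the SQLite tables and the review LLM call):

```python
class LearningLoop:
    """Model of the interaction counter and periodic review trigger."""
    def __init__(self, review_interval=10):
        self.review_interval = review_interval
        self.interactions = []  # stands in for the learning_interactions table
        self.reviews = 0        # stands in for rows in learning_reviews

    def record(self, query, response, reflection_score):
        """Record one interaction; trigger a review every N interactions."""
        self.interactions.append((query, response, reflection_score))
        if len(self.interactions) % self.review_interval == 0:
            self._review()

    def _review(self):
        # The real VM sends the last N interactions to the LLM here and
        # stores the resulting lessons and prompt addendum.
        self.reviews += 1

loop = LearningLoop(review_interval=10)
for i in range(25):
    loop.record(f"q{i}", f"a{i}", 0.8)
# 25 interactions with review_interval 10 -> reviews fire at 10 and 20
```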
Checking Learning Progress #
{
// Run several interactions
let a1 = Tutor.ask("Explain recursion.");
let a2 = Tutor.ask("What is a binary tree?");
let a3 = Tutor.ask("How does quicksort work?");
// Check learning statistics
let stats = agent_learning_stats("Tutor");
emit "Total interactions: " + str(stats["total_interactions"]);
emit "Average reflection score: " + str(stats["avg_reflection_score"]);
emit "Reviews completed: " + str(stats["reviews_completed"]);
}
Learning Strategies #
| Strategy | Description | Best For |
|---|---|---|
| `experience_replay` | Review recent interactions, extract lessons from successes and failures | General-purpose agents |
| `pattern_extraction` | Identify recurring patterns in queries and optimize response templates | FAQ/support agents |
| `prompt_evolution` | Gradually evolve the system prompt based on learning reviews | Long-running agents |
| `preference_learning` | Learn from explicit feedback scores to adjust behavior | User-facing agents |
Learning Configuration Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `strategy` | string | required | Learning strategy (see table above) |
| `review_interval` | int | 10 | Trigger review every N interactions |
| `max_adaptations` | int | 50 | Maximum prompt adjustments over lifetime |
| `rollback_on_decline` | bool | true | Revert if performance declines after adaptation |
Important Notes #
- Learning recording is asynchronous (non-blocking) -- it adds less than 1ms overhead to each interaction.
- A memory store must be configured for learning data to persist across restarts.
- Without memory, learning data is held in-memory only and lost on exit.
- The learning feature works without reflection, but reflection scores significantly improve learning quality.
17.4 Prompt Evolution #
Prompt evolution enables agents to rewrite their own system prompt over time based on accumulated learning data. The evolved prompt is validated against a declared core identity (ensuring the agent does not drift from its intended purpose) and supports version rollback.
Enabling Evolution #
agent SalesBot {
provider: "openai"
model: "gpt-4o"
temperature: 0.6
system: "You are a friendly sales assistant. Help customers find the right product."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [helpfulness, accuracy, tone]
min_confidence: 0.7
on_low_quality: { strategy: "revise", max_revisions: 2 }
}
learning: {
strategy: "prompt_evolution"
review_interval: 10
max_adaptations: 30
rollback_on_decline: true
}
evolve: {
mutable: [system_prompt, temperature]
review_after: 50
core_identity: "You are a friendly sales assistant."
allow_rollback: true
}
memory: "sales_memory"
}
What happens at runtime:
After review_after interactions (50 in this example):
- The VM loads all learning reviews from SQLite.
- Builds a meta-prompt: "Based on these reviews, propose an improved system prompt. The following text MUST appear verbatim: 'You are a friendly sales assistant.'"
- The LLM proposes a new prompt with reasoning.
- The VM validates that the core_identity text is preserved.
- The new prompt is stored in prompt_evolution with an incremented version number.
- All subsequent calls use the evolved prompt.
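The core-identity check is a verbatim substring test, which makes it easy to model. A Python sketch (the function name is illustrative; the acceptance rule matches the behavior described above):

```python
def validate_evolution(proposed_prompt, core_identity):
    """An evolved prompt is accepted only if the declared core identity
    appears verbatim inside it; otherwise the proposal is rejected."""
    return core_identity in proposed_prompt

core = "You are a friendly sales assistant."
ok = validate_evolution(
    "You are a friendly sales assistant. Open with a question about "
    "the customer's use case before recommending products.", core)
bad = validate_evolution("You are an aggressive closer.", core)  # rejected
```

Because the check is exact, even a benign paraphrase of the identity sentence would cause the proposal to be rejected; the evolution meta-prompt therefore instructs the LLM to include the text unchanged.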
Checking Evolution Status #
{
let status = agent_status("SalesBot");
emit "Evolution version: " + str(status["evolution_version"]);
emit "Current prompt: " + str(status["evolved_prompt"]);
emit "Reasoning mode: " + str(status["reasoning_mode"]);
emit "Learning count: " + str(status["learning_count"]);
}
Manual Evolution and Rollback #
{
// Force an evolution cycle now (regardless of review_after threshold)
agent_evolve("SalesBot");
// View all prompt versions
let history = agent_prompt_history("SalesBot");
emit "Total versions: " + str(len(history));
// Inspect each version
for (version, prompt) in enumerate(history) {
emit "Version " + str(version) + ": " + str(prompt);
}
// Rollback to version 1 if the latest evolution is not performing well
agent_rollback("SalesBot", 1);
emit "Rolled back to version 1.";
}
Evolution Configuration Reference #
| Field | Type | Default | Description |
|---|---|---|---|
| `mutable` | list | required | Fields that can evolve: `system_prompt`, `temperature` |
| `review_after` | int | 50 | Trigger evolution after N total interactions |
| `core_identity` | string | -- | Text that must always appear verbatim in the evolved prompt |
| `allow_rollback` | bool | true | Enable rollback to previous versions |
Important Notes #
- Evolution requires learning to be enabled (it uses learning review data as input).
- The core_identity string is enforced at evolution time -- the proposed prompt is rejected if it does not contain the string verbatim.
- Evolved prompts are persisted in SQLite and survive restarts.
- If rollback_on_decline is true in the learning config, the VM automatically reverts if average scores drop after evolution.
17.5 Autonomy and Goals #
Autonomy transforms an agent from a passive tool (responds only when called) into an active participant that pursues goals on a schedule, within defined resource budgets.
Basic Autonomous Agent #
agent Monitor {
provider: "openai"
model: "gpt-4o-mini"
temperature: 0.3
system: "You monitor system health. Report anomalies and suggest fixes."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, relevance]
min_confidence: 0.6
on_low_quality: { strategy: "acknowledge" }
}
learning: {
strategy: "experience_replay"
review_interval: 20
}
goals: [
"Check system metrics and identify anomalies",
"Generate daily health reports",
"Escalate critical issues immediately"
]
triggers: {
on_schedule: "every 5m"
}
initiative: true
budget: {
max_daily_calls: 100
max_daily_cost: 5.0
max_daily_tokens: 50000
}
memory: "monitor_memory"
}
{
emit "Monitor agent registered. Running every 5 minutes.";
// The agent starts executing autonomously in the background
}
What happens at runtime:
- On OP_DEFINE_AGENT, the VM registers the agent with the AutonomousExecutor.
- A background thread checks the schedule every second.
- When the schedule fires (every 5 minutes), the VM:
  - Checks daily budget limits.
  - Constructs a query from the agent's goals: "You have these goals: [goals]. Take autonomous action."
  - Calls the agent internally with the constructed query.
  - Logs the action and token usage to the autonomous_actions and autonomous_budgets tables.
- Budget counters reset daily at midnight.
Schedule Expressions #
| Expression | Interval |
|---|---|
| `"every 30s"` | Every 30 seconds |
| `"every 5m"` | Every 5 minutes |
| `"every 1h"` | Every 1 hour |
| `"every 1d"` | Every 1 day |
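The `"every <N><unit>"` grammar is small enough that a parser fits in a few lines. A Python sketch of how such expressions might be normalized to seconds (illustrative; not the VM's parser):

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_schedule(expr):
    """Parse an 'every <N><unit>' expression into a number of seconds."""
    match = re.fullmatch(r"every (\d+)([smhd])", expr)
    if not match:
        raise ValueError(f"bad schedule expression: {expr!r}")
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]

interval = parse_schedule("every 5m")  # 300 seconds
```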
Managing Goals at Runtime #
{
// Read current goals
let goals = agent_get_goals("Monitor");
emit "Current goals: " + str(goals);
// Update goals dynamically
agent_set_goals("Monitor", [
"Check system metrics and identify anomalies",
"Generate hourly health reports",
"Escalate critical issues to ops team",
"Track API response times"
]);
emit "Goals updated.";
}
Pause and Resume #
{
// Pause during a maintenance window
agent_pause("Monitor");
emit "Monitor paused.";
// ... perform maintenance ...
// Resume autonomous execution
agent_resume("Monitor");
emit "Monitor resumed.";
}
Budget Limits #
| Field | Type | Default | Description |
|---|---|---|---|
| `budget.max_daily_calls` | int | 100 | Maximum LLM calls per day |
| `budget.max_daily_cost` | float | 5.0 | Maximum cost in USD per day |
| `budget.max_daily_tokens` | int | 50000 | Maximum tokens consumed per day |
When any limit is reached, autonomous execution pauses until the next daily reset.
Interactive calls (.ask()) are not affected by autonomous budgets.
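The limit-and-reset behavior can be modeled as a small accounting class. This Python sketch is an assumption-laden illustration of the described semantics (hit any limit and the run is skipped; counters reset on a new date), not the `autonomous_budgets` implementation:

```python
import datetime

class DailyBudget:
    """Model of the autonomous budget check described above."""
    def __init__(self, max_calls=100, max_cost=5.0, max_tokens=50_000):
        self.limits = (max_calls, max_cost, max_tokens)
        self.reset(datetime.date(2025, 1, 1))

    def reset(self, date):
        self.date, self.calls, self.cost, self.tokens = date, 0, 0.0, 0

    def try_consume(self, today, cost, tokens):
        """Return True if an autonomous run fits in today's budget."""
        if today != self.date:              # midnight rollover
            self.reset(today)
        max_calls, max_cost, max_tokens = self.limits
        if (self.calls + 1 > max_calls or self.cost + cost > max_cost
                or self.tokens + tokens > max_tokens):
            return False                     # run is skipped until reset
        self.calls += 1
        self.cost += cost
        self.tokens += tokens
        return True

budget = DailyBudget(max_calls=2)
day1 = datetime.date(2025, 1, 1)
allowed = [budget.try_consume(day1, 0.01, 100) for _ in range(3)]
# third call on the same day exceeds max_daily_calls; next day resets
next_day_ok = budget.try_consume(datetime.date(2025, 1, 2), 0.01, 100)
```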
17.6 Embedded Inference #
Embedded inference enables agents to run models locally within the Neam process using ONNX runtime, without requiring an external service like Ollama.
agent LocalClassifier {
provider: "local"
model_path: "./models/classifier.onnx"
system: "Classify input text into categories."
}
This feature is gated behind a compile-time flag and requires building Neam with ONNX support. It is primarily useful for lightweight classification, embedding, or preprocessing tasks where the overhead of an external API call is undesirable.
Embedded inference is an advanced feature intended for edge deployment scenarios. For most use cases, Ollama provides a simpler path to local model execution.
17.7 Cognitive Native Functions Reference #
Neam provides eleven native functions for interacting with cognitive features programmatically:
| Function | Arity | Returns | Description |
|---|---|---|---|
| `agent_rate(agent, score)` | 2 | nil | Submit feedback score (0.0-1.0) for the last response |
| `agent_reflect(agent)` | 1 | map | Trigger on-demand reflection; returns `{"dimension": score}` |
| `agent_evolve(agent)` | 1 | nil | Trigger manual prompt evolution cycle |
| `agent_rollback(agent, version)` | 2 | nil | Rollback prompt to a specific version number |
| `agent_status(agent)` | 1 | map | Full cognitive state: reasoning_mode, learning_count, evolution_version, goals, evolved_prompt |
| `agent_learning_stats(agent)` | 1 | map | Learning statistics: total_interactions, avg_reflection_score, reviews_completed |
| `agent_prompt_history(agent)` | 1 | list | List of all evolved prompt strings (index = version) |
| `agent_get_goals(agent)` | 1 | list | Get current goals list |
| `agent_set_goals(agent, goals)` | 2 | nil | Update goals at runtime |
| `agent_pause(agent)` | 1 | nil | Pause autonomous execution |
| `agent_resume(agent)` | 1 | nil | Resume autonomous execution |
Usage Patterns #
{
// === Feedback Loop ===
let answer = MyAgent.ask("question");
agent_rate("MyAgent", 0.8);
// === Inspect State ===
let status = agent_status("MyAgent");
let stats = agent_learning_stats("MyAgent");
let history = agent_prompt_history("MyAgent");
let reflection = agent_reflect("MyAgent");
// === Control Evolution ===
agent_evolve("MyAgent");
agent_rollback("MyAgent", 0);
// === Control Autonomy ===
agent_set_goals("MyAgent", ["new goal"]);
agent_pause("MyAgent");
agent_resume("MyAgent");
}
17.8 SQLite Persistence #
All cognitive data is persisted to SQLite when a memory store is configured on the
agent. The following tables are created automatically:
| Table | Key Columns | Purpose |
|---|---|---|
| `learning_interactions` | agent_name, query, response, reflection_score, feedback_score, tokens_used, timestamp | Every query/response pair with scores |
| `learning_reviews` | agent_name, strategy, interactions_reviewed, avg_reflection_score, lessons_json, prompt_addendum, timestamp | Periodic review results |
| `prompt_evolution` | agent_name, version, original_prompt, evolved_prompt, reasoning, status, timestamp | Versioned prompt history |
| `autonomous_actions` | agent_name, trigger_type, action_taken, tokens_used, timestamp | Log of autonomous agent actions |
| `autonomous_budgets` | agent_name, date, calls_used, tokens_used, cost_used | Daily budget tracking |
Data Lifecycle #
Interaction --> learning_interactions (every call)
|
v (every review_interval calls)
learning_reviews
|
v (every review_after calls)
prompt_evolution
|
v
Agent uses evolved prompt for all future calls
Data survives across program restarts, enabling agents to resume learning from where they left off.
Querying Data Externally #
The SQLite database is a standard .db file. You can query it directly with the
sqlite3 command-line tool:
# View all evolved prompts
sqlite3 ~/.neam/memory.db \
"SELECT agent_name, version, evolved_prompt FROM prompt_evolution ORDER BY version;"
# View learning statistics
sqlite3 ~/.neam/memory.db \
"SELECT agent_name, COUNT(*) as interactions, AVG(reflection_score) as avg_score
FROM learning_interactions GROUP BY agent_name;"
# View budget usage
sqlite3 ~/.neam/memory.db \
"SELECT * FROM autonomous_budgets WHERE date = date('now');"
17.9 Memory Systems #
The cognitive architecture so far stores learning data in flat SQLite tables. For agents
that need richer memory capabilities, the standard library (std.agents.advanced.memory)
provides four specialized memory systems that mirror human cognitive memory types.
Semantic Memory #
Semantic memory stores factual knowledge as vector-embedded entries organized by category. It supports similarity-based retrieval, making it ideal for building agents that accumulate domain expertise over time.
import std::agents::advanced::memory::semantic;
let mem = semantic::create_memory({ "embedding_model": "nomic-embed-text" });
mem = semantic::store_fact(mem, "Neam compiles to bytecode", "language");
mem = semantic::store_fact(mem, "Agents connect to LLM providers", "architecture");
let related = semantic::retrieve_similar(mem, "How does Neam execute code?", 3);
let by_topic = semantic::retrieve_by_category(mem, "language");
Episodic Memory #
Episodic memory records events and sessions -- timestamped sequences of what happened during an agent's operation. It supports timeline queries and content-based search using embeddings.
import std::agents::advanced::memory::episodic;
let ep = episodic::create_memory();
let session_id = episodic::start_episode(ep, "support_ticket_42");
ep = episodic::add_event(ep, session_id, "user_greeting", {
"message": "Hello, I need help with billing."
});
ep = episodic::add_event(ep, session_id, "agent_response", {
"message": "I can help with that. What is your account number?"
});
ep = episodic::end_episode(ep, session_id);
// Search past episodes by content similarity
let similar = episodic::search_episodes(ep, "billing question", 5);
Episodic memory also supports consolidation -- a process that prunes old, low-importance episodes to keep memory usage bounded:
// Consolidate episodes older than 30 days with importance below 0.3
ep = episodic::consolidate(ep, { "max_age_days": 30, "min_importance": 0.3 });
Working Memory #
Working memory manages the current conversation context with a fixed capacity. It tracks what the agent is currently "focused on" and automatically evicts the least relevant items when capacity is reached.
import std::agents::advanced::memory::working;
let wm = working::create_memory({ "capacity": 10 });
wm = working::add_item(wm, "user_name", "Alice");
wm = working::add_item(wm, "current_topic", "billing dispute");
wm = working::set_focus(wm, "billing dispute");
// Get items most relevant to the current query
let context = working::get_relevant(wm, "What is Alice's refund status?", 5);
// Export working memory as a text block for prompt injection
let context_text = working::to_text(wm);
Knowledge Graph Memory #
The knowledge graph module stores entities and relationships, enabling agents to reason about connections between concepts:
import std::agents::advanced::memory::graph;
let kg = graph::create_graph();
kg = graph::add_node(kg, "Alice", "Customer", { "plan": "premium" });
kg = graph::add_node(kg, "Order-123", "Order", { "amount": 49.99 });
kg = graph::add_edge(kg, "Alice", "Order-123", "placed");
// Query: find all orders placed by Alice
let orders = graph::query(kg, { "from": "Alice", "relation": "placed" });
Combining Memory Systems #
A production agent might use all four memory types together:
| Memory Type | Stores | Retrieval | Persistence |
|---|---|---|---|
| Semantic | Facts, domain knowledge | Similarity search by embedding | Long-term (SQLite + vectors) |
| Episodic | Events, conversation sessions | Timeline or content search | Long-term (SQLite) |
| Working | Current context, active focus | Relevance to current query | Session-scoped (in-memory) |
| Knowledge graph | Entities, relationships | Graph traversal, pattern matching | Long-term (SQLite) |
The memory field on an agent declaration (e.g., memory: "tutor_memory")
configures the SQLite store used by the learning loop and prompt evolution. The
std.agents.advanced.memory modules provide additional memory capabilities that you
manage explicitly in your program logic.
17.10 Full Cognitive Agent Example #
This example combines all cognitive features into a single production-ready agent:
agent ResearchAssistant {
provider: "openai"
model: "gpt-4o"
temperature: 0.5
system: "You are an AI research assistant. You analyze papers, summarize findings,
and track research trends across multiple domains."
// Step 1: Structured thinking -- plans before answering
reasoning: plan_and_execute
// Step 2: Self-evaluation after every response
reflect: {
after: each_response
evaluate: [accuracy, completeness, clarity, relevance]
min_confidence: 0.75
on_low_quality: {
strategy: "revise"
max_revisions: 2
}
}
// Step 3: Learn from interaction history
learning: {
strategy: "experience_replay"
review_interval: 15
max_adaptations: 100
rollback_on_decline: true
}
// Step 4: Evolve prompt over time
evolve: {
mutable: [system_prompt]
review_after: 100
core_identity: "You are an AI research assistant."
allow_rollback: true
}
// Step 5: Autonomous daily research
goals: [
"Track new papers in AI safety and alignment",
"Produce weekly research digests"
]
triggers: {
on_schedule: "every 1d"
}
initiative: true
budget: {
max_daily_calls: 50
max_daily_cost: 10.0
}
memory: "research_memory"
}
{
// Interactive usage
let summary = ResearchAssistant.ask(
"Summarize the latest developments in RLHF and DPO for LLM alignment."
);
emit summary;
// The VM: plans -> executes steps -> reflects -> records interaction -> returns
// Provide feedback
agent_rate("ResearchAssistant", 0.95);
// Check cognitive status
let status = agent_status("ResearchAssistant");
emit "Reasoning mode: " + str(status["reasoning_mode"]);
emit "Learning count: " + str(status["learning_count"]);
emit "Evolution version: " + str(status["evolution_version"]);
emit "Goals: " + str(status["goals"]);
// Check learning statistics
let stats = agent_learning_stats("ResearchAssistant");
emit "Avg reflection score: " + str(stats["avg_reflection_score"]);
// View prompt evolution history
let history = agent_prompt_history("ResearchAssistant");
emit "Prompt versions: " + str(len(history));
}
Expected Output (After Several Interactions) #
[Plan and Execute: generating plan...]
[Step 1: Survey recent RLHF papers...]
[Step 2: Survey DPO developments...]
[Step 3: Compare approaches...]
[Synthesizing final answer...]
RLHF (Reinforcement Learning from Human Feedback) continues to evolve with...
[detailed multi-paragraph response]
Reasoning mode: plan_and_execute
Learning count: 47
Evolution version: 2
Goals: ["Track new papers in AI safety and alignment", "Produce weekly research digests"]
Avg reflection score: 0.82
Prompt versions: 2
17.11 Migration from v0.4.x #
Zero Breaking Changes #
v0.5.0 is fully backward compatible with v0.4.x:
- All new properties are optional.
- Agents without cognitive properties behave identically to v0.4.x.
- No changes to existing syntax, opcodes, or runtime behavior.
- Existing .neamb bytecode continues to work (new opcodes are additive).
Incremental Adoption Path #
Start with reasoning, the simplest and cheapest cognitive feature:
// v0.4.x agent -- still works unchanged
agent Helper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
}
// v0.5.0 -- add reasoning (one line change)
agent SmartHelper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
reasoning: chain_of_thought
}
Then add reflection:
agent ReflectiveHelper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, helpfulness]
min_confidence: 0.7
on_low_quality: { strategy: "revise" }
}
}
Then learning and evolution:
agent EvolvingHelper {
provider: "openai"
model: "gpt-4o"
system: "You are helpful."
reasoning: chain_of_thought
reflect: {
after: each_response
evaluate: [accuracy, helpfulness]
min_confidence: 0.7
on_low_quality: { strategy: "revise" }
}
learning: {
strategy: "experience_replay"
review_interval: 10
}
evolve: {
mutable: [system_prompt]
review_after: 50
core_identity: "You are helpful."
allow_rollback: true
}
memory: "helper_memory"
}
Summary #
In this chapter you learned:
- Reasoning strategies: Four modes of structured thinking -- chain_of_thought, plan_and_execute, tree_of_thought, and self_consistency -- each with different cost/quality trade-offs.
- Reasoning configuration: Fine-tuning with max_steps, show_thinking, and verify_before_respond for controlling reasoning behavior.
- Reflection: Self-evaluation with configurable quality dimensions, thresholds, and strategies for handling low-quality output (revise, retry, escalate, acknowledge).
- Learning loops: Recording interaction history to SQLite, periodic learning reviews, and four learning strategies (experience replay, pattern extraction, prompt evolution, preference learning).
- Prompt evolution: Automatic system prompt refinement with core identity preservation and version rollback.
- Autonomy: Goal-driven agents that act on schedules with daily budget limits for calls, cost, and tokens.
- Embedded inference: Local ONNX model execution for edge deployment.
- Native functions: Eleven functions for programmatic control of cognitive features.
- SQLite persistence: How cognitive data is stored and how to query it externally.
- Memory systems: Four specialized memory types from the standard library -- semantic (factual knowledge with vector retrieval), episodic (event and session tracking), working (current context with capacity limits), and knowledge graph (entities and relationships).
- Incremental adoption: How to migrate from v0.4.x by adding one feature at a time.
Exercises #
Exercise 17.1: Reasoning Strategy Comparison #
Create four agents, each using a different reasoning strategy (chain_of_thought, plan_and_execute, tree_of_thought, self_consistency). Ask each agent the same complex question (e.g., "Should a startup prioritize growth or profitability in its first two years?"). Compare the responses in terms of quality, depth, and cost (number of LLM calls). Which strategy produced the best answer? Which was the most cost-effective?
Exercise 17.2: Quality Gate with Reflection #
Build an agent with reflection enabled. Set min_confidence to 0.9 (deliberately
high). Ask the agent a moderately difficult question and observe:
- What reflection scores does it produce?
- How many revision rounds does it go through?
- Is the final answer better than the first attempt?
Repeat with min_confidence set to 0.5 and compare the behavior.
Exercise 17.3: Learning Over Time #
Create a tutoring agent with learning enabled (review_interval: 5). Ask it 10
questions about a subject you know well. After each answer, use agent_rate() to
provide a feedback score. After the 10th question, check agent_learning_stats().
Has the average reflection score improved over the session? What lessons did the
learning review extract?
Exercise 17.4: Prompt Evolution Experiment #
Create an agent with all cognitive features enabled and review_after: 10 (low
threshold for testing). Run 15 interactions with consistent feedback. Then:
- Check the prompt history with agent_prompt_history().
- Compare the original and evolved prompts.
- Rollback to version 0 using agent_rollback().
- Verify the agent uses the original prompt.
Document how the evolved prompt differs from the original.
Exercise 17.5: Autonomous Monitor #
Build an autonomous agent with:
- Goals related to a task you define (e.g., "Check weather and report temperature").
- A schedule of "every 30s" (for quick testing).
- A budget of max_daily_calls: 5.
Run the program and observe:
1. How often does the agent execute?
2. What happens when the budget limit is reached?
3. Use agent_pause() and agent_resume() to control execution.
Exercise 17.6: Full Cognitive Pipeline #
Design a customer support agent that combines all six cognitive capabilities. The agent should:
- Use plan_and_execute reasoning for complex tickets.
- Reflect on every response with accuracy and tone evaluation.
- Learn from interaction history with experience_replay.
- Evolve its prompt, preserving the core identity: "You are a customer support agent."
- Run autonomously every hour to generate a summary of recent tickets.
- Operate within a daily budget of 200 calls and $20.
Write the complete Neam program, including at least 5 test interactions and cognitive status checks after each one.