# Agent Fundamentals

## 1. What Is an LLM Agent?
An LLM agent is a system that uses a language model as its reasoning engine to autonomously plan and execute multi-step tasks by choosing and calling external tools, observing their results, and deciding on subsequent actions — rather than just generating a single text response.
The key distinction from a plain LLM call:
| Plain LLM | LLM Agent |
|---|---|
| Single input → single output | Iterative: reason → act → observe → reason |
| No external state | Maintains state across steps |
| No tool access | Calls tools (APIs, search, code, databases) |
| Fixed, single-pass flow | Plans dynamically at runtime |
An agent can be thought of as a loop:
```python
def run_agent(llm, tools, state, history):
    """Minimal agent loop: reason, maybe act, observe, repeat."""
    while True:
        thought = llm.reason(state, history, tools)   # model decides the next step
        if thought.requires_tool:
            result = tools.call(thought.tool_name, thought.args)
            history.append(result)                    # observation feeds the next iteration
        else:
            return thought.final_answer               # no tool needed: task complete
```
## 2. Core Components of an Agent
Every LLM agent has five logical components:
### 2.1 Brain (LLM)
The language model that performs reasoning, planning, and decision-making. It decides what to do next at each step.
### 2.2 Memory

State maintained across steps. May include:

- In-context (working) memory — the current conversation/scratchpad
- External memory — a vector store or database for long-term recall (see Memory Systems for the full taxonomy)
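A minimal sketch of these two layers, assuming nothing beyond the standard library (the keyword-match `recall` is a stand-in for a real vector similarity search):

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Two-layer memory: in-context scratchpad plus an external long-term store."""
    messages: list[str] = field(default_factory=list)        # working memory, sent to the LLM each step
    long_term: dict[str, str] = field(default_factory=dict)  # stand-in for a vector store / database

    def remember(self, key: str, text: str) -> None:
        self.long_term[key] = text  # a real system would embed and index the text here

    def recall(self, query: str) -> list[str]:
        # Toy keyword match; production agents run a vector similarity search instead.
        return [t for t in self.long_term.values() if query.lower() in t.lower()]
```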
### 2.3 Tools

External capabilities the LLM can invoke. Examples:

- Web search (Tavily, Brave Search API)
- Code execution (Python REPL, sandboxed environments)
- Database queries (SQL, vector stores)
- API calls (weather, calendar, email)
- File read/write
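One common way to wire tools up is a registry that pairs each callable with its schema, so the same definition drives both the model prompt and dispatch. A sketch, with a stubbed `search_web` standing in for a real search client:

```python
import json

def search_web(query: str, max_results: int = 5) -> str:
    # Stub: a real implementation would call a search API (Tavily, Brave, etc.).
    return json.dumps({"query": query, "results": []})

TOOLS = {
    "search_web": {
        "fn": search_web,
        "schema": {
            "name": "search_web",
            "description": "Search the web for recent information",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
}

def call_tool(name: str, args: dict) -> str:
    return TOOLS[name]["fn"](**args)  # dispatch by name, as the model requests
```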
### 2.4 Planning

The strategy for decomposing a goal into steps. May be reactive (ReAct) or deliberative (MCTS, hierarchical). (See Planning for full coverage.)
### 2.5 Action Space

The set of allowed actions (tools) the agent can take. It defines the scope of what the agent can do and is the primary safety control point.
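For example, the orchestrator can enforce the action space with a hard allowlist, so a hallucinated or injected tool name never executes (the names below are illustrative):

```python
ALLOWED_ACTIONS = {"search_web", "db_query", "file_read"}  # hypothetical per-deployment allowlist

def gate_action(name: str) -> None:
    # Enforced outside the model: the LLM proposes, the orchestrator disposes.
    if name not in ALLOWED_ACTIONS:
        raise PermissionError(f"'{name}' is outside this agent's action space")

gate_action("search_web")    # passes silently
# gate_action("shell_exec")  # would raise PermissionError
```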
## 3. Tool Use and Function Calling

### Function Calling (Structured Tool Use)
Modern LLMs natively support function calling: given a set of function definitions (in JSON schema), the model can emit a structured call instead of plain text.
Example schema passed to the model:
```json
{
  "name": "search_web",
  "description": "Search the web for recent information",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "Search query"},
      "max_results": {"type": "integer", "default": 5}
    },
    "required": ["query"]
  }
}
```
Model output (instead of plain text):
```json
{
  "tool_call": {
    "name": "search_web",
    "arguments": {"query": "RAG latest benchmarks 2025", "max_results": 3}
  }
}
```
The calling framework executes the function and returns the result back to the model as a tool message.
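A minimal round trip, sketched with the OpenAI Python SDK's Chat Completions interface (other providers differ in details; `SEARCH_WEB_SCHEMA` and `search_web` here stand for the schema above and a local implementation of it):

```python
import json
from openai import OpenAI

client = OpenAI()
tools = [{"type": "function", "function": SEARCH_WEB_SCHEMA}]
messages = [{"role": "user", "content": "What are the latest RAG benchmarks?"}]

response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:                              # the model chose to call a tool
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    result = search_web(**args)                 # execute the actual function
    messages.append(msg)                        # keep the assistant's tool call in history
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # A second call lets the model read the observation and answer in plain text.
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```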
### Key Design Decisions for Tool Schemas

- Names must be descriptive — the model uses the name and description to decide when to call the tool.
- Avoid overlap — two tools with similar descriptions cause the model to choose incorrectly.
- Required vs. optional parameters — only mark a parameter required if the tool genuinely cannot run without it.
- Return format — keep tool returns concise and structured; avoid dumping 10 KB of raw HTML.
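As an illustration of the last point, a fetch tool might distill its raw response before handing it back to the model (a sketch; the length cap is arbitrary):

```python
import json

def format_tool_result(title: str, url: str, body: str, limit: int = 1500) -> str:
    """Return a compact, structured result instead of raw HTML."""
    text = " ".join(body.split())  # collapse whitespace and markup leftovers
    if len(text) > limit:
        text = text[:limit] + " …[truncated]"
    return json.dumps({"title": title, "url": url, "excerpt": text})
```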
## 4. The ReAct Pattern

ReAct (Reason + Act; Yao et al., 2022) is the foundational pattern behind most production agents. The model interleaves Thought, Action, and Observation steps before producing a final answer.
```text
Thought: I need to find the current CEO of OpenAI.
Action: search_web("current CEO of OpenAI 2025")
Observation: Sam Altman is the CEO of OpenAI as of 2025.
Thought: I have the answer.
Answer: Sam Altman.
```
Why ReAct works:

- The Thought step forces the model to reason explicitly before acting (reduces impulsive wrong tool calls).
- Observations ground the model in real retrieved information, reducing hallucination.
- The loop allows self-correction: if an observation is unexpected, the next thought can adapt.
Limitations of vanilla ReAct:

- Linear chain — no backtracking if a tool call fails.
- No exploration — it takes the first plausible path.
- Prone to getting stuck in tool-call loops (see the guarded loop sketch below).
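These failure modes are typically mitigated outside the model: cap the number of steps and refuse identical repeated calls. A minimal sketch, where `llm_step` and `run_tool` are placeholders for your model call and tool dispatch:

```python
def run_react(task: str, llm_step, run_tool, max_steps: int = 8) -> str:
    history = [f"Task: {task}"]
    seen_calls: set[tuple[str, str]] = set()
    for _ in range(max_steps):
        step = llm_step(history)  # assumed to return {"thought", "action", "args"} or {"answer"}
        if "answer" in step:
            return step["answer"]
        call = (step["action"], str(step["args"]))
        if call in seen_calls:    # exact repeat: nudge the model onto a different path
            history.append("Observation: repeated action blocked; try a different approach.")
            continue
        seen_calls.add(call)
        history.append(f"Thought: {step['thought']}")
        history.append(f"Action: {step['action']}({step['args']})")
        history.append(f"Observation: {run_tool(step['action'], step['args'])}")
    return "Step budget exhausted without a final answer."
```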
## 5. Agent Types by Autonomy
| Type | Description | Example |
|---|---|---|
| Single-step | One LLM call + one tool call | Simple QA with web search |
| Multi-step (ReAct) | Iterative reason-act-observe loop | Research agent |
| Plan-and-execute | First generate a full plan, then execute steps | Complex report generation |
| Multi-agent | Multiple specialized agents collaborating | Orchestrator + researcher + writer |
| Autonomous (long-horizon) | Runs for hours/days with minimal human input | Coding agent on a multi-day task |
## 6. Agentic vs. Non-Agentic RAG
| Dimension | Standard RAG | Agentic RAG |
|---|---|---|
| Retrieval decisions | Fixed: always retrieve | Dynamic: agent decides if/when to retrieve |
| Query strategy | Single query | Multi-query, iterative, decomposed |
| Verification | None | Agent verifies, retries if needed |
| Tool access | Only retrieval | Retrieval + search + code + APIs |
| Multi-hop | Not supported natively | Supported via iterative retrieval |
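In code, the first two rows usually reduce to a routing step in front of the retriever. A sketch with placeholder `llm` and `retriever` objects (the `decide`/`answer` methods are assumptions, not a specific framework's API):

```python
def agentic_answer(question: str, llm, retriever, max_rounds: int = 3) -> str:
    context: list[str] = []
    for _ in range(max_rounds):
        # The agent itself judges whether the current context suffices.
        decision = llm.decide(question, context)  # assumed: {"retrieve": bool, "query": str}
        if not decision["retrieve"]:
            break
        context.extend(retriever.search(decision["query"]))  # iterative, rewritten queries
    return llm.answer(question, context)
```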
## 7. Production Agent Architecture
A production agent is more than a loop. It includes:
```text
┌─────────────────────────────────────────────────────┐
│                    Orchestrator                     │
│  ┌──────────┐ ┌──────────┐ ┌────────────────────┐   │
│  │ Planner  │ │ Memory   │ │ Tool Registry      │   │
│  │ (LLM)    │ │ (vector  │ │ - search           │   │
│  │          │ │ store +  │ │ - code_exec        │   │
│  │          │ │ episodic │ │ - db_query         │   │
│  └──────────┘ └──────────┘ │ - file_io          │   │
│                            └────────────────────┘   │
│  ┌───────────────────────────────────────────────┐  │
│  │              Guardrails Layer                 │  │
│  │  Input validation | Output filtering          │  │
│  │  Budget limits | Safety classifiers           │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
```
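As one concrete slice of the guardrails layer, budget limits belong in the orchestrator, enforced independently of the model. A sketch with illustrative defaults:

```python
import time

class BudgetGuard:
    """Hard caps the orchestrator checks before every tool call."""
    def __init__(self, max_tool_calls: int = 20, max_seconds: float = 120.0):
        self.max_tool_calls = max_tool_calls
        self.max_seconds = max_seconds
        self.calls = 0
        self.start = time.monotonic()

    def check(self) -> None:
        self.calls += 1
        if self.calls > self.max_tool_calls:
            raise RuntimeError("Tool-call budget exhausted")
        if time.monotonic() - self.start > self.max_seconds:
            raise RuntimeError("Wall-clock budget exhausted")
```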