ReAct: prompts that interleave reasoning and tool use
ReAct (Reason + Act) combines step-by-step reasoning with tool calls — search, code, lookups. Learn the loop, the prompt structure, and when ReAct beats plain Chain-of-Thought.
Ask GPT-4 in 2026: "What's the weather in Tokyo right now?" The answer is one of two things — a polite refusal, or a confident lie. The model has no way to look outside itself. Its training data is months or years old. It can't check.
Now give it one tool: a function called get_weather(city). Suddenly: "Let me check. [calls get_weather('Tokyo')] → Tokyo is currently 18°C and partly cloudy."
That's ReAct in its simplest form. ReAct (Reason + Act) is the prompt pattern that lets the model interleave thinking with action — calling tools, seeing results, deciding what to do next. It's the blueprint behind almost every modern LLM agent. If you're building anything beyond single-prompt Q&A, you're building on ReAct.
The mental model: giving the model arms and eyes#
A model without tools is a brain in a jar. It can reason about the world but cannot perceive or act in it. Everything it "knows" is what was baked in during training, frozen as of the cutoff.
ReAct gives the model two new senses. It can see the current state of external systems (via tool returns) and it can act on them (via tool calls). The reasoning loop becomes:
- Think about what you need to know.
- Call a tool to find out (or do something).
- Read the result.
- Think about what to do next.
- Repeat until you can answer.
It's how a human investigator works. Look up a fact. Cross-reference it. Look up another. Decide. Most autonomous agent behavior is a stylized version of this loop.
The Thought-Action-Observation loop#
Each iteration of a ReAct prompt produces three things:
- Thought: what the model is reasoning about — what it knows, what it's missing, what to do next.
- Action: a structured tool call — search("query"), calculate(2 + 2), get_user(id=123).
- Observation: the result of running that action, supplied back to the model by your application code.
The model produces another Thought based on the new observation, decides on the next Action, and so on, until it produces a Final Answer instead of an Action.
A bare-bones ReAct prompt#
You are an assistant that answers questions by reasoning and
using tools.
Available tools:
- search(query: string) → string (web search)
- calculate(expr: string) → number (math)
- finish(answer: string) (end the loop)
Format every response as either:
Thought: <one-sentence reasoning>
Action: <tool>(<args>)
or
Thought: <one-sentence reasoning>
Action: finish(<final answer>)
Wait for an Observation before continuing.
Question: {{question}}

Your application code parses each Action: line, runs the tool, formats the result as Observation: …, and feeds it back. Every modern agent framework — LangChain, LlamaIndex, the official tool-use APIs — implements this loop for you.
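That controller fits in a few lines of Python. Everything below is an illustrative sketch: the canned search tool, the regex, and the scripted model_step all stand in for a real LLM call and real tools.

```python
import re

# Illustrative tool registry; names match the prompt above.
TOOLS = {
    "search": lambda q: "The capital of France is Paris.",  # canned result for the demo
}

# Matches lines like: Action: search("capital of France")
ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)", re.DOTALL)

def react_loop(model_step, max_iters=5):
    """Parse each Action line, run the tool, feed an Observation back.

    model_step(transcript) stands in for a real LLM call and must return
    text containing Thought: and Action: lines.
    """
    transcript = []
    for _ in range(max_iters):
        reply = model_step(transcript)
        transcript.append(reply)
        match = ACTION_RE.search(reply)
        if match is None:
            transcript.append("Observation: malformed response; use Action: tool(args)")
            continue
        tool, raw_args = match.group(1), match.group(2).strip().strip('"')
        if tool == "finish":
            return raw_args, transcript
        if tool not in TOOLS:
            transcript.append(f"Observation: unknown tool {tool}")
            continue
        transcript.append(f"Observation: {TOOLS[tool](raw_args)}")
    return None, transcript

# A scripted "model" so the loop runs without an API key.
steps = iter([
    'Thought: I need the capital of France.\nAction: search("capital of France")',
    'Thought: I have the answer.\nAction: finish("Paris")',
])
answer, trace = react_loop(lambda transcript: next(steps))
```

In a real system, model_step would append the transcript to the prompt and call your LLM; the loop structure stays exactly the same.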
A real ReAct trace#
Question: What's the population of the capital of France?
Thought: I need to find the capital of France first, then look up its population.
Action: search("capital of France")
Observation: The capital of France is Paris.
Thought: Now I need the population of Paris.
Action: search("Paris population 2024")
Observation: Paris had an estimated population of 2.1 million in 2024.
Thought: I have the answer.
Action: finish("Paris, the capital of France, has a population of about 2.1 million as of 2024.")

Notice the structure: every Thought explains why the next Action makes sense; every Observation is real data fetched by your code, not invented.
Modern tool-use APIs are ReAct under the hood#
You don't typically write the raw Thought:/Action:/Observation: format anymore. OpenAI's function calling, Anthropic's tool use, and Gemini's function calling all wrap the same loop in a structured API: you declare tools as JSON schemas, and the model returns structured tool-call objects instead of free-text Action lines.
The structured API is more robust (no parsing errors), but conceptually it's ReAct. Knowing the underlying pattern helps you debug when an agent loops, hallucinates a tool call, or gives up too early.
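As a concrete example, here is a tool declaration in the JSON-schema shape that OpenAI-style function-calling APIs accept. The tool name and fields are illustrative; check your provider's docs for the exact envelope, but the idea is the same everywhere: the schema replaces the free-text tool list in the prompt.

```python
# Hypothetical weather tool declared as a JSON schema (OpenAI-style shape).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city. Use when the user asks about present conditions.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"},
            },
            "required": ["city"],
        },
    },
}
```

The model then returns a structured tool-call object naming get_weather with a city argument, instead of a free-text Action: line your code has to parse.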
When to reach for ReAct#
ReAct vs. alternatives
| If your situation is… | Reach for… | Why |
|---|---|---|
| Answer requires fresh data the model can't have | ReAct | Tool calls fetch real-time data; pure prompting can't |
| Computation the model might botch (math, complex code) | ReAct (with calculator/code-exec tool) | Offload to a deterministic tool; model handles orchestration |
| Multi-system queries (look up X, then check Y based on X) | ReAct | Sequential dependencies need iterative tool calls |
| Side effects (booking, sending, modifying) | ReAct + confirmation tools | Tools expose the surface; preview/commit pattern adds safety |
| Reasoning over information already in the prompt | CoT (no tools needed) | No external state to fetch — ReAct adds complexity for nothing |
| Fixed pipeline of known steps | Prompt chain | Deterministic sequence is more reliable than dynamic looping |
| Single-document Q&A | Direct prompting | No retrieval needed — just include the doc |
The four classic ReAct failure modes#
1. Infinite loops#
The model keeps calling tools and never decides to finish. Often happens when the task is genuinely impossible (no answer exists) and the model won't admit it. Fix: always set a max-iteration budget; on the final iteration, inject a system message that forces a finish.
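The budget-plus-forced-finish fix can be a small helper called at the top of each loop iteration. This is a hypothetical sketch; the message wording is illustrative.

```python
def maybe_force_finish(transcript, iteration, budget=8):
    """On the final allowed iteration, inject a message that forces a decision.

    Hypothetical helper: call before handing the transcript to the model.
    """
    if iteration == budget - 1:
        transcript.append(
            "System: tool budget exhausted. Respond only with "
            "Action: finish(<your best answer, or state that you cannot determine it>)."
        )
    return transcript
```

Forcing an honest "cannot determine" beats letting the agent burn tokens on a question with no answer.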
2. Hallucinated tool names#
The model invents a tool that doesn't exist — get_weather_celsius when only get_weather is available. Fix: validate every Action against your tool registry before executing; on mismatch, return an Observation that lists valid tools.
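A minimal version of that validation, assuming a dict-based tool registry: the key point is that the rejection comes back as an Observation the model can act on, not an exception that kills the loop.

```python
def validate_action(tool_name, registry):
    """Check a proposed tool call against the registry before executing it.

    Returns (ok, observation). On unknown tools, the observation lists
    valid names so the model can self-correct on the next iteration.
    """
    if tool_name in registry:
        return True, None
    valid = ", ".join(sorted(registry))
    return False, f"Observation: unknown tool '{tool_name}'. Valid tools: {valid}."
```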
3. Skipping the Thought#
The model jumps straight to Action with no reasoning. Quality drops fast — Thoughts are where planning happens. Fix: reject malformed responses and re-prompt; with structured APIs, instruct the model to state a brief plan in text before each tool call.
4. Tool overload#
Giving the model 30 tools is worse than 5. The model spends attention reading tool descriptions and picks wrong tools more often. Fix: curate aggressively. For large catalogs, use a tool-router pattern: a first-pass classifier picks 5-7 relevant tools, then the agent runs with that subset.
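A toy sketch of the tool-router idea. In production the first pass is usually a cheap LLM classifier; keyword overlap stands in for it here, and the catalog is hypothetical.

```python
def route_tools(question, catalog, pick=5):
    """First-pass router: keep only the top-k most relevant tools.

    catalog maps tool name -> description. Keyword overlap is a crude
    stand-in for the cheap classifier you'd use in production.
    """
    def score(tool):
        desc = catalog[tool].lower()
        return sum(1 for word in question.lower().split() if word in desc)
    return sorted(catalog, key=score, reverse=True)[:pick]
```

The agent then runs its ReAct loop with only the picked subset, keeping tool descriptions short enough to reason over.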
Going further: production-grade ReAct patterns#
Planner-executor split#
Two models instead of one. A planner model breaks the goal into steps. An executor model carries each step out in a tighter ReAct loop. Better for complex tasks because the planner holds the goal and the executor handles tactics. See Introduction to agents for the broader architecture.
Confirmation patterns for side effects#
For tools that mutate state — sending messages, making purchases, deleting data — use a two-tool pattern: preview_X returns what would happen without doing it, then commit_X(preview_id) actually does it. Slows the agent slightly; prevents incidents. For high-stakes side effects, surface the preview to a human before commit.
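A sketch of the preview/commit pair for a hypothetical email tool. Names, storage, and return shapes are all illustrative; the invariant that matters is that commit only accepts ids produced by a prior preview.

```python
import uuid

_pending = {}  # preview_id -> described action; use durable storage in production

def preview_send_email(to, body):
    """Describe the side effect without performing it; return a preview id."""
    preview_id = str(uuid.uuid4())
    _pending[preview_id] = {"to": to, "body": body}
    return {"preview_id": preview_id, "summary": f"Would send {len(body)} chars to {to}"}

def commit_send_email(preview_id):
    """Perform only a previously previewed action; unknown ids are rejected."""
    action = _pending.pop(preview_id, None)
    if action is None:
        return "Error: no such preview. Call preview_send_email first."
    # The real send would happen here.
    return f"Sent to {action['to']}."
```

For human-in-the-loop flows, show the preview summary to a person and only call commit after they approve.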
Self-reflection#
After producing a Final Answer, run a second prompt that critiques the trace: did the agent miss anything obvious? Are the tool calls efficient? If the critique flags an issue, re-run the loop with the critique injected. Catches a class of errors pure ReAct misses.
Observability from day one#
Log every Thought, Action, Observation, and Final Answer. When agents misbehave in production, the trace is the only thing between you and a debugging mystery. Tools like LangSmith and Helicone (see our tools list) make this near-zero effort to set up.
Common mistakes#
- Tool descriptions that don't describe. A tool named search with description "a search tool" is useless. Write descriptions that explain WHEN to use it.
- No graceful degradation on tool errors. Return a structured error in the Observation — not a stack trace, not silent omission. The model treats it as another observation and can recover.
- Ignoring observation length. A 50KB search result blows up the context window. Truncate or summarize observations before feeding them back.
- Treating tool design as an afterthought. Most ReAct quality issues trace to tool design, not prompt design. See agent tools for the principles.
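For the observation-length point, plain truncation is the cheap fallback (a summarization call over the raw result is often better). A minimal helper, with an illustrative limit:

```python
def clip_observation(text, limit=2000):
    """Truncate long tool output before it re-enters the context window.

    The limit is illustrative; tune it to your model's context budget.
    """
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[...truncated, {len(text) - limit} chars omitted]"
```

Telling the model how much was cut lets it decide whether to re-query with a narrower tool call.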
Quick reference#
The 60-second summary
What it is: the loop Thought → Action → Observation → repeat → Final Answer. Combines reasoning with tool calls.
What it solves: tasks that need fresh data, computation, multi-system queries, or side effects — anything pure prompting can't handle.
What to remember: set an iteration budget, validate tool names, keep Thoughts non-optional, curate tools to under 10 when possible.
The agent connection: almost every modern LLM agent is ReAct underneath, even when you're calling a structured tool-use API.
What to read next#
ReAct sits at the boundary between prompting and agents. To go deeper on the agent side, start with Introduction to agents, then Agent tools (the make-or-break of ReAct quality) and Agent memory (for multi-turn agents). For the foundational paper (Yao et al., 2022), see papers.