How to prompt Claude (Sonnet, Opus, Haiku)
A practical guide to prompting Anthropic's Claude family. XML tags, system prompts, prefilling, and the patterns that get the most out of Sonnet, Opus, and Haiku.
A team migrates a customer-support bot from GPT-4o to Claude Sonnet. Same prompt, same inputs. The replies are immediately better — more natural paragraphing, better tone, fewer generic platitudes. But every reply now starts with "Here is a draft response for you:" followed by the actual content. The prompt explicitly forbade preambles. Claude kept adding them anyway.
The fix took 30 seconds: prefill the assistant turn with the first word of the actual reply. Suddenly Claude was producing flawless, preamble-free, on-brand responses — better than the GPT-4o version had ever been.
This is Claude in a nutshell. Higher quality output on almost every writing-heavy task, if you know the Claude-specific tricks: XML tags, prefilling, and system prompts that carry more weight than they would on other models. None of it is exotic, and all of it is easy to miss if you don't know where to look. This guide is the look.
The whole idea in one line
Wrap everything structured in XML-style tags, and put user-supplied content inside <document> or <input> tags.

The mental model: Claude reads structure
Different models reward different prompting patterns because they were trained on different data with different methods. Claude, in particular, was trained extensively with XML-style tagged content during its instruction-tuning phase. That training shows up in production: Claude is dramatically better at respecting section boundaries when you tag them.
Three properties make Claude prompts feel different from GPT prompts:
- XML tags are first-class. Where GPT-4o uses Markdown headers, Claude uses tags. Both work; tags work better on Claude.
- System prompts dominate. Claude weights system-prompt instructions more heavily than the user message. Long, detailed system prompts work well.
- Prefilling exists. The Anthropic API lets you start the assistant's response. Whatever you put there becomes the first tokens the model generates from. No other major frontier model lets you do this directly.
What Claude is genuinely good at
- Long-form writing. Claude consistently produces more readable prose than other frontier models — better paragraphing, less repetition, more natural transitions. The default voice has the fewest tells.
- Code that humans want to read. Strong on correctness; even stronger on style. Tends to produce well-named variables, idiomatic patterns, and helpful comments without being asked.
- Following nuanced instructions. If your prompt has subtle constraints ("include a statistic but only if you can cite it"), Claude handles them more carefully than most models.
- Refusing gracefully on out-of-scope requests. More likely to ask a clarifying question than to invent an answer to an underspecified request.
- Long context (up to 200K). Less "lost in the middle" effect than GPT-4o — Claude attends more uniformly across long inputs, though Gemini still wins for truly massive contexts.
What Claude struggles with
- Verbosity. Claude defaults to thoroughness. Without an explicit length constraint, expect more words than you asked for.
- Strict format compliance without prefill. Claude likes to add a polite preamble before a structured output. Use prefilling to skip the preamble.
- Aggressive numerical reasoning at scale. Pure arithmetic over many numbers — better suited to a tool call than to Claude in plain prompting mode.
- Speed-critical interactive use. Claude Sonnet is fast but Opus and extended thinking add latency. For real-time chat, Haiku or GPT-4o-mini may be better fits.
XML tags: the single biggest Claude trick
Using XML-style tags in prompts dramatically improves how Claude parses your input and respects section boundaries. The tag names are arbitrary — <email>, <document>, <notes>, <rules> all work. The point is that Claude understands these as clear section boundaries.
```
Summarize the email below for our support team.

<email>
{{email_body}}
</email>

<format>
3 bullets, each starting with a verb. End with a single line:
"Action: <yes/no, one sentence>".
</format>

<example>
- Confirms shipment of order #2841.
- Asks whether expedited delivery is available.
- Reports an issue with the tracking link.
Action: yes, send tracking link and offer expedited upgrade.
</example>
```

Compare this to running the same content as a wall of text. The tagged version produces more consistent format adherence and less leakage between sections.
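If you assemble prompts programmatically, the tag wrapping is worth centralizing in a tiny helper. A minimal sketch; the function name `xml_block` is mine, not part of any SDK:

```python
def xml_block(tag: str, content: str) -> str:
    """Wrap content in an XML-style tag pair so Claude treats it as one section."""
    return f"<{tag}>\n{content.strip()}\n</{tag}>"


# Assemble the email-summary prompt from the example above.
prompt = "\n\n".join([
    "Summarize the email below for our support team.",
    xml_block("email", "{{email_body}}"),
    xml_block("format", "3 bullets, each starting with a verb."),
])
```

Because the tag names are arbitrary, one helper covers inputs, formats, examples, and rules alike.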
Prefilling: lock in the output format
Claude's API lets you start the assistant's response. Whatever you put there becomes the first tokens the model generates from. This is the cleanest way to suppress preambles and force structured outputs.
Use cases:
- Want JSON only? Set the assistant prefill to `{`. Claude will continue from there without preamble.
- Want a numbered list? Prefill `1.`.
- Want output to start with a specific word? Prefill that word.
- Want XML output? Prefill `<result>`.
Prefilling is the most underused Claude feature in production prompts. Adoption costs nothing and lifts format compliance dramatically.
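In practice, prefilling is just a trailing assistant turn in the messages list. A sketch of how the request body might be built (the helper name is mine; the payload shape matches the Messages API, where the API continues generating from the last assistant turn):

```python
def prefilled_messages(user_prompt: str, prefill: str) -> list[dict]:
    """Build a messages list whose final assistant turn seeds Claude's reply.

    Passed to the Messages API, generation continues from `prefill`,
    so the response never gets a chance to add a preamble.
    """
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefill},
    ]


# JSON-only output: seed the reply with "{"
messages = prefilled_messages("Extract name and email as JSON.", "{")
```

One caveat when parsing: the completion you get back starts *after* the prefill, so prepend the prefill before running the result through a JSON parser.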
The system prompt does heavy lifting
Claude pays close attention to the system prompt — more than GPT-4o does. Long, detailed system prompts work well. Put your role, your constraints, and your output format there; keep the user message focused on the task-specific content.
```
You are a senior customer support specialist at Acme.

Voice and tone:
- Warm, specific, and concrete. Never defensive.
- 60-100 words per reply unless the customer's issue requires more.
- If the customer mentions a refund, billing, or a bug, acknowledge it
  explicitly in the first sentence.

Output format:
- The reply only. No preamble. No "Here is the response:".
- Sign with "— Acme Support".

Hard rules:
- Never quote a price; always defer to billing@acme.com.
- Never promise a feature that hasn't shipped.
```
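In the Anthropic API, the system prompt travels in a top-level `system` parameter rather than as a conversation turn. A sketch of how a request could be assembled (the model id and token limit are illustrative placeholders):

```python
SYSTEM_PROMPT = """You are a senior customer support specialist at Acme.
Output format: the reply only. No preamble. Sign with "— Acme Support"."""


def build_request(user_message: str) -> dict:
    # Keyword arguments for a Messages API call: the system prompt is a
    # top-level field, separate from the user/assistant turns.
    return {
        "model": "claude-sonnet-4-20250514",  # illustrative model id
        "max_tokens": 512,
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": user_message}],
    }


request = build_request("My tracking link is broken.")
```

Keeping role, constraints, and format in `system` leaves the user message free to carry only the task-specific content, which is exactly the split this section recommends.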
Extended thinking (reasoning mode)
Claude Opus and Sonnet support extended thinking — the model produces internal reasoning tokens before its output. Enable it for hard reasoning, math, or multi-step planning. Skip it for simple writing or lookup tasks where the latency cost isn't worth it.
When extended thinking is enabled, you generally don't need to add Chain-of-Thought to your prompt — the model is already doing it internally. Adding "think step by step" on top of extended thinking is at best a no-op.
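Extended thinking is switched on per request via a `thinking` field. A sketch of the payload shape (model id and budgets are illustrative; note that `max_tokens` has to cover both the thinking budget and the visible reply):

```python
def thinking_request(user_prompt: str, budget_tokens: int = 10_000) -> dict:
    # Sketch of a Messages API payload with extended thinking enabled.
    # max_tokens must exceed the thinking budget, or there is no room
    # left for the final answer.
    return {
        "model": "claude-sonnet-4-20250514",  # illustrative model id
        "max_tokens": budget_tokens + 2_000,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": user_prompt}],
    }


req = thinking_request("Plan a three-step migration from REST to gRPC.")
```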
Picking the right Claude model#
Which Claude for which task
| If your situation is… | Reach for… | Why |
|---|---|---|
| High-quality writing, polished output, brand voice | Claude Sonnet | The default — strong quality at reasonable cost |
| Hard reasoning, complex code, multi-step planning | Claude Opus + extended thinking | Premium tier; the smartest model when reasoning matters |
| Cheap, fast workloads (basic chat, classification) | Claude Haiku | Fast and cheap; capable on simple tasks |
| Long-document analysis (50K-200K tokens) | Claude Sonnet | Better long-context attention than GPT-4o |
| Massive context (>200K tokens) | Consider Gemini Pro | Gemini's 1M+ context is purpose-built for this |
| Tool use / agent loops | Claude Sonnet | Strong tool-use API; good at structured planning |
Specific tactics that work on Claude
- Use XML tags for everything structured. Inputs, outputs, examples, instructions.
- Prefill to skip preambles. Especially important for JSON, structured lists, or specific-starting-word outputs.
- Specify length in concrete numbers. "Concise" is fuzzy. "80-120 words" works.
- Use few-shot examples wrapped in `<example>` tags. Claude treats the tagged examples as authoritative.
- Ask Claude to think before responding on hard tasks, even without extended thinking enabled. Claude follows this instruction reliably.
- Combine prefill + tag prefix for structured outputs. Prefill `<analysis>` as the assistant turn; Claude continues inside the tag and closes it cleanly.
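The prefill-plus-tag pattern from the last bullet can be sketched end to end: seed the assistant turn with the opening tag, then reattach the prefill before extracting the tag body (the constant and function names here are mine):

```python
import re

PREFILL = "<analysis>"


def extract_analysis(completion: str) -> str:
    """Reattach the prefill to Claude's completion and pull out the tag body."""
    full = PREFILL + completion  # the API's completion starts AFTER the prefill
    match = re.search(r"<analysis>(.*?)</analysis>", full, re.DOTALL)
    return match.group(1).strip() if match else full


# Simulated completion: Claude continues inside the tag and closes it.
print(extract_analysis("\nMargins are thin.\n</analysis>"))  # Margins are thin.
```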
Going further: production Claude patterns
Tool use with structured input/output
Claude's tool-use API is mature and well-documented. Pair it with XML-tagged tool descriptions for best results. For agents, see our agents section — Claude is widely considered the most reliable model for production agent loops.
Prompt caching for repeated context
Anthropic's prompt caching lets you mark large, stable parts of your prompt (system prompt, knowledge base context, few-shot examples) for caching. Subsequent requests with the same cached prefix run faster and ~10× cheaper. Particularly valuable for RAG-style workloads with consistent system prompts and varying user queries.
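The cacheable prefix is marked by attaching `cache_control` to a content block. A sketch of what such a request might look like (model id and texts are illustrative):

```python
def cached_system_request(system_text: str, user_message: str) -> dict:
    # Sketch: mark the large, stable system prompt as cacheable by attaching
    # cache_control to its content block; the varying user turn stays uncached.
    return {
        "model": "claude-sonnet-4-20250514",  # illustrative model id
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": system_text,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }


req = cached_system_request(
    "(long, stable knowledge-base context goes here)",
    "What is our refund policy?",
)
```

The RAG fit follows directly: the knowledge base rides in the cached `system` block, and only the short user query changes between requests.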
Batch API for non-urgent workloads
Like OpenAI, Anthropic offers a Batch API with ~50% discount for tasks that can wait up to 24 hours. Good fit for analytics, eval runs, and background processing.
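Batch submissions pair a `custom_id` with an ordinary per-message payload. A sketch of building the requests list (helper name and model id are mine):

```python
def batch_requests(prompts: dict[str, str]) -> list[dict]:
    # Sketch of the requests list for a batch submission: each entry pairs
    # a custom_id (used to match results later) with a normal message payload.
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": "claude-haiku-placeholder",  # illustrative model id
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


reqs = batch_requests({
    "eval-1": "Classify sentiment: 'great product'",
    "eval-2": "Classify sentiment: 'broken again'",
})
```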
Tuning extended thinking
Extended thinking has a configurable token budget. Default is enough for most tasks; bump higher for genuinely hard problems. Keep in mind: more thinking ≠ better output past a threshold. Measure on your eval set rather than guessing.
Vision + XML for multimodal
When prompting Claude with images, wrap the image-related instructions in XML tags too: <image_analysis_task>, <output_format>. The tag discipline carries over to multimodal prompts.
Things to avoid
- Forgetting to constrain length. Claude will add helpful context you didn't ask for if you don't cap the output.
- Mixing instructions and inputs in plain text. Without delimiters, Claude can confuse user-supplied content with task instructions. Always wrap user content in tags. See prompt injection for the security angle.
- Asking Claude to be "funny" or "creative" without examples. Claude's creative defaults are tasteful but generic. Show what you want via examples.
- Skipping prefill on JSON-returning prompts. Claude will preamble before JSON unless you prefill. Prefill `{`; problem solved.
Quick reference
The 60-second summary
The Claude lineup: Sonnet (default), Opus (premium reasoning), Haiku (fast/cheap). Extended thinking optional on Sonnet/Opus.
The three Claude superpowers: XML tags for structure, prefilling for output control, system prompts that carry real weight.
Tactics: wrap user content in tags, use prefill for JSON/structured output, specify length in numbers, put examples in <example> tags.
Trade-off: better quality on writing-heavy tasks; pay more attention to verbosity and preambles.
What to read next#
For cross-model comparison, read prompting ChatGPT and prompting Gemini. For Claude-friendly techniques, start with Chain-of-Thought and few-shot. For multi-model production work, A/B testing prompts across models is non-negotiable.