How to prompt ChatGPT (GPT-4o and GPT-4 Turbo)
A practical guide to prompting OpenAI's GPT-4o and GPT-4 Turbo: what they're good at, what they struggle with, and the structural moves that consistently improve outputs.
Same prompt, three different OpenAI models, wildly different behavior. GPT-4o follows your six-rule constraint list to the letter. GPT-4 Turbo skips rules 4 and 5 about a third of the time. o1 ignores some constraints entirely because its internal reasoning decided they weren't important.
The OpenAI lineup isn't one model — it's a family with very different prompting personas. Treating them all the same wastes either money (running easy tasks on the expensive ones) or quality (running hard tasks on the cheap ones). Worse, the same prompt that wins on GPT-4o can actively hurt performance on the o-series. Same vendor; different beasts.
This guide covers the OpenAI lineup as of mid-2026 — what each model is good at, how each wants to be prompted, and the tactics that consistently lift outputs across the family.
The mental model: two model classes, not one#
OpenAI's 2026 lineup splits cleanly into two classes:
- Conversational models (GPT-4o, GPT-4 Turbo, GPT-4o-mini). Trained to follow instructions, produce helpful output, fast inference, broadly capable. Standard prompt engineering applies.
- Reasoning models (o1, o3, o3-mini). Trained to perform extended internal "thinking" before producing output. Higher latency, higher cost per task, dramatically better at hard reasoning. Different prompting playbook.
If you remember nothing else: prompts that work on conversational models often don't work on reasoning models, and vice versa. Pick the model class first; tune the prompt second.
What GPT-4o is genuinely good at#
- Following structured instructions. GPT-4o adheres to numbered constraints and bulleted rules more reliably than its predecessors. If your prompt has 6 rules, it'll usually hit all 6.
- Code generation and review. Strong across most popular languages; particularly good at Python, TypeScript, SQL.
- Structured outputs. JSON, Markdown, XML — emits clean structured outputs without much fuss. Use the official Structured Outputs API for guaranteed schema compliance.
- Function calling. Reliable tool use when you provide good tool descriptions. The structured tool-use API is mature.
- Multimodal input. Strong image understanding; reliable for OCR, diagram reading, and visual classification tasks.
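Good tool descriptions are what makes function calling reliable. A minimal sketch of a tool definition in the Chat Completions `tools` shape — the `lookup_order` name and its fields are hypothetical examples, not part of any real API:

```python
# A hypothetical tool definition in the Chat Completions `tools` shape.
# The key to reliable tool use is the description: say what the tool does
# AND when the model should reach for it.
import json

lookup_order_tool = {
    "type": "function",
    "function": {
        "name": "lookup_order",  # hypothetical tool name
        "description": (
            "Look up an order by its ID. Use this whenever the user "
            "mentions an order number or asks about shipping status."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID, e.g. 'ORD-12345'.",
                },
            },
            "required": ["order_id"],
            "additionalProperties": False,
        },
    },
}

# Passed as: client.chat.completions.create(model="gpt-4o",
#     messages=..., tools=[lookup_order_tool])
print(json.dumps(lookup_order_tool, indent=2))
```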
What GPT-4o struggles with#
- Verbosity creep. By default, GPT-4o over-explains. Without explicit length constraints, it produces 3 paragraphs where 1 would do. Always specify length.
- Sycophancy on judgment tasks. If you write "Is X better than Y? I think X is," the model leans toward agreeing. Phrase questions neutrally — see biases.
- Long-context degradation. Quality drops noticeably past ~30K tokens of context, even though the window is much larger. Prefer chained prompts over giant single contexts.
- Strict format compliance without examples. For unusual output formats, zero-shot is unreliable. Show one or two few-shot examples or use Structured Outputs.
- Hard reasoning that needs deliberation. The conversational models guess too quickly on multi-step reasoning. Use the o-series instead, or layer on Chain-of-Thought.
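The long-context advice above can be sketched as a map-reduce chain: summarize chunks independently, then merge, keeping each call well below the range where quality degrades. `call_model` is a stand-in for your real OpenAI client call; the chunk size is an illustrative assumption:

```python
# Sketch of "prefer chained prompts over giant single contexts":
# map-reduce summarization. `call_model` is a stub standing in for a
# real chat.completions call.
def call_model(prompt: str) -> str:
    # Stub for illustration; replace with a real API call.
    return f"[summary of {len(prompt)} chars]"

def chunk(text: str, size: int = 8000) -> list[str]:
    # Naive character chunking; a production version would split on
    # paragraph or section boundaries instead.
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_long(text: str) -> str:
    partials = [call_model(f"Summarize:\n\n{c}") for c in chunk(text)]
    combined = "\n".join(partials)
    return call_model(f"Merge these partial summaries into one:\n\n{combined}")
```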
The prompt shape GPT-4o rewards#
Empirically, GPT-4o produces noticeably better outputs with this structure:
- System prompt — persona, ground rules, constraints. Set once, persists across the conversation.
- Task statement — one sentence describing what to do, ideally as the first line of the user message.
- Constraints — bulleted, prefixed with action verbs ("Limit to", "Avoid", "Always").
- Context — supplied via Markdown headers (`## Background`, `## Customer email`) so the model can distinguish sections.
- Output format spec — exactly what you want back, ideally with a JSON schema.
```
# Task
Summarize the customer email below for our support team.

# Constraints
- Output exactly 3 bullets.
- Each bullet starts with a verb (Confirms, Asks, Reports).
- End with one line: "Action: <yes/no>, <one-sentence next step>".
- Do not include any text outside this format.

# Email
"""
{{email_body}}
"""

# Output
```

o-series reasoning models are different#
OpenAI's o1, o3, and o3-mini are reasoning models — they do extended internal "thinking" before producing an answer. Prompting them well looks almost the opposite of GPT-4o:
- Don't add "think step by step". They already do, internally. Adding it can confuse the model or be a no-op.
- Skip few-shot for reasoning tasks. Few-shot examples sometimes hurt o-series performance on reasoning. Zero-shot with a clear spec usually wins.
- Be terse. Long prompts can hurt more than help. State the task; supply only essential context.
- Don't use system prompts (some o-series variants don't support them; check the API spec for your model).
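The two playbooks can be sketched as one message builder: a rich system prompt plus structured user message for conversational models, a single terse user message for the o-series. The model-class check here is a naive name-prefix test, an assumption for illustration:

```python
# Sketch of the two prompting playbooks. Reasoning models get a terse,
# zero-shot user message with no system prompt; conversational models
# get the full system + structured-user-message treatment.
def build_messages(model: str, task: str, context: str = "") -> list[dict]:
    is_reasoning = model.startswith(("o1", "o3"))  # naive class check
    if is_reasoning:
        # Terse: no system prompt, no few-shot, no "think step by step".
        content = f"{task}\n\n{context}" if context else task
        return [{"role": "user", "content": content}]
    return [
        {"role": "system",
         "content": "You are a concise assistant. Follow all constraints exactly."},
        {"role": "user", "content": f"# Task\n{task}\n\n# Context\n{context}"},
    ]
```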
Picking the right OpenAI model#
Which OpenAI model for which task
| If your situation is… | Reach for… | Why |
|---|---|---|
| Standard chat, summarization, classification | GPT-4o | The default — fast, capable, well-tuned for instructions |
| High-volume cheap workloads (basic classification, simple extraction) | GPT-4o-mini | ~10× cheaper; surprisingly capable for simple tasks |
| Hard reasoning (math, complex logic, multi-step planning) | o3 or o1 | Internal CoT dramatically improves accuracy on hard problems |
| Reasoning at scale, latency-sensitive | o3-mini | Cheapest reasoning model; faster than full o-series |
| Code generation, code review | GPT-4o | Strong on Python/TS/SQL; tool-use API is mature |
| Structured outputs with strict schemas | GPT-4o + Structured Outputs API | Schema-enforced JSON; eliminates parsing errors |
| Image understanding (OCR, diagram reading) | GPT-4o | Strong vision capabilities natively |
| Long-context Q&A (>30K tokens) | Consider Gemini Pro instead | GPT-4o degrades on long context; Gemini is built for it |
Pick the right model class for the job
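The routing table can be collapsed into a rule-based picker. The trait flags and the ~30K-token threshold are taken from the table above; the flag names themselves are illustrative:

```python
# The routing table as a rule-based picker. Trait names are illustrative.
def pick_model(*, hard_reasoning: bool = False, high_volume: bool = False,
               latency_sensitive: bool = False, context_tokens: int = 0) -> str:
    if context_tokens > 30_000:
        return "consider-gemini-pro"  # GPT-4o degrades past ~30K tokens
    if hard_reasoning:
        return "o3-mini" if latency_sensitive else "o3"
    if high_volume:
        return "gpt-4o-mini"
    return "gpt-4o"  # the default: fast, capable, instruction-tuned
```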
Specific tactics that work well on GPT-4o#
- Lead with the most important instruction. GPT-4o weights the start of the prompt slightly more than the end on long inputs.
- Use Markdown for structure. Headers, code fences, tables — GPT-4o respects all of them and produces them cleanly in outputs.
- Quote inputs in triple backticks or triple-quotes. Distinguishes user-provided content from instructions, reduces prompt-injection surface.
- For JSON outputs, use Structured Outputs. Forget "respond only in JSON" instructions — the API guarantees schema compliance.
- Use the developer message for non-system instructions. Some OpenAI models support a `developer` role between system and user — it sits between persona-level rules and per-turn task instructions.
Going further: production OpenAI patterns#
Structured Outputs over prompt instructions#
For any task that returns JSON, use the Structured Outputs API instead of telling the model "respond only in JSON." The API enforces a JSON Schema at the decoding level — meaning the model literally cannot produce output that violates the schema. Eliminates an entire class of parsing bugs. Worth migrating every JSON-returning prompt.
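A sketch of the `response_format` payload shape documented for Chat Completions Structured Outputs, with stdlib-only handling of a sample response. The `support_summary` schema and its fields are illustrative assumptions:

```python
# Sketch of a strict Structured Outputs request payload. With
# "strict": True the model cannot emit schema-violating JSON, so the
# parse step is a plain json.loads with no defensive error handling.
import json

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "support_summary",  # illustrative schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "bullets": {"type": "array", "items": {"type": "string"}},
                "action_needed": {"type": "boolean"},
            },
            "required": ["bullets", "action_needed"],
            "additionalProperties": False,
        },
    },
}

# Passed as: client.chat.completions.create(model="gpt-4o",
#     messages=..., response_format=response_format)

sample = '{"bullets": ["Confirms refund"], "action_needed": true}'
parsed = json.loads(sample)  # no try/except needed under strict mode
```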
Function calling + parallel tool calls#
GPT-4o can issue multiple tool calls in a single turn when it determines they're independent. Useful for agents that need to fan out queries (look up customer + look up account + look up plan, all in parallel). Doesn't work on every model in the family — verify support for the specific model version.
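When the model fans out, the assistant message carries several `tool_calls`, and each needs its own `role: "tool"` reply keyed by `tool_call_id`. A sketch with a hand-built stand-in for the API response and a hypothetical `dispatch` handler:

```python
# Sketch of handling parallel tool calls: answer every tool_call_id
# before sending the conversation back to the model.
import json

# Hand-built stand-in for response.choices[0].message.tool_calls:
fake_tool_calls = [
    {"id": "call_1", "function": {"name": "lookup_customer", "arguments": '{"id": "c9"}'}},
    {"id": "call_2", "function": {"name": "lookup_plan", "arguments": '{"id": "c9"}'}},
]

def dispatch(name: str, args: dict) -> str:
    # Hypothetical local handler; replace with real implementations.
    return json.dumps({"handler": name, **args})

tool_messages = [
    {
        "role": "tool",
        "tool_call_id": call["id"],
        "content": dispatch(call["function"]["name"],
                            json.loads(call["function"]["arguments"])),
    }
    for call in fake_tool_calls
]
# tool_messages is appended to the conversation for the next model turn.
```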
Cost-tier routing#
Run a cheap classifier (GPT-4o-mini) to triage incoming requests, then route hard ones to GPT-4o or o3 and easy ones to a cheaper handler. Most production traffic on a typical product can be handled by mini; the expensive path catches the edge cases. Often produces 5-10× cost savings with minimal quality loss.
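The pattern reduces to a two-step pipeline: a cheap triage call labels the request, and the label picks the model. `triage` here stubs the GPT-4o-mini classifier with a keyword check for illustration; the labels and route map are assumptions:

```python
# Sketch of cost-tier routing: cheap triage first, then route.
def triage(request: str) -> str:
    # In production: a gpt-4o-mini classification call returning one of
    # "easy" | "standard" | "hard_reasoning". Stubbed here.
    return "hard_reasoning" if "prove" in request.lower() else "easy"

ROUTES = {"easy": "gpt-4o-mini", "standard": "gpt-4o", "hard_reasoning": "o3"}

def route(request: str) -> str:
    return ROUTES[triage(request)]
```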
Batch API for non-urgent workloads#
For workloads that don't need real-time responses (overnight summarization, periodic analysis, eval runs), the OpenAI Batch API offers ~50% discount in exchange for completion within 24 hours. Many production analytics pipelines fit this profile.
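Batch input is a JSONL file: one request per line, each tagged with a `custom_id` for matching results back. A sketch that builds the file only — uploading and creating the batch then go through the Files and Batch endpoints:

```python
# Sketch of preparing a Batch API input file (JSONL, one request per
# line). Only file construction is shown; the upload and batch-create
# calls are separate API steps.
import json

def build_batch_file(prompts: list[str], path: str,
                     model: str = "gpt-4o-mini") -> None:
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            row = {
                "custom_id": f"req-{i}",  # used to match results later
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(row) + "\n")
```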
Streaming + UX patterns#
For interactive use, stream tokens to users as they generate. GPT-4o's streaming is well-supported across SDKs. Pair with optimistic UI that shows partial output immediately — perceived latency drops dramatically even when actual latency is unchanged.
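The streaming loop is just delta accumulation: append each chunk's text and hand the running partial to the UI. `fake_stream` below stands in for the chunk objects a streaming chat completion yields, and `render_partial` is a hypothetical UI hook:

```python
# Sketch of the streaming loop: accumulate deltas, render each partial.
def fake_stream():
    # Stand-in for a streaming response; real chunks expose the text as
    # chunk.choices[0].delta.content.
    yield from ["Hel", "lo ", "world"]

def render_partial(text: str) -> None:
    # Hypothetical UI hook: update the optimistic UI with partial output.
    pass

full = ""
for delta in fake_stream():
    full += delta
    render_partial(full)  # user sees output growing in real time
```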
Things to avoid#
- Stacking 4+ roles in one prompt. "You are an expert engineer who is also a teacher and a writer" — the roles cancel each other out. See role prompting for what works.
- Asking GPT-4o to "be honest if you don't know". The instruction does help, but only modestly. Pair it with retrieval or grounded context — see hallucinations.
- Chaining 5+ instructions in one user message. Drop into prompt chaining instead.
- Mixing reasoning-model and conversational-model prompts. Few-shot CoT on o3 wastes tokens; plain CoT on GPT-4o without worked examples underperforms. Tune the prompt to the model class.
Quick reference#
The 60-second summary
Two classes: conversational (GPT-4o, mini) and reasoning (o1, o3, o3-mini). Different prompting playbooks.
For GPT-4o: system prompt for persona, lead with the task, Markdown structure, Structured Outputs for JSON, explicit length constraints.
For o-series: short prompts, no "step by step," skip few-shot on reasoning, no system prompts on o1.
The key trade-off: pay for reasoning latency only on tasks that need it. Triage cheap tasks to mini.
What to read next#
For comparisons: prompting Claude and prompting Gemini. For the techniques GPT-4o handles best, few-shot and role prompting are the highest-leverage. To pick the best version of any ChatGPT prompt with data, see A/B testing prompts.