Prompt engineering glossary: 40+ terms, plain English

A no-jargon glossary of prompt engineering and prompt management terms — from "few-shot" to "tokenization" to "system prompt" — with a one-paragraph definition each and links to the relevant guides.

schedule12 min readLast updated May 1, 2026

You're reading a paper that says "applying CoT improves performance via in-context learning, particularly when paired with self-consistency." You know two of those four terms. You stop, open a tab, search for the third, get a 10-paragraph blog post that mentions concepts you also don't know. The whole paper takes an hour to read and half of it is term-lookup overhead.

This page is the antidote. Every prompt engineering term you'll trip over, defined in one paragraph. Cross-links lead to the deep guide for major concepts. Bookmark this page; it's a Cmd-F shortcut, not a read-through.

How to use this page

Cmd+F is your friend. Each term has a stable anchor link — copy the URL with the hash, share it in Slack when explaining a concept. The cross-references point to the deep guide for each major concept; the glossary is the entry point, not the destination.

Agent#

An LLM-powered system that operates in a loop, deciding what to do next based on the result of each step. Most modern agents follow the ReAct pattern: Thought → Action → Observation → repeat.

Chain-of-Thought (CoT)#

A prompting technique that asks the model to reason step by step before answering. Improves accuracy on math, logic, and multi-step tasks. The simplest version: append "Let's think step by step." See Chain-of-Thought prompting.

Completion#

The text the model generates in response to a prompt. Older OpenAI APIs called this a "completion"; modern chat APIs return a "message", but people still use both terms.

Context window#

The maximum number of tokens (input + output) the model can attend to in a single call. GPT-4o is around 128K; Claude is around 200K; Gemini Pro is 1M+. Larger isn't always better — quality usually drops on very long contexts.

CoT#

Abbreviation of Chain-of-Thought. Same thing.

Embedding#

A numerical vector representing a piece of text's meaning. Used for semantic search, similarity, and retrieval (RAG). Embeddings come from a separate model, not the LLM itself.

Few-shot prompting#

Including 2-5 input/output example pairs in the prompt so the model learns the pattern. Beats long instructions for format-sensitive tasks. See few-shot prompting.

Fine-tuning#

Updating a model's weights with additional training data. Different from prompting — fine-tuning changes the model itself. Often unnecessary; good prompting + RAG usually gets you 90% of the way.

Function calling / tool use#

The structured way modern APIs expose tools to LLMs. You declare functions as JSON schemas; the model returns a structured call object instead of free-text. Same concept underneath as ReAct.

Grounding#

Anchoring the model's output to real, verifiable sources — typically by injecting them into the prompt via RAG. Reduces hallucination, enables citations.

Hallucination#

When a model produces a confident answer that isn't true — fabricated facts, fake citations, invented features. Most common on out-of-distribution queries. Mitigated by retrieval, lower temperature, and explicit "say I don't know" instructions. See the hallucinations guide.

In-context learning (ICL)#

The phenomenon that lets models learn a task from examples in the prompt without weight updates. The academic name for what makes few-shot work.

Instruction tuning#

A training step that teaches a base model to follow instructions in natural language. Every modern frontier model is instruction-tuned. It's why "summarize this article" works without examples.

JSON mode / Structured Outputs#

API features that guarantee the model's response conforms to a JSON schema. More reliable than asking "respond in JSON" in the prompt, and the right default for any structured output.

LLM#

Large Language Model. The umbrella term for models like GPT-4o, Claude, and Gemini.

Multimodal#

Models that handle multiple kinds of input — text plus images, video, or audio. Modern frontier models are mostly multimodal by default.

Output parsing#

Extracting structured data from a model's free-text output. Brittle. Prefer Structured Outputs or JSON mode where available.

Persona prompt#

A system or user-message instruction telling the model who it is — "You are a senior engineer". Changes vocabulary, tone, and depth. See role prompting.

Prefilling#

Setting the first tokens of the model's response before it generates. Anthropic's API supports this directly; useful for forcing structured outputs and suppressing preambles.

Prompt#

The complete input you send to a model — system message, user message, examples, context, and instructions. "Prompt" is sometimes used loosely to mean just the user message.

Prompt chaining#

Splitting a complex task into multiple sequential prompts where each one's output feeds the next. More reliable than asking one prompt to do everything. See prompt chaining.

Prompt engineering#

The craft of writing prompts that get reliable, useful outputs from LLMs. The technique side of working with models — distinct from prompt management, which is about saving and sharing those prompts.

Prompt injection#

A security issue where user-supplied content overrides the system's instructions. Mitigations include wrapping user input in delimiters (XML tags, triple quotes), filtering inputs, and never trusting model output as authoritative. See prompt injection.

Prompt management#

The practice of saving, versioning, organizing, and sharing prompts at team scale. The discipline this whole "Learn" section is built around. See What is prompt management?

RAG (Retrieval-Augmented Generation)#

A pattern where you retrieve relevant documents from a corpus (usually via vector search), inject them into the prompt, then ask the model to answer using that context. Reduces hallucination and makes outputs citable. See RAG guide.

ReAct#

Reason + Act. A prompting pattern where the model interleaves reasoning steps with tool calls. The blueprint behind most modern agents. See ReAct.

Reasoning model#

A model that performs extended internal reasoning before producing an output — OpenAI o-series, Claude with extended thinking, Gemini with thinking mode. Slower and more expensive but better at hard reasoning tasks. Don't add "think step by step" to these — they already do.

Role prompting#

Same as persona prompting. See role prompting.

Self-consistency#

Running the same Chain-of-Thought prompt multiple times at temperature greater than 0 and taking the majority answer. Trades tokens for accuracy. See self-consistency.

Shot#

In prompting, a single input/output example. Zero-shot = no examples. Few-shot = a few examples.

Stop sequence#

A string that, when generated, makes the model stop outputting. Used to enforce output boundaries (e.g., stop after the first } for JSON).

Streaming#

Receiving the model's response token by token as it generates, instead of waiting for the full response. Better UX for long outputs.

System prompt#

The first message in a conversation, separate from the user message. Used for persistent instructions — persona, ground rules, output format — that should apply to every reply.

Temperature#

A sampling parameter (typically 0–1, sometimes higher) that controls randomness. Temperature 0 = deterministic. Higher = more creative/diverse but less predictable. Production defaults are usually 0.0–0.4 for structured tasks, 0.7+ for creative work.

Token#

The atomic unit a model sees. Roughly 0.75 words for English; can be a whole word or a subword. Token counts matter for context limits and pricing.

Tool use#

Same as function calling. The structured way to give the model the ability to call your code. See ReAct for the underlying pattern, and agent tools for design principles.

Top-p (nucleus sampling)#

A sampling parameter that limits the model to choosing from the smallest set of tokens whose probability sums to p. Often combined with temperature; most production systems set top-p to 0.9 or 1.0 and tune temperature instead.

Tree of Thoughts (ToT)#

A reasoning technique that explores multiple paths and backtracks from dead ends. Heavier than Chain-of-Thought but better on problems where wrong first steps cascade. See Tree of Thoughts.

Variable / placeholder#

A token like {{topic}} in a prompt template that gets replaced with a real value at run time. The fundamental building block of reusable prompts. See prompt variables.

Version (of a prompt)#

A snapshot of a prompt at a point in time. Versioning lets you roll back when an "improvement" makes outputs worse. See version control for prompts.

Zero-shot prompting#

Asking the model to do a task with no examples — just instructions. The default style for most ChatGPT interactions. See zero-shot prompting.

Quick reference#

The 60-second summary

What this is: a bookmarkable lookup, not a read-through. Cmd-F for the term you need.

Cross-links: every term linked to its deep guide. Glossary is the entry point; deep-dives are where you actually learn.

Sharing terms: each heading has a stable anchor URL — copy with the hash and drop into Slack when explaining a concept to a teammate.

What to read next#

If you're new to prompt engineering, start with What is prompt management? then walk the techniques in order: zero-shot → few-shot → Chain-of-Thought. If you're shipping production prompts, head to version control and team libraries.

Put this guide to work

Save your prompts, version every change, and share them with your team — free for up to 200 prompts.

Start free Browse the prompt library