Beginner · 101 · 6 min read

What Is a Large Language Model (LLM)? A Plain-English Explainer

An LLM is the core technology behind Claude, ChatGPT, and Gemini. This is how they actually work, why they sometimes make things up, and what that means for how to use them well.

If you've used Claude, ChatGPT, or Gemini, you've used an LLM: a Large Language Model. Understanding what these models actually do under the hood turns out to be useful, because it explains why they're great at some things, bad at others, and occasionally confidently wrong.

No math. No ML jargon. Just what's actually happening.


The Core Mechanic: Predicting the Next Word

An LLM has one basic job: given a sequence of words so far, predict what word should come next.

That's it. That's the whole trick.

You type a prompt. The model predicts the most likely first word of a response, writes it, then uses everything it's seen so far (your prompt + that first word) to predict the next word. Then the next. Then the next. One piece at a time, until it decides the response is done.
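That loop can be sketched in a few lines of Python. The `predict_next` function below is a toy stand-in with a few hard-coded continuations, not a real model; a real LLM makes the same kind of "what comes next" choice using billions of learned parameters:

```python
import random

# Toy "model": a lookup table of plausible next words.
# A real LLM computes this from billions of parameters;
# here we hard-code it just to show the generation loop.
CONTINUATIONS = {
    "the": ["sky", "cat"],
    "sky": ["is"],
    "is": ["blue", "clear"],
}

def predict_next(words):
    """Pick a next word given everything generated so far."""
    options = CONTINUATIONS.get(words[-1])
    if options is None:
        return None  # the model "decides" the response is done
    return random.choice(options)

def generate(prompt, max_tokens=10):
    words = prompt.split()
    for _ in range(max_tokens):
        nxt = predict_next(words)
        if nxt is None:
            break
        words.append(nxt)  # output so far feeds the next prediction
    return " ".join(words)

print(generate("the"))  # e.g. "the sky is blue"
```

The key point is the loop: each new word is appended to the context and the whole thing is fed back in to pick the word after that.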

  • Trillions of words in training. Modern LLMs are trained on enormous corpora: books, articles, code, websites.
  • Billions of parameters. The internal weights the model adjusts during training; this is where its knowledge lives.
  • 200K–2M token context window. The maximum text the model can consider at once, measured in tokens.

The "magic" comes from scale. Train a model large enough, on enough text, and "predict the next word" turns out to require a remarkably broad understanding of language, logic, and the world. The model has to know grammar, vocabulary, facts, conventions, styles, reasoning patterns, all of it, to predict the right next word across every domain it's ever seen.


Tokens: The Actual Units

One piece of vocabulary worth learning: tokens.

LLMs don't think in words exactly. They think in tokens: chunks of text averaging about three-quarters of a word. Some common words are a single token; long or rare words split into two or three.

Why this matters for you:

  • Pricing is per token, so longer prompts and longer responses cost more
  • Context windows are measured in tokens: a 200K context window holds roughly 150,000 words of input and output combined
  • Rate limits are usually expressed as tokens per minute

Practically: 1,000 tokens is roughly 750 words. A single-page document is ~500 tokens. A 50-page PDF is ~25,000 tokens.
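Using the 1 token ≈ 0.75 words rule of thumb, you can do this arithmetic yourself. The sketch below is a back-of-envelope estimator; the ratio and the per-million-token price are illustrative assumptions, not any provider's actual numbers:

```python
TOKENS_PER_WORD = 1 / 0.75  # rough heuristic: ~4/3 tokens per word

def estimate_tokens(text: str) -> int:
    """Back-of-envelope token count from word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)

def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost at a hypothetical $/1M-token rate."""
    return tokens * price_per_million / 1e6

# 750 words comes out to ~1,000 tokens under this heuristic
text = " ".join(["word"] * 750)
print(estimate_tokens(text))
```

For real budgeting, use the tokenizer your provider ships (they differ between models); this estimate is only for quick mental math.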


Why LLMs Sometimes Make Things Up

The technical term is hallucination. It happens when an LLM produces something that sounds right but isn't factually accurate: made-up citations, fabricated quotes, invented statistics, non-existent features.

Here's why this isn't a bug you can fully patch:

The model isn't looking things up. It's predicting what sounds plausible based on patterns in its training data. When you ask "What's the name of the CEO of [small company]?" and the answer wasn't well-represented in training, the model has no built-in way to say "I don't know." It produces its best guess and presents it confidently.

  • Common knowledge ("the sky is blue"): well-represented in training. Very high accuracy.
  • Broadly documented facts: lots of training data, consistent sources. High accuracy, occasional errors.
  • Recent events: may be post-training-cutoff. Unreliable; use a web-connected AI (Perplexity, ChatGPT browsing).
  • Specific niche facts: sparse or contradictory training data. Very likely to hallucinate.
  • Things it was never exposed to: your internal docs, private data. Will confidently invent plausible answers.

The practical rule: never trust an LLM's factual output about something specific without verification. The model is wildly useful for reasoning, drafting, and pattern-matching. It is an unreliable source of facts unless you've given it the source material directly (see what is RAG).


Context Windows: How Much the Model Can "See"

The context window is the maximum amount of text the model can process in a single request. This includes your prompt, any documents you've attached, the conversation history, and the generated response, all of it combined.

Current state of the art:

  • Claude Sonnet 4.5: 200K tokens standard, 1M tokens in the experimental tier. Roughly 150,000 words; comfortably fits a long book or an entire codebase.
  • GPT-4 / GPT-5: 128K tokens. Roughly 96,000 words; enough for most business documents but not entire books.
  • Gemini 2.5 Pro: 2M tokens. The largest in production; can handle entire document libraries, full codebases, or video transcripts at scale.

Larger context windows enable capabilities that weren't practical a year ago: you can paste an entire repository and ask questions, or dump a 40-hour transcript archive and ask for patterns. But larger isn't automatically better; bigger context means slower inference and higher cost. Use the window you actually need.


What's Actually Happening When You Chat With One

A useful mental model:

  1. You type a prompt.
  2. The model reads your prompt + any system instructions + any attached knowledge, all together.
  3. The model generates a response token-by-token, predicting each one based on everything it's seen so far.
  4. When you reply, the model reads the entire conversation again from the top; it has no memory between messages beyond what's visible in the context window.
  5. Once the conversation exceeds the context window, the earliest messages get dropped. The model forgets them.

This is why a good system prompt (see how to write a prompt) matters so much: it's always in the model's view. A throwaway detail you mentioned 20 messages ago is probably gone.
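Steps 4 and 5 above can be sketched as a token-budget loop: every turn, the system prompt plus the visible history is re-sent, and the oldest messages are dropped once the budget is exceeded. The 4-characters-per-token estimate and the tiny budget here are illustrative assumptions, not how any particular provider trims history:

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fit_to_window(system_prompt, messages, budget=100):
    """Keep the system prompt always; drop the oldest messages until we fit."""
    kept = list(messages)
    def total():
        return rough_tokens(system_prompt) + sum(rough_tokens(m) for m in kept)
    while kept and total() > budget:
        kept.pop(0)  # the earliest message is the first one forgotten
    return [system_prompt] + kept

history = [f"message {i}: " + "x" * 120 for i in range(10)]
visible = fit_to_window("You are a helpful assistant.", history, budget=100)
print(len(visible))  # only the system prompt and the newest messages survive
```

Note what survives: the system prompt (always in view) and the most recent messages. That is exactly why instructions in the system prompt stick while a throwaway detail from 20 messages ago does not.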


Related Reading

Not sure which LLM to pick?
See /learn/ai-tool-picker for a Claude/ChatGPT/Gemini comparison.
Want to get better at prompting?
Start with /learn/how-to-write-a-prompt for the beginner framework.
Confused about RAG?
Read /learn/what-is-rag for how LLMs can 'look things up' in your documents.
What's the difference between a chatbot and an agent?
Read /learn/what-is-an-ai-agent (it's shorter than you'd think).

Ready to put this into practice? The Claude Project Starter Pack has 6 role-specific configurations you can paste and use immediately.