I spent the first year of building AI tools for clients convinced that prompt engineering was the craft. I read the books. I ran the frameworks. I wrote prompts with roles, tasks, constraints, examples, and output specifications. Every prompt looked like a spec document. And the output got better, noticeably.
Then I moved the same workflows into Claude Projects with a careful knowledge file, and the output got better again. This time not a little. A lot.
That was the moment I realized I had been optimizing the wrong variable. The prompt is a small fraction of what the model sees. Most of what shapes the output is the context the model is operating inside, and almost nobody is working on that deliberately.
This article is the case for why context engineering is the higher-leverage discipline, and a walk-through of how to actually build the context layer that makes every subsequent prompt better.
What the model actually sees
Let me pull apart the anatomy of what happens when you ask a modern AI tool to do something.
When you type "write me an email announcing our new pricing," here is what the model actually receives:
- A system prompt set by the tool (usually invisible to you): a few paragraphs of general instructions from the tool's operator.
- Whatever knowledge files you've uploaded to the project, typically tens to hundreds of pages of structured content.
- Any past messages in the current conversation.
- Your prompt itself, maybe 40 to 200 tokens in most cases.
In a tool like Claude Projects or a Custom GPT with a well-built knowledge file, the prompt is often less than 1% of the total tokens the model reads before it generates a response.
Think about that ratio for a second. If 99% of the input is context and 1% is prompt, which part do you think has more influence on the output?
It is not even close.
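To make the ratio concrete, here is a rough back-of-envelope sketch in Python. Every size in it is an illustrative assumption, and the four-characters-per-token heuristic is only approximate:

```python
def approx_tokens(char_count: int) -> int:
    # Rough heuristic for English prose: ~4 characters per token.
    return max(1, char_count // 4)

# Illustrative, assumed sizes for one request in a context-heavy workspace.
system_prompt_chars = 3_000      # tool operator's hidden instructions
knowledge_file_chars = 400_000   # roughly a hundred pages of uploaded context
conversation_chars = 10_000      # earlier turns in the thread
user_prompt_chars = 400          # "write me an email announcing our new pricing"

prompt_tokens = approx_tokens(user_prompt_chars)
total_tokens = sum(approx_tokens(c) for c in [
    system_prompt_chars, knowledge_file_chars,
    conversation_chars, user_prompt_chars,
])

prompt_share = prompt_tokens / total_tokens
print(f"prompt is {prompt_share:.2%} of what the model reads")  # well under 1%
```

With these assumed numbers the prompt works out to about a tenth of a percent of the input. Shrink the knowledge file tenfold and it is still barely one percent.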
The asymmetric test
Here is the test I run in my head when I hear "I need to improve my prompt for X."
- Experiment A: Take your current prompt. Change the wording significantly. Rerun. Compare.
- Experiment B: Keep your prompt exactly the same. Change what is in the context window (upload different documents, swap in a different knowledge file, change the system prompt). Rerun. Compare.
In my experience, Experiment A produces modest, often marginal differences. Experiment B produces night-and-day differences in output quality.
That asymmetry is the whole argument. If changing the prompt moves the needle by 10% and changing the context moves it by 70%, you should be spending most of your AI-improvement effort on context.
Most teams spend it on prompts because prompts are what you see. Context is invisible; it sits quietly behind the tool, and it does 70% of the work.
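The two experiments are easy to wire up as a side-by-side harness. A minimal sketch, assuming a `generate(prompt, context)` callable that wraps whatever model API you use (the name and signature are placeholders, not a real library call):

```python
from typing import Callable

def ab_compare(
    generate: Callable[[str, str], str],  # placeholder for your model call
    prompt: str,
    context: str,
    prompt_variant: str,
    context_variant: str,
) -> dict[str, str]:
    """Run the baseline plus both experiments for manual comparison."""
    return {
        "baseline": generate(prompt, context),
        # Experiment A: reword the prompt, keep the context fixed.
        "A": generate(prompt_variant, context),
        # Experiment B: keep the prompt, swap the context.
        "B": generate(prompt, context_variant),
    }
```

Read the three outputs side by side. In my experience, the B column is where the large differences show up.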
Why context beats prompting, mechanically
The model is trained on general knowledge. A huge amount of it. When it sees your prompt, it is searching its own training for the best pattern match. Generic training wins by default, which is why generic input produces generic output.
Context is how you tell the model "use these specific things first." A knowledge file full of your real past emails, your real voice, your real audiences, your real past successes, biases the model away from generic training and toward your specific reality. The prompt on top of that context becomes a pointer, "do the X-shaped task against this specific reality," rather than a full spec the model has to interpret from scratch.
The concrete practical version: I can write a mediocre prompt against a well-built context library and get an output that sounds exactly like the company. I cannot write the most perfect prompt in the world against no context and produce the same result. The ceiling on prompt craft is bounded by what the model already knows. The ceiling on context engineering is bounded by how much of your reality you can put in front of the model.
What "good context" actually looks like
When I set up a team's AI workspace, the context library I build usually has ten sections. The full structured template, with examples and prompts for each section, is in a Context Library Template resource you can copy. Filled in honestly, the whole thing takes a few hours across a couple of sessions, and it produces a lift in AI output quality that no amount of prompt craft will match.
The counterintuitive move: less prompt, more context
Teams that have never built a context library tend to write very long prompts. Each prompt re-establishes the role, the task, the constraints, the voice, the examples, all of it. The prompts end up 500 to 800 tokens because they are trying to compensate for the absent context.
Teams that have a real context library can shrink their prompts to 20 to 50 tokens. "Draft an announcement email for X." "Summarize this meeting in our standard format." "Write a LinkedIn post on Y." Short. Direct. The model does the rest because the context carries it.
This is one of the cleaner signals that your context is actually doing the work. Watch your prompt length trend downward over time. If you are still writing 400-word prompts after six months with the same AI workspace, your context is not pulling its weight.
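If you want that signal as a check rather than a feeling, a tiny heuristic works. This is a sketch: the 60-token threshold and the words-to-tokens ratio (about four tokens per three English words) are assumptions you should tune for your own workspace:

```python
def prompt_is_lean(prompt: str, max_tokens: int = 60) -> bool:
    # Rough words-to-tokens conversion: about 4 tokens per 3 English words.
    estimated_tokens = (len(prompt.split()) * 4) // 3
    # A lean prompt trusts the context library to carry role, voice, and format.
    return estimated_tokens <= max_tokens

print(prompt_is_lean("Draft an announcement email for the new pricing."))  # True
print(prompt_is_lean("You are a senior marketer" + " filler" * 400))       # False
```

Run it over a month of saved prompts and watch the pass rate. It should climb as the context library matures.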
Where teams should actually invest
If you have been spending AI-improvement budget on prompt engineering trainings, better prompting tools, or even AI vendors that sell "optimized prompt libraries," I would redirect almost all of that toward context engineering instead. Specific moves:
- Build one context library per team role that uses AI. Marketing, sales, ops, support, finance. Each one gets its own structured document built from the template's sections, filled in with real content from that function.
- Use persistent context tools over ad-hoc prompts. Claude Projects, Custom GPTs, or Gemini Gems all accept a knowledge file plus system instructions. Pick one per team, build the context, stop starting from scratch every time.
- Treat context as maintained infrastructure, not a one-time setup. Refresh the context library every three to six months. Businesses change. Stale context produces stale output even if the prompts are perfect.
- Audit the context layer first when output is mediocre. If an AI tool is producing generic output, the default first question should be "what specific context is missing or outdated," not "how do I rephrase the prompt."
The broader category
A lot of the conversation about "AI productivity" right now is about which model you pick, which platform you buy, which prompt technique you learn. Those are all surface-level levers. They move the needle by small amounts.
The thing that moves the needle by large amounts is almost always the quality of the specific, curated, structured context you feed the model about your business. The Anatomy of a Great Prompt covers the prompt-craft side, and it still matters: a great prompt against great context beats a mediocre prompt against great context. But the multiplier you get from doing the prompt work alone is a small fraction of the multiplier you get from doing the context work.
If you had to pick one lever to invest in, pick context. It is where the real work is, and it is where almost nobody is looking.
Next read: The Anatomy of a Great Prompt for the prompt-craft side of the same problem. Context is the bigger lever, but once you have it, prompt craft becomes the next compounding improvement.
Want to build your own context library? The Context Library Template is the 10-section structured doc I use when I set one up for a team. Free.
Or if you want this set up for your team end to end, that is what LLM Setup & Context Engineering is for.