The companion to What is an LLM, written for people who use Claude / ChatGPT / Cursor / Claude Code and want to know what’s going on behind the chat window. That post explained what the model is. This one explains what running one in production actually looks like, and it turns out almost every interesting decision is about the same thing: what tokens you put in front of the model on any given call. The model has no memory between calls. Whatever the model knows about your user, your codebase, your conversation, your tools, is in the prompt or it isn’t there at all. The post explains why Claude “remembers” your project (it doesn’t, the harness re-injects it), why ChatGPT degrades mid-conversation on long threads, why your $40 day on Claude Code happens, why “ignore previous instructions” still works on some agents in 2026. The technical detail (vector DBs, HNSW, chunking algorithms) is in sections you can skip if you’re not building one of these.