vibedonalds.com
Term

Context window

The maximum number of tokens an LLM can attend to in a single inference call. Both the prompt and the generated output count against it. As of 2026, frontier models range from 200k tokens (GPT-5) to 1M+ tokens (Gemini 2.5, Claude Sonnet 4.6 with 1M extension).
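Because prompt and output share one budget, the room left for generation is simple subtraction. The sketch below is illustrative only: the function names are hypothetical, and real token counts must come from the model's own tokenizer.

```python
def remaining_output_budget(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt is counted against the window."""
    if context_window <= 0 or prompt_tokens < 0:
        raise ValueError("context_window must be positive and prompt_tokens non-negative")
    # A prompt larger than the window leaves no room at all, so clamp at zero.
    return max(context_window - prompt_tokens, 0)


def fits(context_window: int, prompt_tokens: int, requested_output_tokens: int) -> bool:
    """True if the prompt plus the requested output stays within the window."""
    return prompt_tokens + requested_output_tokens <= context_window
```

For example, a 150k-token prompt in a 200k-token window leaves at most 50k tokens for the response.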

Background

The context window is a hard limit on how much text the model can "see" at once. Larger windows let agents read more of a codebase, carry longer conversation histories, and process larger documents. But context is not free: under per-token pricing, cost grows linearly with input tokens, and models often retrieve information less reliably when it sits deep inside a very large prompt (the "lost in the middle" effect). Coding agents use techniques such as sliding-window summarisation, file-level chunking, and retrieval-augmented generation (RAG) to stay within budget.
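As a rough illustration of sliding-window summarisation, the sketch below keeps the newest messages that fit a token budget and collapses everything older into a single summary. It is not any particular agent's implementation. The names `estimate_tokens`, `fit_history`, and the caller-supplied `summarise` function are hypothetical, and the four-characters-per-token estimate is a crude assumption standing in for a real tokenizer.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token (rounded up).
    # Real counts require the target model's tokenizer.
    return max(1, (len(text) + 3) // 4)


def fit_history(
    messages: List[str],
    budget: int,
    summarise: Callable[[List[str]], str],
) -> List[str]:
    """Keep the newest messages within `budget` tokens; summarise the rest.

    `summarise` is a hypothetical caller-supplied function (for example,
    another model call) that condenses the older messages into one string.
    """
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so recent context is preserved first.
    for i in range(len(messages) - 1, -1, -1):
        cost = estimate_tokens(messages[i])
        if used + cost > budget:
            # Everything from here back is too old to keep verbatim.
            summary = summarise(messages[: i + 1])
            # Include the summary only if it still fits; otherwise drop it.
            if used + estimate_tokens(summary) <= budget:
                kept.append(summary)
            break
        kept.append(messages[i])
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

The design choice here is to favour recency: the most recent turns survive verbatim, while older turns are compressed or dropped. File-level chunking and RAG make a different trade, selecting content by relevance rather than age.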