If you take one concept from this entire chapter, let it be this one. The context window is the single most important constraint in agentic coding. Every design decision in this course — CLAUDE.md structure, skills, hooks, subagents, compaction strategies — exists because of this one limitation.
What the context window actually is
The context window is Claude's working memory. Everything Claude knows about your current session lives in this buffer: the system prompt, your CLAUDE.md, every file it has read, every command it has run, every message you have sent, every response it has generated, and all its internal reasoning.
For Claude Opus 4.6 and Sonnet 4.6, this buffer is 1 million tokens. That sounds like a lot. It is not.
The context window is not storage. It is RAM. Everything in it is active, available, and consuming space. There is no "disk" to swap to, no way to page things in and out. When the window fills up, something has to go.
What consumes context
Here is what fills your context window in a typical session, roughly in order of cost:
Loaded at session start (before you type anything):
- System prompt and tool descriptions (all built-in tools + all MCP tools)
- Your CLAUDE.md file (and any project-level CLAUDE.md files)
- Initial session context
Accumulated during the session:
- Every file Claude reads (full content of each file)
- Every Bash command output (build logs, test results, git diffs)
- Every message you send
- Every response Claude generates
- Extended thinking (internal reasoning tokens — these count)
- Web search and fetch results
| What | Approximate cost |
| --- | --- |
| CLAUDE.md (well-structured) | ~800-1500 tokens |
| MCP tool descriptions (3 MCPs) | ~2000-4000 tokens |
| Reading a 200-line file | ~1000-1500 tokens |
| Reading a 500-line file | ~3000-4000 tokens |
| Build output (success) | ~200-500 tokens |
| Build output (failure + errors) | ~1000-3000 tokens |
| A typical Claude response | ~500-2000 tokens |
| Extended thinking per turn | ~2000-10000 tokens |
Add it up. Ten file reads, five bash commands, and a few back-and-forth messages put you at 50k-80k tokens. Against a 1-million-token window that is only 5-8% — but a long, research-heavy session repeats that pattern many times over, and you might not even be halfway through the task.
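The back-of-the-envelope arithmetic can be sketched in a few lines. The per-item costs are midpoints of the rough ranges in the table above, so treat the result as an order-of-magnitude estimate, not a measurement:

```python
# Rough session cost using midpoints of the ranges above (estimates only).
# The 1M-token window size is the figure quoted earlier in the chapter.
file_reads = 10 * 2_000   # ten mid-sized file reads at ~2k tokens each
bash_runs = 5 * 1_000     # five commands with moderately noisy output
exchanges = 6 * 1_250     # a handful of user messages and responses
thinking = 6 * 6_000      # extended thinking at ~6k tokens per turn

total = file_reads + bash_runs + exchanges + thinking
print(total)                       # 68500 — within the 50k-80k range
print(f"{total / 1_000_000:.1%}")  # share of a 1M-token window
```

Notice how extended thinking dominates the total — which is why the thinking budget cap later in this chapter pays off so quickly.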
*A typical context window fills up faster than you think.*
What happens when it fills up
When the context window approaches capacity, Claude Code triggers auto-compaction. This is not a crash or an error — it is a summarization step. Claude takes the older parts of the conversation and compresses them into a summary, freeing space for new work.
The problem: summaries lose detail. That function signature Claude read 20 minutes ago? After compaction, it might remember "read a function related to authentication" but not the exact parameter types. That build error from earlier? Summarized to "there was a type error that was fixed."
You have seen this yourself even if you did not know the cause. Claude works great for the first 15 minutes, then starts making small mistakes — using the wrong variable name, forgetting a pattern it followed earlier, suggesting an approach it already tried and rejected. That is context degradation.
Strategic compaction
The default behavior is auto-compaction at approximately 95% capacity. By that point, you have been running on a nearly full context for a while, and the compaction has a lot of material to summarize — which means more information loss.
The better approach is proactive compaction. Use the /compact command at logical breakpoints in your workflow:
- After research, before implementation. You spent 20 minutes reading code and understanding the problem. Compact. The summary preserves the conclusions while freeing the space consumed by all those file reads.
- After debugging, before the next feature. You just spent a session chasing a bug. The fix is committed. Compact. The next task starts with a clean context.
- After a failed approach. Claude went down the wrong path. You course-corrected. Compact. No need to carry the history of the failed attempt.
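In each case the breakpoint is a single `/compact` invocation. The command also accepts free-form instructions telling Claude what the summary should preserve; the wording and project details below are hypothetical:

```
/compact Keep the research conclusions — where the auth flow lives, which
module owns token refresh, and the approach we agreed on. Drop raw file
contents and the history of the failed attempt.
```

Naming what to keep matters more than naming what to drop: the summary is lossy either way, so point the loss at the material you can cheaply re-read later.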
```json
{
  "env": {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"
  }
}
```
Setting CLAUDE_AUTOCOMPACT_PCT_OVERRIDE to 50 triggers auto-compaction at 50% instead of 95%. This is aggressive, but it means Claude always has room to work. I run this setting on projects where tasks tend to be long and research-heavy.
In a fully configured system, you do not even need to remember to compact. A suggest-compact hook counts tool calls and reminds you every 50 — "You have made 50 tool calls since last compaction. Consider running /compact." A pre-compact hook saves session state before compaction so nothing critical is lost. A strategic-compact skill encodes when and how to compact — with six defined phase transitions where compaction makes sense.
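As a sketch, the counting half of such a suggest-compact hook might look like this. The counter path and threshold are illustrative, and the wiring — registering the script as a command under Claude Code's PostToolUse hook event in settings.json — is left out:

```python
#!/usr/bin/env python3
"""Illustrative suggest-compact hook: count tool calls, remind every 50.

Run once per tool call (e.g. as a PostToolUse hook command); state lives
in a scratch file so the count survives across invocations.
"""
import os
import tempfile

COUNTER = os.path.join(tempfile.gettempdir(), "claude-tool-calls")
THRESHOLD = 50  # illustrative; tune to your workflow

def suggest_compact():
    """Increment the persistent counter; return a reminder every 50 calls."""
    try:
        n = int(open(COUNTER).read())
    except (FileNotFoundError, ValueError):
        n = 0
    n += 1
    with open(COUNTER, "w") as f:
        f.write(str(n))
    if n % THRESHOLD == 0:
        return (f"You have made {n} tool calls since last compaction. "
                "Consider running /compact.")
    return None

if __name__ == "__main__":
    msg = suggest_compact()
    if msg:
        print(msg)
```

The point is not this particular script — it is that the reminder is deterministic and lives outside Claude's context, so it costs zero tokens.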
You do not need to remember to compact. You build a system that remembers for you.
The cost of not managing context
This is where the stakes become concrete. Here is what a typical session looks like with and without context management:
*Effective session duration with vs. without context management.*
That is not a marginal improvement. That is the difference between Claude degrading mid-task and Claude finishing complex multi-file features in a single session. Every technique in this course — CLAUDE.md structure, skills, hooks, subagents — is a different lever for pushing that number higher.
Token optimization
Beyond compaction, there are settings that reduce how many tokens Claude consumes in the first place.
Thinking budget cap. Claude's extended thinking (the internal reasoning before each response) can consume thousands of tokens per turn. Most of the time, you do not need maximum reasoning depth for routine edits.
```json
{
  "env": {
    "MAX_THINKING_TOKENS": "10000"
  }
}
```
Setting MAX_THINKING_TOKENS to 10,000 caps the hidden thinking cost. In practice, this reduces thinking token usage by roughly 70% without noticeably affecting output quality for standard tasks. For complex architectural decisions, you can temporarily remove the cap.
Subagent model routing. When Claude spawns a subagent for a research task — reading files, searching code, summarizing findings — that subagent does not need the most powerful model. Routing subagents to a smaller, faster model saves tokens and money.
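As a sketch, a research subagent routed to a smaller model might look like the file below, assuming Claude Code's agent-file format (a markdown file with name/description/tools/model frontmatter). The file name, tool list, and model alias are illustrative:

```markdown
---
name: code-researcher
description: Read-only codebase research. Reads and searches files, then
  returns a short summary of findings.
tools: Read, Grep, Glob
model: haiku
---

Research the question you are given. Read whatever files you need, but
return only a concise summary of your findings — conclusions, not raw
file contents.
```

The restriction to read-only tools is deliberate: a research agent that cannot edit files is safe to run on a cheaper model without review.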
These settings go in ~/.claude/settings.json and apply globally. You set them once and forget about them until you have a specific reason to change them.
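Combined, the two overrides from this chapter can sit side by side in that file — a minimal sketch, not a complete settings file:

```json
{
  "env": {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50",
    "MAX_THINKING_TOKENS": "10000"
  }
}
```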
The cascade effect
Here is why context management matters so much: every decision compounds.
A bloated CLAUDE.md (2000 tokens instead of 800) means less room for file reads. Fewer file reads means Claude gathers less context. Less context means worse plans. Worse plans mean more verification failures. More failures mean more loop iterations. More iterations consume more context. The window fills faster. Compaction happens sooner. Information is lost. Quality drops further.
The reverse is also true. A lean CLAUDE.md, efficient tool usage, strategic compaction, and thinking caps create a virtuous cycle where Claude has more room to work, makes better decisions, and finishes tasks faster.
Monitoring in real time
The /cost command shows you exactly where your tokens are going. Run it at any point during a session to see:
- Input tokens consumed (everything Claude has read)
- Output tokens generated (everything Claude has written)
- Total cost for the session
- Breakdown by tool category
I run /cost periodically during long sessions — usually after completing a subtask or before starting a new one. If the numbers are climbing fast, I compact. If they are low, I know I have room for a complex task.
This is not about penny-pinching. It is about situational awareness. Knowing your context budget is like knowing your battery percentage. You make different decisions at 80% than at 15%.
The design principle
Every chapter that follows teaches techniques that are, at their core, context management strategies:
- CLAUDE.md (Chapter 2): front-load the right information so Claude does not have to go searching for it — your project's rules, boundaries, and patterns loaded once, used every session
- Specs and plans (Chapter 3): give Claude structured intent so it gathers the right context and acts with precision — the human-in-the-loop patterns that prevent wasted iterations
- Skills (Chapter 4): load specialized knowledge only when needed, instead of keeping it in CLAUDE.md permanently — a TDD workflow that costs zero tokens until you invoke it
- Subagents (Chapter 4): isolate expensive research tasks so they do not pollute the main context — read 20 files in a separate window, return a 200-token summary
- Hooks (Chapter 5): run verification outside of Claude's context entirely — deterministic scripts that cost zero tokens and catch problems the model might miss
- Worktrees (Chapter 6): run multiple Claude sessions in parallel, each with its own fresh context window — true parallelism for complex features
Every single one of these is a different answer to the same question: how do we get the most value out of our context budget?
Chapter 1 gave you the map. What you do not yet have is the system.
The developers who finish this course do not write better prompts. They build better systems. They have CLAUDE.md files that front-load the right context. Skills that load domain knowledge on demand. Hooks that enforce rules without consuming a single token. Subagents that research in isolation. Verification pipelines that catch mistakes automatically.
And the system starts with one file: CLAUDE.md.
Chapter 2 begins where every session begins: with your codebase.