Chapter 04: Context Budget Management

Context is your #1 resource

Why the context window is the fundamental constraint. How to track usage.

25 min · Lesson 1 of 6

Everything in this course comes back to one constraint: the context window.

200k tokens sounds like a lot. In practice, it fills up faster than you think. A single large file read can eat 2-3k tokens. A verbose build output can eat 5k. MCP tool descriptions consume thousands of tokens at session start. Run a few searches, read a dozen files, let Claude reason through a complex problem — and suddenly you are deeper into your budget than you expected. Worse, quality starts degrading well before you hit the hard limit.

I have had sessions where Claude's quality visibly degraded halfway through a feature. Not because the model got worse — because the context window was overloaded. The agent was trying to hold too much information at once, and the important details were drowning in noise.

This is the constraint that shapes every technique in this chapter. Context is not infinite. It is your most valuable resource, and you need to manage it like one.

What eats context

Not everything costs the same. Here is what fills your context window, ranked by impact.

MCP tool descriptions

Loaded at session start. Every MCP server registers its tools, and each tool comes with a description. 10 MCPs with 8 tools each = 80 tool descriptions eating context before you have typed a single word.
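The arithmetic is worth running for your own setup. A minimal sketch, assuming roughly 150 tokens per tool description (an estimate, not a measured constant — real descriptions vary widely with schema size):

```python
# Rough estimate of the fixed context cost of MCP tool registrations.
# tokens_per_tool (~150) is an assumption: name + description + parameter schema.

def mcp_overhead(servers: int, tools_per_server: int, tokens_per_tool: int = 150) -> int:
    """Estimated tokens consumed at session start by MCP tool descriptions."""
    return servers * tools_per_server * tokens_per_tool

# 10 MCPs with 8 tools each:
print(mcp_overhead(10, 8))  # 12000 tokens before you have typed a word
```

Even with conservative per-tool estimates, the overhead lands in the five-figure range once you have a handful of servers active.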

File reads

Every file Claude reads stays in context for the rest of the session (until compaction). A 500-line component is roughly 2-3k tokens. Read 20 files during research, and that is 40-60k tokens gone.
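A useful rule of thumb for sizing a file before Claude reads it: one token is roughly four characters of English or code. This is an approximation (the ratio varies by content), but it is good enough for budgeting:

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: ~4 characters per token for English prose and code."""
    return len(text) // 4

# A 500-line file averaging ~23 characters per line lands in the
# 2-3k token range the lesson describes:
sample = "const x = props.value;\n" * 500
print(estimate_tokens(sample))  # 2875
```

If the estimate comes back at 10k+ tokens, that is a signal to ask for a targeted read of one function rather than the whole file.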

Command outputs

Verbose builds, long test runs, large git diffs — they all dump text into context. A failing TypeScript build with 50 errors can easily be 5-10k tokens.
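One mitigation, sketched here as plain Python rather than any built-in Claude Code feature: truncate verbose output to its head and tail before it enters the conversation. The first and last lines of a failing build usually carry the signal; the middle is mostly repetition.

```python
def truncate_output(output: str, head: int = 15, tail: int = 15) -> str:
    """Keep the first and last lines of a long command output, eliding the middle."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head] + [f"... ({omitted} lines omitted) ..."] + lines[-tail:])

# A 100-line error dump collapses to 31 lines: 15 head + 1 marker + 15 tail.
log = "\n".join(f"error TS2345 at line {i}" for i in range(100))
print(truncate_output(log))
```

The same idea applies on the shell side: piping build output through `head` and `tail` before Claude sees it keeps a 10k-token dump down to a few hundred tokens.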

Claude's own responses

Every word Claude writes stays in context too. Long explanations, detailed plans, verbose code comments — they all accumulate. This is why I prefer Claude to be concise.

CLAUDE.md and memory

Loaded every session. A well-structured CLAUDE.md is 1-2k tokens. Memory files add more. This is a fixed cost — you pay it whether you use the information or not.

The ranking matters. Most developers worry about their prompts being too long. In reality, your prompts are a tiny fraction of context consumption. The real budget killers are tool descriptions and file reads.

Strategic compaction

Claude Code has built-in compaction. When context reaches 95% capacity, it automatically compresses the conversation — summarizing earlier exchanges to free up space. This works, but 95% is too late.

By the time auto-compaction kicks in, Claude has already been operating in a degraded state. The model's attention is spread thin across 190k tokens of information, and the quality of its reasoning suffers.

The practical move: set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50 in your settings. This triggers compaction at 50% capacity instead of 95%. You lose some conversational detail earlier, but you gain a consistently sharp agent that is not struggling under context pressure.

Think of it like clearing your desk between tasks. You could wait until the desk is completely buried in papers, or you could file things away after each task. The second approach means you always have a clean workspace.

Advanced setups take this further with a strategic-compact skill that defines six specific phase transitions where compaction makes sense: after research completes, after a failed approach, after committing, after test runs, after debugging, and after shifting to a new task. The skill encodes not just when to compact but how — what state to preserve, what to let go, and how to brief the post-compaction session. You do not need all six today. But knowing that compaction can be systematic, not just reflexive, changes how you think about it.

~/.claude/settings.json:

```json
{
  "env": {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"
  }
}
```

The MCP tool count problem

This one surprises most people. You install five MCP servers because they seem useful. Each server has 10-15 tools. Each tool has a name, description, and parameter schema. That is 50-75 tool descriptions loaded into context at the start of every session.

I measured this on my own setup. With Supabase, GitHub, Context7, and Playwright MCPs active, the baseline context consumption — before I typed anything — was over 15k tokens. That is 7.5% of my context window gone before the conversation starts.

For a typical Next.js + Supabase project, I run three MCPs: Supabase, GitHub, and Context7. That covers database operations, pull request management, and documentation lookup. Everything else stays disabled until I specifically need it.
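For reference, a project-level MCP config along these lines is a single JSON file. The sketch below shows only the shape — the package names in angle brackets are placeholders, not real invocations, so check each server's own documentation for the actual command:

```json
{
  "mcpServers": {
    "supabase": {
      "command": "npx",
      "args": ["-y", "<supabase-mcp-package>"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "<github-mcp-package>"]
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "<context7-mcp-package>"]
    }
  }
}
```

The point of the structure: every entry in this file is a fixed context cost at session start, so the file should stay short.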

Monitoring your usage

The /cost command shows your current token usage and spend. The number to watch is input tokens — that is your context consumption.

Run /cost at the start of a session to see your baseline overhead. Run it again after your research phase. Run it after implementation. You will quickly develop an intuition for what is expensive and what is cheap.

Session monitoring workflow:

```shell
/cost                          # Baseline: ~15k tokens (MCPs + CLAUDE.md)
... research phase ...
/cost                          # After research: ~80k tokens
/compact                       # Reset before implementation
/cost                          # After compaction: ~25k tokens
... implementation phase ...
/cost                          # After implementation: ~60k tokens
```

The pattern I follow: compact after research, compact after debugging, compact before any new major task. Three compactions in a typical session keeps context clean and Claude sharp.

The mindset shift

Most developers approach AI tools with one question: "How do I get Claude to understand more?" They paste in entire files. They write long, detailed prompts explaining every decision. They install every MCP they can find.

This is backwards.

A well-structured CLAUDE.md with 50 lines gives Claude more useful information than a 500-line brain dump. A targeted file read of the relevant function gives Claude more useful context than reading the entire module. A compact summary of yesterday's debugging session is worth more than 30k tokens of raw conversation history.

The rest of this chapter teaches you four mechanisms for managing this budget: skills (on-demand knowledge), MCP (external tools), subagents (isolated context), and the decision framework for choosing between them. Every technique is about the same thing — getting maximum value from minimum context.