Cost awareness and model routing

Agentic coding is not free. A heavy Claude Code session can cost $5-20. Over a month of daily use, that adds up fast. Understanding token economics does not make you cheap — it makes you strategic about where you spend.

The goal is not to minimize cost. It is to spend on the things that matter and stop wasting on the things that do not.

The three model tiers

Claude comes in three tiers, and the price difference between them is dramatic.

Haiku

$0.25 / $1.25 per million tokens (input / output). The lightweight tier. Fast, cheap, good enough for formatting, simple edits, boilerplate generation, and search tasks. 12x cheaper than Sonnet on input.

Sonnet

$3 / $15 per million tokens. The workhorse. Handles standard development work: features, bug fixes, refactoring, code review. This is where you should spend most of your budget.

Opus

$15 / $75 per million tokens. The deep thinker. Architecture decisions, complex multi-file debugging, strategic planning. 5x more expensive than Sonnet. Use it only when the reasoning depth justifies the cost.

Let me put this in concrete terms. A typical research phase — reading 20 files, running some searches — costs about 50k input tokens. On Haiku, that is $0.01. On Sonnet, $0.15. On Opus, $0.75. Same task, 75x price difference between Haiku and Opus.

Most of that research does not need Opus-level reasoning. It is reading files and summarizing patterns. Haiku handles it fine.

Settings that control cost

There are four settings that have the biggest impact on your daily spend.

Default model

~/.claude/settings.json

{
  "model": "sonnet"
}

Default to Sonnet, not Opus. Sonnet handles 90% of development work without issue. When you hit a task that genuinely needs deeper reasoning — a complex architecture decision, a subtle concurrency bug, a multi-system integration plan — switch with /model opus for that task, then switch back.

Thinking budget cap

~/.claude/settings.json

{
  "env": {
    "MAX_THINKING_TOKENS": "10000"
  }
}

Extended thinking is powerful but expensive. Without a cap, Opus can spend 50k+ tokens "thinking" about a straightforward task. That is $3.75 in output tokens for thinking alone — on a task that Sonnet could have handled in 2k thinking tokens.

Setting MAX_THINKING_TOKENS=10000 gives Claude enough room for real reasoning while preventing runaway thinking on simple tasks. I have found that 10k thinking tokens is sufficient for nearly every task. The rare exception is deep architectural planning, where I temporarily raise the cap.

Subagent model

~/.claude/settings.json

{
  "env": {
    "CLAUDE_CODE_SUBAGENT_MODEL": "haiku"
  }
}

This routes subagent work to Haiku. Research tasks, code searches, file analysis — these tasks are about gathering and summarizing, not deep reasoning. Haiku handles them at a fraction of the cost.

A subagent that reads 30 files on Haiku costs about $0.02. The same task on Sonnet costs $0.24. On Opus, $1.20. Over a day of development with 5-10 subagent dispatches, that difference compounds fast.

Early compaction

~/.claude/settings.json

{
  "env": {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"
  }
}

We covered this in Lesson 1, but it has a cost dimension too. Every token in your context window is re-read on every turn. A bloated 180k-token context means you are paying for 180k input tokens every time Claude responds. Compacting to 80k means you pay for 80k — more than 50% savings per turn.

The daily workflow

Here is the model-switching workflow I follow every day.

Daily model workflow

# Start of day — Sonnet is default
/model sonnet

# Research phase — dispatch to Haiku subagent
"Use a subagent to research how the payment flow works"

# Planning phase — switch to Opus for architecture
/model opus
"Based on that research, plan the refactor of the payment flow"
/model sonnet

# Implementation — back to Sonnet
"Implement the plan. Start with the schema changes."

# Debugging — Sonnet handles most bugs
"The tests are failing. Fix them."

# Complex bug — Opus for the hard ones
/model opus
"This race condition is subtle. Analyze the async flow."
/model sonnet

# End of day — check spend
/cost

The pattern: Sonnet as the baseline. Opus only for tasks where reasoning depth changes the outcome. Haiku for all subagent work. /cost to monitor.

What "too expensive" looks like

A $5/day habit for a developer using Claude Code is reasonable. That is $100/month for a tool that eliminates hours of tedious work daily.

A $50/day habit means something is wrong with your workflow. Common causes:

Opus on everything — using the most expensive model for tasks that do not need it
Context waste — not compacting, letting the context window fill up, paying to re-read 180k tokens every turn
No subagent delegation — doing all research in the main session at Sonnet/Opus prices instead of dispatching to Haiku
Runaway thinking — no thinking cap, letting Opus deliberate for 50k tokens on a simple task
Too many MCPs — 80+ tool descriptions loaded every session, inflating the baseline context

If your /cost output surprises you, check these five things first. In my experience, fixing just one of them — usually adding the thinking cap or routing subagents to Haiku — cuts daily spend by 40-60%.

The complete cost settings

Here is the full settings block that controls cost. You do not need all of these on day one, but this is what a well-tuned setup looks like.

~/.claude/settings.json

{
  "model": "sonnet",
  "env": {
    "MAX_THINKING_TOKENS": "10000",
    "CLAUDE_CODE_SUBAGENT_MODEL": "haiku",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"
  }
}

Four lines. Default to Sonnet, cap thinking, route subagents to Haiku, compact early. This alone will cut your spend significantly compared to running Opus with defaults.

The point is not to be cheap. The point is to be intentional. Spend Opus-level money on Opus-level problems. Spend Haiku-level money on Haiku-level work. Let Sonnet handle everything in between.

Cost awareness and model routing

The three model tiers

Haiku

Sonnet

Opus

Settings that control cost

Default model

Thinking budget cap

Subagent model

Early compaction

The daily workflow

What "too expensive" looks like

The complete cost settings

Lesson Quiz