Chapter 06: Scaling Up

The writer/reviewer pattern

Two sessions: one writes, one reviews. Fresh context eliminates bias.

20 min · Lesson 3 of 4

Here is the biggest weakness of a single Claude session: it reviews its own work. The same context window that produced the code also evaluates it. The same reasoning that led to a decision also judges whether that decision was correct.

This creates blind spots. Claude is unlikely to catch its own systematic errors. If it misunderstood a requirement during implementation, it will misunderstand it again during review. If it made an assumption about how an API works, it will not question that assumption when verifying. The context is contaminated.

The fix is simple: use two sessions. One writes. One reviews. They never share context.

How it works

The pattern has three steps. Nothing fancy.

Session 1: Write

Give Claude the spec. Let it implement the feature. Full agentic loop — gather, plan, act, verify. When it is done, commit the changes.

Session 2: Review

Open a fresh Claude session. Point it at the changes. Ask it to review with specific criteria. It sees only the output — not the reasoning that produced it.

Fix and verify

Take the review findings back to the writer session (or a new session). Fix the issues. Run the build and tests. Ship.

The key insight is step 2. The reviewer has zero access to the writer's reasoning. It does not know why the writer chose a particular approach. It does not know what alternatives were considered and rejected. It sees only the diff. This is exactly what makes it effective — it evaluates the code on its own merits, not on the author's intentions.

Separate concerns into different context windows

This is a core ECC principle that extends far beyond code review. Every distinct concern should get its own context window.

The writer's context is optimized for implementation: understanding the spec, reading relevant files, making changes, running builds. Loading review criteria into that same context dilutes the implementation quality, and carrying the implementation context into review dilutes the review quality.

Separation produces better results on both sides. The writer is a better writer when it is only writing. The reviewer is a better reviewer when it is only reviewing. Context specialization beats context mixing every time.

I apply this principle broadly:

  • Implementation and review — separate sessions
  • Analysis and writing — the analyzer reads code, the writer produces documentation
  • Planning and execution — the planner proposes, the executor implements

Each context window does one job well instead of two jobs poorly.
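To make the handoff between contexts concrete, here is a minimal sketch of the analyzer/writer split. The filenames and prompts are illustrative, and a stub function stands in for the `claude -p` call so the wiring itself is visible: the only channel between the two contexts is a file.

```shell
#!/bin/bash
set -e
# Illustrative two-context handoff: the second context sees ONLY the
# first context's written output, never its reasoning.
WORK=$(mktemp -d)
cd "$WORK"

# Stub standing in for the real CLI call, which would be: claude -p "$1"
run_claude() {
  echo "(stub response to: $1)"
}

# Context 1: the analyzer reads code and writes notes to a file.
run_claude "Analyze src/ and summarize the public API" > analysis-notes.md

# Context 2: the writer sees only the notes file, not the analysis session.
run_claude "Using these notes, draft the docs: $(cat analysis-notes.md)" > docs-draft.md

cat docs-draft.md
```

The file in the middle is the entire interface between the two jobs, which is exactly what keeps each context specialized.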

Confidence-filtered reviews

Not all review feedback is equal. A vague "this could be improved" wastes your time. A specific "this SQL query is vulnerable to injection because the input is not parameterized" saves your production database.

The confidence filter is an ECC pattern that eliminates noise. The reviewer only reports issues it is more than 80% confident about. Everything below that threshold gets dropped.

High-confidence review prompt
Review the changes in the last commit.

Focus on:
1. Security vulnerabilities
2. Missing error handling
3. Logic errors and edge cases
4. Performance issues with clear impact

Rules:
- Only report issues you are more than 80% confident about
- For each issue, state WHY it is a problem and HOW to fix it
- Do not report style preferences, naming opinions, or "could be improved" suggestions
- If you find nothing significant, say so. Do not invent issues.

Compare these two review outputs:

Low confidence — noise
- Consider renaming 'data' to something more descriptive
- This function could be split into smaller functions
- You might want to add a comment explaining this logic
- The variable name 'x' is not very clear
High confidence — signal
- SECURITY: The user input in line 34 is interpolated directly into the SQL
  query. This is vulnerable to SQL injection. Use parameterized queries instead.

- ERROR HANDLING: The API call on line 52 has no try/catch. If the external
  service is down, the entire request handler crashes with an unhandled
  promise rejection. Wrap it in try/catch and return a 503.

The first output is noise. The second output prevents production incidents. The only difference is the prompt.

Implementation options

You have three ways to run this pattern, from manual to fully automated.

Manual. The simplest version. Finish a feature in one session. Open a new terminal. Run a fresh claude session. Paste a review prompt. Read the findings. This takes 60 seconds of setup and catches real bugs.

Manual writer/reviewer
# Terminal 1: writer session
claude
> "Implement user settings page per the spec in TASKS.md"
> [Claude implements, commits]

# Terminal 2: reviewer session (fresh context)
claude
> "Review the changes in the last commit for security vulnerabilities,
   missing error handling, and edge cases. Only report issues you're
   more than 80% confident about."

Worktree-based. The reviewer runs in a separate worktree looking at the writer's branch. This gives the reviewer its own working directory, so it can even run the code independently.
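A minimal sketch of that worktree setup, using a throwaway repo in place of your project (branch names and paths are illustrative; the reviewer invocation at the end is shown as a comment since it needs the claude CLI):

```shell
#!/bin/bash
set -e
# Throwaway project standing in for your real repo.
WORK=$(mktemp -d)
git init -q -b main "$WORK/repo"
cd "$WORK/repo"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "v1" > app.txt && git add . && git commit -q -m "base"

# Writer session commits on a feature branch.
git switch -q -c feature/settings
echo "v2" > app.txt && git commit -q -am "implement settings"

# Reviewer gets its own working directory, detached at the writer's
# branch tip (a branch can only be checked out in one worktree at a time).
git worktree add -q --detach "$WORK/review-wt" feature/settings

# From the worktree, the reviewer can diff -- and run -- the code itself:
git -C "$WORK/review-wt" diff --stat main...HEAD
# (cd "$WORK/review-wt" && claude)   # start the fresh reviewer session here
```

Because the worktree has its own checkout, the reviewer can execute builds and tests without touching the writer's working directory.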

Automated. A git hook or script triggers a review after every commit. The review runs in non-interactive mode. Findings get written to a file or posted as a comment. You review them at your own pace.

Automated post-commit review
#!/bin/bash
# .git/hooks/post-commit (or a separate script)

REVIEW=$(claude -p "Review the changes in the last commit.
  Focus on security, error handling, and logic errors.
  Only report issues with >80% confidence.
  Output as markdown." --allowedTools "Read,Glob,Grep")

if [ -n "$REVIEW" ]; then
  echo "$REVIEW" > .review-findings.md
  echo "Review findings written to .review-findings.md"
fi
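To see the hook wiring end to end without the claude CLI, here is a self-contained sketch: it creates a throwaway repo, installs a post-commit hook with the claude call replaced by a stub string, and lets a commit trigger it (the repo path and stub text are hypothetical):

```shell
#!/bin/bash
set -e
# Throwaway repo to demonstrate the post-commit hook firing.
REPO=$(mktemp -d)
git init -q -b main "$REPO"
cd "$REPO"
git config user.email "demo@example.com"
git config user.name "Demo"

# Install the hook. The real version would set REVIEW via `claude -p ...`;
# a stub stands in so the wiring runs without the CLI.
cat > .git/hooks/post-commit <<'EOF'
#!/bin/bash
REVIEW="(stub) no issues above the 80% confidence threshold"
if [ -n "$REVIEW" ]; then
  echo "$REVIEW" > .review-findings.md
  echo "Review findings written to .review-findings.md"
fi
EOF
chmod +x .git/hooks/post-commit

# Any commit now triggers the hook automatically.
echo "hello" > app.txt
git add app.txt
git commit -q -m "demo commit"

cat .review-findings.md
```

Swapping the stub for the real `claude -p` call turns this into the automated pipeline described above.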

The reviewer prompt matters

The difference between useful and useless review is the prompt. "Review this code" produces generic feedback that you will ignore. A specific prompt with clear criteria produces findings you will act on.

Here is what I include in every review prompt:

  1. What to look for. Security, error handling, performance, logic errors. Be explicit.
  2. What to ignore. Style, naming, formatting. The linter handles those.
  3. Confidence threshold. Only report if confident. No maybes.
  4. Action format. For each issue: what is wrong, why it matters, how to fix it.

The four-point structure turns a review from a chore into a checklist. Each finding is immediately actionable. No interpretation required.

Cost versus value

The review session costs extra tokens. A typical review of a medium feature costs maybe $0.10 to $0.30. That is the cost of one session reading a diff and analyzing it.

What does it save? Catching one security vulnerability before deploy saves hours of incident response. Catching one missing error handler saves a production crash and a 2 AM alert. Catching one logic error saves a bug report, an investigation, a fix, a review, and a deploy.

Start with manual reviews on your most important features. Automate it when you see the value. The pattern works at every level of sophistication — from a simple second session to a fully automated quality pipeline.