Chapter 26: Context Management as a Core Competency

Why This Matters

If you had to pick the single most underrated subsystem from Claude Code's entire codebase, it would be context management. The permission system is eye-catching, the Agent Loop is the core, and prompt engineering is widely known — but context management is the key infrastructure that determines whether an AI Agent can "continue working effectively."

A 200K token context window seems generous, but in real-world scenarios it's consumed faster than you'd expect: system prompts take about 15-20K, each tool call result takes 5-50K, and after a few rounds of file reads and code searches, you've already used half. More critically, the context window is not just a "capacity" issue — it's an "information density" issue. When the window is filled with stale tool results, redundant file contents, and resolved discussions, the model's attention is diluted and response quality degrades.

The context management system analyzed in Part Three (Chapters 9-12) reveals six core principles, with a common theme: the context window is a scarce resource that must be managed as carefully as memory.


Source Code Analysis

26.1 Principle One: Budget Everything

Definition: Every piece of content entering the context window must have a clear token budget cap, no exceptions.

Claude Code's budget system covers every content source in the context window:

| Content Source | Budget Limit | Source Location |
|---|---|---|
| Single tool result | 50K characters | restored-src/src/constants/toolLimits.ts:13 |
| All tool results in a single message | 200K characters | restored-src/src/constants/toolLimits.ts:49 |
| File read | Default 2000 lines + offset/limit progressive reading | See Chapter 8 for details |
| Skill listing | 1% of context window | restored-src/src/tools/SkillTool/prompt.ts:20-23 |
| Post-compaction file restoration | Max 5 files, 5K tokens per file, 50K total | restored-src/src/services/compact/compact.ts:122 |
| Post-compaction skill restoration | 5K tokens per skill, 25K total | See Chapter 10 for details |
| Agent description list | Moved to attachments to control main prompt size | See Chapter 15 for details |

Table 26-1: Claude Code's token budget system

Note the granularity of the design: there's not just a "total budget," but also "per-item budgets." Here are the sources for both limits:

```typescript
// restored-src/src/constants/toolLimits.ts:13
export const DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000

// restored-src/src/constants/toolLimits.ts:49
export const MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000
```

MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000 prevents N parallel tools from simultaneously returning large results and flooding the context — even if each tool result is within 50K, 10 parallel tools could produce 500K characters. The per-message budget is a safeguard against this "legitimate but dangerous" combination.
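The two-level cap can be sketched as follows. This is a minimal illustration, not Claude Code's implementation — `capToolResults` is a hypothetical helper; only the two constants come from `toolLimits.ts`:

```typescript
// Hypothetical sketch: a per-item cap plus a shared per-message cap.
// Constant names mirror toolLimits.ts; capToolResults is illustrative.
const DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000;
const MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000;

function capToolResults(results: string[]): string[] {
  let remaining = MAX_TOOL_RESULTS_PER_MESSAGE_CHARS;
  return results.map((r) => {
    // First apply the per-item cap...
    const item = r.slice(0, DEFAULT_MAX_RESULT_SIZE_CHARS);
    // ...then charge it against the shared per-message budget.
    const kept = item.slice(0, Math.max(0, remaining));
    remaining -= kept.length;
    return kept;
  });
}
```

With ten parallel 50K results, the per-item cap alone would still admit 500K characters; the per-message cap kicks in after the fourth and empties the rest.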

The 1% budget for the skill listing is particularly noteworthy:

```typescript
// restored-src/src/tools/SkillTool/prompt.ts:20-23
// Skill listing gets 1% of the context window (in characters)
export const SKILL_BUDGET_CONTEXT_PERCENT = 0.01
export const CHARS_PER_TOKEN = 4
export const DEFAULT_CHAR_BUDGET = 8_000 // Fallback: 1% of 200k × 4
```

As users install more and more skills, the skill listing could grow without bound. Claude Code's solution is a three-level truncation cascade: first truncate descriptions (MAX_LISTING_DESC_CHARS = 250), then truncate low-priority skills, and finally retain only the names of built-in skills. This ensures the skill listing never occupies more than 1% of the context window — even if the user has installed 1,000 skills.
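The cascade can be sketched like this. It is a simplified illustration under stated assumptions — the `Skill` shape and `renderListing` are hypothetical; only `MAX_LISTING_DESC_CHARS = 250` and the three-level order come from the text:

```typescript
// Hypothetical sketch of a three-level truncation cascade for a skill listing.
// Cascade order: trim descriptions → drop low-priority skills → names only.
const MAX_LISTING_DESC_CHARS = 250;

interface Skill { name: string; description: string; builtIn: boolean }

function renderListing(skills: Skill[], charBudget: number): string {
  // Level 1: truncate every description.
  let lines = skills.map(
    (s) => `${s.name}: ${s.description.slice(0, MAX_LISTING_DESC_CHARS)}`
  );
  if (lines.join("\n").length <= charBudget) return lines.join("\n");

  // Level 2: drop low-priority (here: non-built-in) skills entirely.
  const builtIn = skills.filter((s) => s.builtIn);
  lines = builtIn.map(
    (s) => `${s.name}: ${s.description.slice(0, MAX_LISTING_DESC_CHARS)}`
  );
  if (lines.join("\n").length <= charBudget) return lines.join("\n");

  // Level 3: retain only the names of built-in skills.
  return builtIn.map((s) => s.name).join("\n");
}
```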

Anti-pattern: Unbounded content injection. Injecting tool results, file contents, or configuration information into the context window without limits, ultimately filling the context with low-information-density content.


26.2 Principle Two: Context Hygiene

Definition: Context management is not just about compressing content already in the window, but about proactively filtering out high-cost information irrelevant to the current agent's goals before injection.

Claude Code implements this principle thoroughly within its sub-agent system. Read-only agents like Explore / Plan don't inherit the main agent's complete control plane: runAgent() proactively omits CLAUDE.md hierarchical directives when conditions are met, and further removes gitStatus for Explore / Plan. The reason is not that this information is never useful, but that it's typically dead weight for read-only search agents: commit conventions, PR rules, and lint constraints only need to be interpreted by the main agent; stale git status may take up tens of KB but can't help with code searching.

More importantly, this trimming happens when generating the sub-agent's context, not as a compression fix-up after token pressure builds. Standard sub-agents isolate search noise within their own conversation, returning only a condensed result to the parent; Explore even defaults to omitClaudeMd: true. This is the essence of "context hygiene": don't let low-information-density content enter the window first, then hope the compression system cleans up afterward.
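Spawn-time trimming can be sketched as follows. This is an illustrative reconstruction, not the real `runAgent()` — the `AgentContext` shape and function name are hypothetical; the behavior (Explore/Plan omit CLAUDE.md directives and gitStatus) follows the description above:

```typescript
// Hypothetical sketch of spawn-time context hygiene for read-only agents.
interface AgentContext {
  systemPrompt: string;
  claudeMd?: string;   // CLAUDE.md hierarchical directives
  gitStatus?: string;  // potentially tens of KB of stale status
}

const READ_ONLY_AGENTS = new Set(["explore", "plan"]);

function buildSubAgentContext(agentType: string, parent: AgentContext): AgentContext {
  const ctx: AgentContext = { systemPrompt: parent.systemPrompt };
  if (!READ_ONLY_AGENTS.has(agentType)) {
    // Only agents that act on the repo inherit the expensive prefix.
    ctx.claudeMd = parent.claudeMd;
    ctx.gitStatus = parent.gitStatus;
  }
  return ctx;
}
```

The decision happens before the sub-agent's first token is spent, which is the whole point: filtering at injection time, not compression after the fact.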

Anti-pattern: Full inheritance. Stuffing every helper agent with the complete system prompt, CLAUDE.md, git status, recent tool outputs, and user preferences, resulting in every read-only query redundantly paying for an expensive prefix.


26.3 Principle Three: Preserve What Matters

Definition: Compaction is necessary, but post-compaction must selectively restore the most critical context.

Auto-compaction (see Chapter 9 for details) compresses the entire conversation history into a summary, freeing context space. But compaction loses specific code content, file paths, and precise line number references. If the model completely loses the contents of files it previously read after compaction, it needs to re-read them, wasting tool calls and user wait time.

Claude Code's solution is post-compaction restoration (see Chapter 10 for details):

```typescript
// restored-src/src/services/compact/compact.ts:122
export const POST_COMPACT_MAX_FILES_TO_RESTORE = 5
```

The restoration strategy flow:

```mermaid
graph LR
    A["Pre-compaction snapshot<br/>cacheToObject()"] --> B["Execute compaction<br/>conversation→summary"]
    B --> C["Selective restoration"]
    C --> D["Most recent 5 files<br/>single file ≤5K tokens"]
    C --> E["Total budget ≤50K tokens"]
    C --> F["Skill re-injection<br/>single skill ≤5K, total ≤25K"]
```

Figure 26-1: Compaction-restoration flow

The key to the restoration strategy is selectivity: not restoring all files, but the most recent 5; not restoring complete file contents, but truncating within 5K tokens; total not exceeding 50K. These numbers reflect carefully considered trade-offs: restoring too much is equivalent to not compacting, restoring too little is equivalent to over-compacting.
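The selection logic can be sketched like this, assuming a crude 4-chars-per-token estimate. The `CachedFile` shape and `selectFilesToRestore` are hypothetical; the limits (5 files, 5K tokens each, 50K total) come from the chapter:

```typescript
// Hypothetical sketch of selective post-compaction file restoration.
const POST_COMPACT_MAX_FILES_TO_RESTORE = 5;
const MAX_TOKENS_PER_FILE = 5_000;
const MAX_TOTAL_RESTORE_TOKENS = 50_000;
const CHARS_PER_TOKEN = 4; // crude estimate, see Principle Six

interface CachedFile { path: string; content: string; lastReadAt: number }

function selectFilesToRestore(cache: CachedFile[]): CachedFile[] {
  let budget = MAX_TOTAL_RESTORE_TOKENS;
  return [...cache]
    .sort((a, b) => b.lastReadAt - a.lastReadAt)      // most recent first
    .slice(0, POST_COMPACT_MAX_FILES_TO_RESTORE)      // at most 5 files
    .map((f) => {
      // Truncate to the per-file cap, within what's left of the total budget.
      const maxChars = Math.min(MAX_TOKENS_PER_FILE, budget) * CHARS_PER_TOKEN;
      const content = f.content.slice(0, maxChars);
      budget -= Math.ceil(content.length / CHARS_PER_TOKEN);
      return { ...f, content };
    })
    .filter((f) => f.content.length > 0);
}
```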

The skill restoration design is equally refined. Post-compaction doesn't re-inject the names of already-sent skills (sentSkillNames), because the model still holds SkillTool's Schema — it knows the skill system exists, it just forgot the specific skill contents. This saves approximately 4K tokens.

Anti-pattern: Full compaction or full preservation. Either restoring nothing (model forced to start from scratch) or attempting to preserve everything (compaction effectiveness is zero).


26.4 Principle Four: Inform, Don't Hide

Definition: When content is truncated or compressed, the model must be informed about what happened, allowing it to proactively obtain the complete information.

Claude Code practices this principle at multiple levels:

Tool result truncation notification. When a tool result exceeds 50K characters (DEFAULT_MAX_RESULT_SIZE_CHARS), the complete result is written to disk (restored-src/src/utils/toolResultStorage.ts), and the model receives a preview message including truncation notice and the disk path to the full content. The model thus knows: (1) what it sees is not everything, (2) how to get everything.
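The write-to-disk-plus-preview behavior can be sketched as follows. The function and message wording are illustrative, not the actual `toolResultStorage.ts` code; only the 50K cap and the preview-with-path idea come from the text:

```typescript
// Hypothetical sketch of "inform, don't hide" tool-result truncation.
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000;

function truncateWithNotice(result: string): string {
  if (result.length <= DEFAULT_MAX_RESULT_SIZE_CHARS) return result;
  // Persist the full result so the model can retrieve it later.
  const file = path.join(os.tmpdir(), `tool-result-${Date.now()}.txt`);
  fs.writeFileSync(file, result);
  const preview = result.slice(0, DEFAULT_MAX_RESULT_SIZE_CHARS);
  // Tell the model (1) this is a preview and (2) where the rest lives.
  return `${preview}\n[Truncated: showing first ${DEFAULT_MAX_RESULT_SIZE_CHARS} of ${result.length} chars. Full result saved to ${file}]`;
}
```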

Cache micro-compaction notification (see Chapter 11 for details). When cache_edits removes old tool results, notifyCacheDeletion() informs the model that "some old tool results have been cleaned up." This prevents the model from referencing content that no longer exists.

File read pagination. FileReadTool reads 2000 lines by default, supporting pagination through offset/limit parameters. The tool description explicitly explains this behavior — the model knows it only sees the first 2000 lines by default and can specify an offset when it needs later content.
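A minimal pagination sketch, assuming the 2000-line default from the text; `readFilePage` and the notice wording are hypothetical:

```typescript
// Hypothetical sketch of offset/limit file-read pagination that
// makes the cut-off visible to the model instead of hiding it.
const DEFAULT_READ_LINES = 2_000;

function readFilePage(lines: string[], offset = 0, limit = DEFAULT_READ_LINES): string {
  const page = lines.slice(offset, offset + limit);
  const more = offset + limit < lines.length;
  return page.join("\n") + (more
    ? `\n[Showing lines ${offset + 1}-${offset + page.length} of ${lines.length}; pass offset=${offset + limit} for more]`
    : "");
}
```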

Explicit declaration in compaction summaries. The compaction prompt (see Chapter 9 for details) requires the summary to include "where progress stands" and "what still needs to be done" — ensuring the post-compaction model knows which stage of the task it's at. The <analysis> draft block in the compaction prompt (restored-src/src/services/compact/prompt.ts:31) lets the model first analyze the conversation content, then generate a structured summary — the analysis block is removed during formatting and doesn't occupy final context space.

Anti-pattern: Silent truncation. Truncating tool results or deleting context content without the model's knowledge. The model may make incorrect decisions based on incomplete information, or "fabricate" content it can't quite remember — because it doesn't know its information is incomplete.


26.5 Principle Five: Circuit-Break Runaway Loops

Definition: When an automated process fails consecutively, there must be a mechanism to force a stop, rather than retrying infinitely.

The auto-compaction circuit breaker is the most direct implementation. MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3 (restored-src/src/services/compact/autoCompact.ts:70) — after 3 consecutive failures, stop trying. The source code comment (see Chapter 25, Principle Six for the complete code reference) documents the engineering rationale for this number: BigQuery data showed 1,279 sessions experienced 50+ consecutive compaction failures (up to 3,272), wasting approximately 250K API calls per day.

More broadly, Claude Code implements similar circuit-breaking mechanisms across multiple subsystems:

| Subsystem | Circuit Break Condition | Circuit Break Behavior | Source Location |
|---|---|---|---|
| Auto-compaction | 3 consecutive failures | Stop compacting until session end | autoCompact.ts:70 |
| YOLO classifier | 3 consecutive / 20 total denials | Fall back to manual user confirmation | denialTracking.ts:12-15 |
| max_output_tokens recovery | Max 3 retries | Stop retrying, accept truncated output | See Chapter 3 for details |
| Prompt-too-long | Drop oldest turns → drop 20% | Degraded handling, not infinite dropping | See Chapter 9 for details |

Table 26-2: Claude Code's circuit breakers at a glance

Each circuit breaker follows the same pattern: set a reasonable retry limit, degrade to a safe but functionally limited state when exceeded, rather than crashing or infinite looping.
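The shared pattern can be sketched as a small state machine. The class is illustrative; only `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3` comes from the source:

```typescript
// Hypothetical sketch of the consecutive-failure circuit breaker pattern:
// N consecutive failures trip the breaker into a degraded-but-safe state.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

class CircuitBreaker {
  private consecutiveFailures = 0;
  private tripped = false;

  canAttempt(): boolean {
    return !this.tripped;
  }
  recordSuccess(): void {
    // Any success resets the streak — only *consecutive* failures count.
    this.consecutiveFailures = 0;
  }
  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
      // Degrade: stop attempting for the rest of the session.
      this.tripped = true;
    }
  }
}
```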

Anti-pattern: Infinite retry. "Compaction failed? Try again. Failed again? Try with different parameters." This is especially dangerous in AI Agents — each retry consumes API calls (real money), and the failure reason is often systemic (context too large to compress within the summary token budget), so retrying won't change the result.


26.6 Principle Six: Estimate Conservatively

Definition: In token counting and budget allocation, it's better to overestimate consumption than to underestimate — underestimation leads to overflow, overestimation only wastes slightly more space.

Claude Code's token estimation chooses the conservative direction in every scenario (see Chapter 12 for details):

| Content Type | Estimation Strategy | Conservatism Level | Reason |
|---|---|---|---|
| Plain text | 4 bytes/token | Moderate | English is actually ~3.5-4.5 |
| JSON content | 2 bytes/token | Highly conservative | Structural characters tokenize inefficiently |
| Images/documents | Fixed 2000 tokens | Highly conservative | Actual formula is width×height/750, but fixed value used when metadata unavailable |
| Cache tokens | From API usage | Exact (when available) | Only API-returned counts are authoritative |

Table 26-3: Token estimation strategy comparison

Estimating JSON at 2 bytes/token is a particularly meaningful design choice. JSON structural characters ({}, [], "", :, ,) tokenize far less efficiently than natural language — 100 bytes of JSON might consume 40-50 tokens, while 100 bytes of English only needs 25-30 tokens. If you use the generic 4 bytes/token estimate, JSON-dense tool results would be severely underestimated, potentially causing context overflow.
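A content-type-aware estimator following Table 26-3 might look like this. `estimateTokens` and the `Content` union are illustrative; the ratios (4 bytes/token for text, 2 for JSON, flat 2000 for images) come from the table:

```typescript
// Hypothetical sketch of conservative, content-type-aware token estimation.
type Content =
  | { kind: "text"; bytes: number }
  | { kind: "json"; bytes: number }
  | { kind: "image" };

function estimateTokens(c: Content): number {
  switch (c.kind) {
    case "text": return Math.ceil(c.bytes / 4);  // English is really ~3.5-4.5
    case "json": return Math.ceil(c.bytes / 2);  // structure tokenizes poorly
    case "image": return 2_000;                  // flat conservative fallback
  }
}
```

The same 100 bytes is charged twice as many tokens when it's JSON — deliberately erring toward triggering compaction early rather than overflowing.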

The skill listing budget also reflects this (restored-src/src/tools/SkillTool/prompt.ts:22): CHARS_PER_TOKEN = 4 is used to convert token budgets to character budgets — using the most conservative characters/token ratio to ensure no overspending.

The benefits of conservative estimation far outweigh the costs. The worst case of overestimating token consumption is triggering compaction early — the user waits a few extra seconds. The worst case of underestimating token consumption is a prompt_too_long error — the API call fails, requiring emergency context dropping, potentially losing critical information.

Anti-pattern: The illusion of exact counting. Attempting to precisely calculate token counts on the client side. Only the API server-side tokenizer can provide exact values — any client-side count is an estimate. Since it's an estimate, it should be biased toward the safe direction.


Pattern Distillation

Six Principles Summary Table

| Principle | Core Source Code Trace | Anti-pattern |
|---|---|---|
| Budget Everything | toolLimits.ts:13,49 — 50K per item, 200K per message | Unbounded content injection |
| Context Hygiene | runAgent.ts:385-404 — read-only agents omit CLAUDE.md and gitStatus | Full inheritance |
| Preserve What Matters | compact.ts:122 — restore most recent 5 files | Full compaction or full preservation |
| Inform, Don't Hide | toolResultStorage.ts — provide disk path when truncating | Silent truncation |
| Circuit-Break Runaway Loops | autoCompact.ts:70 — stop after 3 consecutive failures | Infinite retry |
| Estimate Conservatively | SkillTool/prompt.ts:22 — CHARS_PER_TOKEN = 4 | The illusion of exact counting |

Table 26-4: Summary of the Six Context Management Principles

Relationships Between Principles

```mermaid
graph LR
    A["Budget<br/>Everything"] --> B["Context<br/>Hygiene"]
    B --> C["Preserve<br/>What Matters"]
    C --> D["Inform,<br/>Don't Hide"]
    A --> E["Circuit-Break<br/>Runaway Loops"]
    A --> F["Estimate<br/>Conservatively"]
    F --> A
```

Figure 26-2: Relationship diagram of the six context management principles

Budget Everything is the foundation — defining the token cap for each content source. Context Hygiene determines what content shouldn't enter the current window at all. Preserve What Matters handles post-compaction restoration, Inform, Don't Hide ensures the model knows what's been truncated, Circuit-Break Runaway Loops prevents automated processes from exceeding budgets, and Estimate Conservatively ensures budgets aren't circumvented by underestimation.

Pattern: Tiered Token Budget

  • Problem solved: Multiple content sources competing for limited context space
  • Core approach: Independent budget per source + total budget, with truncation cascade handling for overage
  • Code template: Per-item limit (50K) → aggregate limit (200K/message) → global limit (context window - output reserve - buffer)
  • Precondition: Ability to estimate content's token consumption before injection

Pattern: Context Hygiene

  • Problem solved: Read-only helper agents repeatedly inheriting irrelevant but expensive prefix content
  • Core approach: Omit context irrelevant to the current responsibility at spawn time, and isolate exploration noise within sub-conversations
  • Precondition: Ability to distinguish which context the current agent will actually consume

Pattern: Compaction-Restoration Cycle

  • Problem solved: Compaction loses critical context
  • Core approach: Pre-compaction snapshot → compact → selectively restore most recent/most important content
  • Precondition: Ability to track which content was "most recently used"

Pattern: Circuit Breaker

  • Problem solved: Automated processes infinitely looping under abnormal conditions
  • Core approach: Stop after N consecutive failures, degrade to safe state
  • Precondition: Defined criteria for "failure" and post-degradation behavior

What You Can Do

  1. Audit your Agent's context consumption. Measure how many tokens each content source consumes in real-world scenarios, and identify the biggest consumers
  2. Set size limits for tool results. Ensure file reads, database queries, and API call results have character/line count caps
  3. Slim down read-only helpers. Search-type and planning-type agents should not inherit the complete CLAUDE.md, git status, and recent tool outputs by default
  4. Implement post-compaction restoration. If your Agent uses context compression, design a restoration strategy — so the post-compaction model doesn't need to start from zero
  5. Inform the model when truncating. Tell the model "this is truncated, the full version is here" — this is far better than silently truncating and having the model discover the information gap on its own
  6. Add circuit breakers. Set retry limits for any potentially looping automated process. Degraded operation is always better than an infinite loop