How Anthropic built the most widely used AI coding agent

When Claude Code shipped on npm, the source maps came with it. We read every file. This book distills the architecture, design decisions, and transferable patterns into 18 chapters you can learn from and apply to your own systems.

What you'll learn

The agent loop

How an async generator drives the entire system — streaming model output, executing tools, recovering from errors, and compressing context across 4 layers.
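A minimal sketch of a generator-driven loop, to make the idea concrete. Every name here (`AgentMessage`, `Terminal`, `ModelFn`, `ToolMap`, `agentLoop`, `runToCompletion`) is hypothetical and chosen for illustration; this is not Claude Code's code, and it leaves out streaming and context compression entirely.

```typescript
// Hypothetical types for illustration; none of these names come from Claude Code.
type AgentMessage =
  | { role: "assistant"; text: string; toolCall?: { name: string; input: string } }
  | { role: "tool"; name: string; output: string };

type Terminal = { reason: "done" | "max_turns"; turns: number };

type ModelFn = (history: AgentMessage[]) => Promise<AgentMessage>;
type ToolMap = Record<string, (input: string) => Promise<string>>;

// Yields every message as it is produced and returns a typed terminal value.
// The consumer pulls at its own pace, and can stop early by calling return()
// on the generator or breaking out of a for-await loop.
async function* agentLoop(
  model: ModelFn,
  tools: ToolMap,
  maxTurns: number,
): AsyncGenerator<AgentMessage, Terminal, void> {
  const history: AgentMessage[] = [];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = await model(history);
    history.push(reply);
    yield reply;

    // No tool requested: the conversation is finished.
    if (reply.role !== "assistant" || !reply.toolCall) {
      return { reason: "done", turns: turn };
    }

    const { name, input } = reply.toolCall;
    const tool = tools[name];
    // Tool failures are reported back to the model instead of ending the loop.
    let output: string;
    try {
      if (!tool) throw new Error(`unknown tool: ${name}`);
      output = await tool(input);
    } catch (err) {
      output = `error: ${err instanceof Error ? err.message : String(err)}`;
    }
    const result: AgentMessage = { role: "tool", name, output };
    history.push(result);
    yield result;
  }
  return { reason: "max_turns", turns: maxTurns };
}

// Drains the generator, collecting yielded messages and the terminal value.
async function runToCompletion(
  loop: AsyncGenerator<AgentMessage, Terminal, void>,
): Promise<{ messages: AgentMessage[]; terminal: Terminal }> {
  const messages: AgentMessage[] = [];
  while (true) {
    const step = await loop.next();
    if (step.done) return { messages, terminal: step.value };
    messages.push(step.value);
  }
}
```

Because the caller pulls each message with `next()`, a slow consumer naturally slows the loop, and a UI can render messages as they arrive while still receiving a typed reason for why the loop stopped.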

Tool execution at scale

A 14-step pipeline from model request to tool result. Permission resolution, speculative execution, concurrent batching by safety classification.
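One way to picture batching by safety classification is as a partition step followed by a runner. The sketch below assumes a single boolean `readOnly` flag per call and hypothetical names (`ToolCall`, `Batch`, `partitionBySafety`, `runBatches`); the real classification and the other pipeline steps are not shown here.

```typescript
// Hypothetical shapes; a real classifier would likely consider more than one flag.
type ToolCall = { id: string; tool: string; readOnly: boolean };
type Batch = { parallel: boolean; calls: ToolCall[] };

// Consecutive read-only calls are merged into one parallel batch.
// Every call that may write gets its own serial batch, so ordering is preserved.
function partitionBySafety(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.readOnly && last !== undefined && last.parallel) {
      last.calls.push(call);
    } else {
      batches.push({ parallel: call.readOnly, calls: [call] });
    }
  }
  return batches;
}

// Runs batches in order: parallel batches with Promise.all, the rest one by one.
// Results come back in the same order as the original calls.
async function runBatches(
  batches: Batch[],
  execute: (call: ToolCall) => Promise<string>,
): Promise<string[]> {
  const results: string[] = [];
  for (const batch of batches) {
    if (batch.parallel) {
      const outputs = await Promise.all(batch.calls.map((call) => execute(call)));
      results.push(...outputs);
    } else {
      for (const call of batch.calls) {
        results.push(await execute(call));
      }
    }
  }
  return results;
}
```

Keeping the partition a pure function makes the policy easy to test on its own, separate from whatever actually executes the tools.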

Multi-agent orchestration

How sub-agents share byte-identical prompt cache prefixes to save roughly 95% of input tokens. Fork agents, coordinator mode, and swarm teams with mailbox messaging.
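The prefix-sharing idea can be sketched in a few lines: every child request starts with exactly the same messages and differs only in what is appended. The names below (`Msg`, `forkRequests`, `sharedPrefixLength`) are hypothetical, and comparing serialized JSON is a stand-in for "the bytes sent are identical", not a description of how any provider keys its cache.

```typescript
// Hypothetical message shape for illustration.
type Msg = { role: "system" | "user" | "assistant"; content: string };

// Each child reuses the parent's messages untouched and appends only its own directive.
function forkRequests(sharedPrefix: readonly Msg[], directives: string[]): Msg[][] {
  return directives.map((directive): Msg[] => [
    ...sharedPrefix,
    { role: "user", content: directive },
  ]);
}

// Counts how many leading messages serialize identically in two requests.
// Anything after the first difference cannot be served from a shared prefix.
function sharedPrefixLength(a: readonly Msg[], b: readonly Msg[]): number {
  let i = 0;
  while (
    i < a.length &&
    i < b.length &&
    JSON.stringify(a[i]) === JSON.stringify(b[i])
  ) {
    i++;
  }
  return i;
}
```

The practical consequence is that anything that varies per child, such as a timestamp or a child ID in the system prompt, has to move to the end of the request, or the shared prefix drops to zero.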

Memory without a database

File-based memory with an LLM-powered recall system. Four memory types, staleness warnings, and a Sonnet side-query that beats embedding search.

Performance engineering

Startup in 240ms via parallel I/O. Slot reservation that saves context in 99% of requests. Bitmap pre-filters for fuzzy search. Every millisecond accounted for.
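Slot reservation is easiest to see as arithmetic plus one retry. The sketch below uses the 8K default and 64K escalation figures this book cites; the exact token counts (8000 and 64000), the `Complete` callback, and the function names are assumptions made for illustration. The `max_tokens` and `end_turn` stop reasons mirror the values the Anthropic Messages API reports, but the model call itself is a stub supplied by the caller.

```typescript
// Assumed values: the book says "8K" and "64K"; the exact counts are a guess.
const DEFAULT_OUTPUT_CAP = 8000;
const ESCALATED_OUTPUT_CAP = 64000;

type Completion = { text: string; stopReason: "end_turn" | "max_tokens" };
// Hypothetical model call: takes an output cap, returns the reply and why it stopped.
type Complete = (maxOutputTokens: number) => Promise<Completion>;

// Tokens left for input once the output slot has been reserved.
function inputBudget(contextWindow: number, outputCap: number): number {
  return contextWindow - outputCap;
}

// Ask with the small cap first; only a truncated reply triggers the larger one.
async function completeWithEscalation(
  complete: Complete,
): Promise<{ completion: Completion; capUsed: number }> {
  const first = await complete(DEFAULT_OUTPUT_CAP);
  if (first.stopReason !== "max_tokens") {
    return { completion: first, capUsed: DEFAULT_OUTPUT_CAP };
  }
  // Rare path: the reply hit the cap, so retry once with the larger slot.
  const second = await complete(ESCALATED_OUTPUT_CAP);
  return { completion: second, capUsed: ESCALATED_OUTPUT_CAP };
}
```

With a hypothetical 200,000-token window, `inputBudget(200000, DEFAULT_OUTPUT_CAP)` leaves 192,000 tokens for input, against 136,000 if the 64K slot were reserved on every request.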

Extensibility and security

Two-phase skill loading (metadata at startup, content on demand). 27 lifecycle hooks with config snapshots frozen at startup to prevent injection.
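Two-phase loading can be sketched as an index that keeps only frontmatter and a separate call that fetches the body when a skill is actually used. The file format (a `---` delimited block of `key: value` lines), the `name` and `description` keys, and the `SkillRegistry` class are assumptions for this example; the reader function is injected so the sketch does not depend on a real filesystem.

```typescript
type SkillMeta = { path: string; name: string; description: string };

// Splits a leading "---" delimited block of `key: value` lines from the body.
function parseFrontmatter(source: string): { meta: Record<string, string>; body: string } {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(source);
  if (!match) return { meta: {}, body: source };
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: source.slice(match[0].length) };
}

class SkillRegistry {
  private readonly index = new Map<string, SkillMeta>();
  private readonly read: (path: string) => string;

  constructor(read: (path: string) => string) {
    this.read = read;
  }

  // Phase 1 (startup): keep only the name and description in memory.
  indexSkill(path: string): void {
    const { meta } = parseFrontmatter(this.read(path));
    const name = meta["name"] || path;
    this.index.set(name, { path, name, description: meta["description"] || "" });
  }

  // The short listing that would be placed in the prompt: one line per skill.
  listing(): string[] {
    return Array.from(this.index.values()).map((s) => `${s.name}: ${s.description}`);
  }

  // Phase 2 (invocation): load the full body only when the skill is used.
  invoke(name: string): string {
    const skill = this.index.get(name);
    if (!skill) throw new Error(`unknown skill: ${name}`);
    return parseFrontmatter(this.read(skill.path)).body;
  }
}
```

The prompt cost of an installed skill is then one listing line, however long its instructions are, until the moment it is invoked.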

Explore the architecture

Six core abstractions power Claude Code.

Who this is for

Engineers building agentic systems. Every chapter ends with "Apply This" — 5 transferable patterns with concrete adaptation advice. Steal the architecture, skip the mistakes.

Technical leaders evaluating architectures. Follow the narrative without reading every code block. Understand why decisions were made, not just what was built.

Anyone curious about how production AI tools work. Claude Code is used by hundreds of thousands of developers. This is how it works under the hood.

Part 1

Foundations

Before the agent can think, the process must exist.

The Architecture of an AI Agent

The 6 key abstractions, data flow, permission system, build system

Starting Fast — The Bootstrap Pipeline

5-phase init, module-level I/O parallelism, trust boundary

State — The Two-Tier Architecture

Bootstrap singleton, AppState store, sticky latches, cost tracking

Talking to Claude — The API Layer

Multi-provider client, prompt cache, streaming, error recovery

Part 2

The Core Loop

The heartbeat of the agent: stream, act, observe, repeat.

The Agent Loop

query.ts deep dive, 4-layer compression, error recovery, token budgets

Tools — From Definition to Execution

Tool interface, 14-step pipeline, permission system

Concurrent Tool Execution

Partition algorithm, streaming executor, speculative execution

Part 3

Multi-Agent Orchestration

One agent is powerful. Many agents working together are transformative.

Spawning Sub-Agents

AgentTool, 15-step runAgent lifecycle, built-in agent types

Fork Agents and the Prompt Cache

Byte-identical prefix trick, cache sharing, cost optimization

Tasks, Coordination, and Swarms

Task state machine, coordinator mode, swarm messaging

Part 4

Persistence and Intelligence

An agent without memory makes the same mistakes forever.

Memory — Learning Across Conversations

File-based memory, 4-type taxonomy, LLM recall, staleness

Extensibility — Skills and Hooks

Two-phase skill loading, lifecycle hooks, snapshot security

Part 5

The Interface

Everything the user sees passes through this layer.

The Terminal UI

Custom Ink fork, rendering pipeline, double-buffer, pools

Input and Interaction

Key parsing, keybindings, chord support, vim mode

Part 6

Connectivity

The agent reaches beyond localhost.

MCP — The Universal Tool Protocol

8 transports, OAuth for MCP, tool wrapping

Remote Control and Cloud Execution

Bridge v1/v2, CCR, upstream proxy

Part 7

Performance Engineering

Making it all fast enough that humans don't notice the machinery.

Performance — Every Millisecond and Token Counts

Startup, context window, prompt cache, rendering, search

Epilogue — What We Learned

The 5 architectural bets, what transfers, where agents are heading

How this book was made

The source was extracted from npm source maps — the .js.map files that shipped with Claude Code contained a sourcesContent field with the full original TypeScript. Nearly two thousand files comprising the complete architecture.

36 AI agents analyzed and wrote the entire book in four phases:

Exploration: 6 parallel agents read every file in the source tree

Analysis: 12 agents wrote 494 KB of raw technical documentation

Writing: 15 agents rewrote everything from scratch as narrative chapters

Review & Revision: 3 reviewers produced 900 lines of feedback; 3 agents applied all fixes

The entire process — from source extraction to final revised book — took approximately 6 hours. A final audit pass ensured no verbatim source code remained — every code block was rewritten as pseudocode with different variable names.

The 10 patterns that make it work

If you read nothing else, these are the architectural bets that define Claude Code.

AsyncGenerator as agent loop — yields Messages, typed Terminal return, natural backpressure and cancellation

Speculative tool execution — start read-only tools during model streaming, before the response completes

Concurrent-safe batching — partition tools by safety, run reads in parallel, serialize writes

Fork agents for cache sharing — parallel children share byte-identical prompt prefixes, saving ~95% input tokens

4-layer context compression — snip, microcompact, collapse, autocompact — each lighter than the next

File-based memory with LLM recall — Sonnet side-query selects relevant memories, not keyword matching

Two-phase skill loading — frontmatter only at startup, full content on invocation

Sticky latches for cache stability — once a beta header is sent, never unset mid-session

Slot reservation — 8K default output cap, escalate to 64K on hit (saves context in 99% of requests)

Hook config snapshot — freeze at startup to prevent runtime injection attacks
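The last pattern in the list is small enough to sketch in full. The idea is to copy and freeze the hook configuration once, so later edits to the live config object have no effect on what runs. The `HookConfig` shape, the `before_tool` event name, and the `HookRunner` class are hypothetical and only illustrate the snapshot step.

```typescript
// Hypothetical hook shape: an event name and the command to run for it.
type HookConfig = { event: string; command: string };

// Copies every entry, then freezes both the copies and the array that holds them.
function snapshotHooks(live: HookConfig[]): ReadonlyArray<Readonly<HookConfig>> {
  return Object.freeze(live.map((hook) => Object.freeze({ ...hook })));
}

class HookRunner {
  private readonly hooks: ReadonlyArray<Readonly<HookConfig>>;

  // The snapshot is taken once, at construction time (startup).
  constructor(startupConfig: HookConfig[]) {
    this.hooks = snapshotHooks(startupConfig);
  }

  // Lookups only ever consult the frozen snapshot, never the live config.
  commandsFor(event: string): string[] {
    return this.hooks.filter((h) => h.event === event).map((h) => h.command);
  }
}
```

If something rewrites the config mid-session, for example a tool call that edits the settings file, the runner still executes only what was present at startup.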

Purely educational. This book contains no source code from Claude Code — every code block is original pseudocode written to illustrate architectural patterns. The goal is to help engineers understand how production AI agents are built, not to reproduce proprietary software. The "NO'REILLY" cover is a parody/meme for illustrative purposes only — no affiliation with O'Reilly Media.
