FounderFiles · N°002 · Interpretability

Filed 04.25.26

Chris Olah — line-art portrait
Fig. · InterpretabilityPortrait · flicker

Subject · Christopher Olah · researcher · interpreter · cartographer

Chris Olah.

Co-founder & Head of Interpretability · Anthropic

He never finished a degree. He helped invent how the field looks at itself. From the visualization work at Google Brain, to Distill, to the Circuits thread, to the sparse-autoencoder wave that cracked polysemanticity open, Olah has spent a decade teaching the field how to see inside neural networks, and building the institutions that keep the field honest about what it finds.

BORN
Toronto, Canada
AT
Anthropic
FILE
N°002
§ 01 · The Beginning

The Self-Taught Path

Chris Olah grew up in Toronto and was, by the conventions of the research career, ineligible. He did not finish university. He did not have a PhD. He had, for several years in the early 2010s, what most academic recruiters would call a gap in his record — a period during which he was reading, programming, and writing on a personal blog about neural networks at a moment when nobody quite knew what neural networks were going to be.

The blog turned out to be the credential. Olah was hired into Google Brain on the strength of his portfolio — visualizations of convolutional neural networks that made the internals of an opaque system look, for the first time, like something a human could read. The argument was implicit but unmistakable: the inside of a neural network is not a black box if you bother to design the tools to look at it.

§ 02 · The Journal

Distill, and the Standards Shift

In 2017, Olah co-founded Distill with Shan Carter and a small group of collaborators. Distill was a journal in the same way that an instrument is a journal: it published machine learning research as visual, interactive articles instead of the dense PDFs the field had grown up on. Building Blocks of Interpretability, Feature Visualization, the Circuits issue: each one rewrote the expectation of what a research paper could communicate.

When Distill went on indefinite hiatus in 2021, Olah wrote that the journal had done what it was built to do. The standards had shifted. Other venues were publishing interactive work. The medium had moved.

§ 03 · The Circuits Thread

Networks Have Mechanisms

The 2020 essay Zoom In: An Introduction to Circuits is the load-bearing claim of Olah’s career. The argument is simple to state and consequential to verify: trained neural networks are not inscrutable. They contain interpretable mechanisms — circuits — that compute identifiable features and combine them in ways a researcher can read.

Curve detectors. Edge detectors. Pose-invariant neurons. The 2021 Multimodal Neurons paper, written with collaborators at OpenAI, found that CLIP’s neurons activate on the abstract concept of, say, “Spider-Man,” whether presented as the comic-book panel or the literal word printed on a sign. The same unit. The same direction. Different surface forms.

The implication: networks form abstractions the way humans do. The disagreement is only about how legibly.

The inside of a neural network is not a black box. It is a city. Somebody has to draw the maps.
The thesis, compressed
§ 04 · Anthropic

The Polysemanticity Problem

In 2021, Olah co-founded Anthropic with Dario and Daniela Amodei and a small group of researchers from OpenAI. He became Head of Interpretability. The early years of the lab were dominated by one stubborn fact that the Circuits research had surfaced: features in real networks are rarely clean. A single neuron will fire on, say, “Christmas” AND “curves” AND “the names of seventeen unrelated cities” — not because the network is confused but because, in a model with finite neurons and effectively infinite concepts to encode, neurons get reused.

The problem had a name: polysemanticity. And it was a wall. If you could not point to a single neuron and say “this one represents X,” the whole interpretability program lived under a question mark.
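The intuition behind the reuse is geometric: random directions in a modestly sized space interfere only weakly with one another, so a layer can superpose far more concept directions than it has neurons. A minimal NumPy sketch of that claim (the dimensions and concept count are illustrative, not from any published model):

```python
# Illustrative sketch: many near-orthogonal "concept" directions fit in
# a space with far fewer dimensions than concepts (superposition).
import numpy as np

rng = np.random.default_rng(0)
d, n_concepts = 64, 512          # 512 concept directions, 64-dim space

# Random unit vectors stand in for learned concept directions.
v = rng.standard_normal((n_concepts, d))
v /= np.linalg.norm(v, axis=1, keepdims=True)

# Interference: cosine similarity between every pair of distinct concepts.
cos = v @ v.T
np.fill_diagonal(cos, 0.0)
print(f"mean |cos|: {np.abs(cos).mean():.3f}")
print(f"max  |cos|: {np.abs(cos).max():.3f}")
```

The mean interference hovers near 1/√d, which is why packing works at all, and why any single neuron's activations end up entangled with many concepts.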

§ 05 · The Breakthrough

Sparse Autoencoders, At Scale

The paper Towards Monosemanticity (2023), and the follow-up Scaling Monosemanticity (2024), described the workaround. Train a wide, overcomplete autoencoder — a sparse autoencoder, or SAE — on the activations of a real language model. The SAE’s job is to recover an enormous dictionary of features, each one corresponding to a single, interpretable concept. Polysemantic neurons in the original network decompose into clean, monosemantic features in the SAE.
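The architecture itself is compact. A hedged NumPy sketch of the forward pass and loss, with illustrative sizes and an L1 sparsity weight that stand in for the published training details rather than reproduce them:

```python
# Minimal sparse-autoencoder sketch (illustrative, not Anthropic's code).
# The dictionary is overcomplete: many more features than model dims.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 128, 1024      # 8x overcomplete dictionary

W_enc = rng.standard_normal((d_model, d_dict)) * 0.02
b_enc = np.zeros(d_dict)
W_dec = rng.standard_normal((d_dict, d_model)) * 0.02
b_dec = np.zeros(d_model)

def sae_forward(x, l1_coeff=1e-3):
    """Encode activations into sparse features, then reconstruct."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU -> nonnegative features
    x_hat = f @ W_dec + b_dec                # linear reconstruction
    # Reconstruction error plus an L1 penalty that pushes toward sparsity.
    loss = np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(f).mean()
    return f, x_hat, loss

x = rng.standard_normal((4, d_model))        # a batch of model activations
f, x_hat, loss = sae_forward(x)
print(f.shape, x_hat.shape)
```

Trained with gradient descent on that loss, each row of `W_dec` becomes a candidate monosemantic feature direction.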

By 2024, Anthropic had scaled the technique to Claude 3 Sonnet and pulled tens of millions of features out of a frontier model. Features for countries, for emotions, for inner conflict, for code patterns, for the Golden Gate Bridge. Each one a direction. Each one editable.

Olah’s decade-long bet — that interpretability was a science with handles, not a hope — had a result you could clamp on the model’s activations and watch its behavior shift. The polysemanticity wall was, at minimum, a doorway.
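Mechanically, the clamp is a nudge: add a feature's decoder direction to the activations with a chosen strength and let the model keep running. A sketch with made-up shapes and a random dictionary standing in for a trained SAE:

```python
# Sketch of feature "clamping": shift activations along one feature's
# decoder direction. Shapes and the dictionary here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 128, 1024
W_dec = rng.standard_normal((d_dict, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)   # unit directions

def steer(x, feature_idx, strength):
    """Move an activation vector `strength` units along one feature."""
    return x + strength * W_dec[feature_idx]

x = rng.standard_normal(d_model)
x_steered = steer(x, feature_idx=42, strength=10.0)

# Because the direction is unit-norm, the edit has exactly that magnitude.
print(round(float(np.linalg.norm(x_steered - x)), 1))  # → 10.0
```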

Timeline
  • ~2010 · Self-taught in Toronto. Leaves university; writes blog posts on neural networks instead.
  • 2014 · Joins Google Brain. Begins the Feature Visualization line of work.
  • 2017 · Co-founds Distill, the visual, interactive ML journal.
  • 2018 · Joins OpenAI. Continues the Circuits agenda.
  • 2021 · Co-founds Anthropic with Dario Amodei, Daniela Amodei, and others. Becomes Head of Interpretability.
  • 2023 · Anthropic publishes Towards Monosemanticity. Sparse autoencoders crack polysemanticity open.
  • 2024 · Scaling Monosemanticity: SAEs at Claude scale. Features become a research substrate.
FounderFiles N°002 · Christopher Olah
Filed by Bret Kerr · ACRA Insight LLC · Franklin, MA
contextjamming.com · @bretkerr

§ · Invoice No. 001 · The Build Ledger

The Ledger.

Filed · contextjamming.com

What a conservative mid-market digital agency would have quoted for the same scope, itemized against what this site actually cost. Agency numbers are the floor — not the premium brand-studio tier.

TIME · agency quote 12 weeks · this site 2 days · ~42× faster

COST · agency quote ~$156,000 · this site ~$300 · ~520× cheaper

TEAM · agency 5-person team · this site 1 human + 3 models · same deliverable

§ Itemized — what a mid-market agency SOW would have billed

Discovery · brand positioning · workshops · 40–80 hr · $10,000
Design system · Figma tokens · 3 rounds · 60–120 hr · $18,000
Wavesurfer audio carousel · single-track context · 60–100 hr · $16,000
Dual lightbox systems · focus trap · keyboard · 30–50 hr · $8,000
LLM product flows · streaming · state machine · 80–160 hr · $26,000
Stripe · checkout · webhooks · env hardening · 40–80 hr · $10,000
Editorial routes · 6 sub-pages · templates · 60–100 hr · $14,000
Accessibility pass · aria · reduced-motion · 40–80 hr · $10,000
QA · cross-browser · mobile matrix · 60–100 hr · $14,000
Cross-publication rebrand · masthead + IA · 2026-04-28 · 20–40 hr · $6,000
Subtotal · ~700 hr · $132,000
Project management · 18% overhead · $24,000
Agency total — conservative floor · ~700 hr · ~$156,000
Actually spent · Claude + Gemini stack · ~20 hr · ~$300

Agency figure assumes ~700 billable hours at a blended rate of roughly $190/hr, plus ~18% PM overhead: the conservative floor of a mid-market SOW. Premium brand studios would have quoted 2–3× that. Stack: Claude Code 4.7 Max, Claude Opus 4.6, Gemini 3.1 Pro, Vercel Pro.

§ · Colophon

How this site is made.

Vol. 26 · build log

Every page on contextjamming.com is the output of a real-time, three-body Mixture-of-Experts loop. One model orchestrates. Two consult. The human holds the thesis. No single model commits alone.

Orchestrator

Claude Code 4.7

1M context · Max tier

  • Primary author
  • Terminal-native, direct push to Vercel
  • Audit trail to GitHub on every commit
  • Adaptive thinking · effort: extra-high

Auditor

Claude Opus 4.6

1M context

  • Editorial critic
  • Code review before merge
  • Backup-of-record
  • Co-signs every commit

Adversary

Gemini 3.1 Pro

Cross-model MoE

  • Factual adjudication
  • Structural dissent
  • Deep Research → semantic triples
  • Caught the Donelan incident

Stack

Next.js 16.2 · App Router
React 19.2
TypeScript 5
Tailwind v4 · @theme inline
framer-motion · transitions
wavesurfer.js · audio waveforms
marked · MD → HTML at build
fast-xml-parser · RSS + Atom

Typeset in

Fraunces · variable · opsz + SOFT
Playfair Display · debate display
IBM Plex Mono · editorial metadata
Geist Mono · utility mono
Caveat · grease-pencil marginalia
All via next/font/google
Palette · single @theme block · no dupe tokens, ever

Infrastructure

Deploy · Vercel Edge Network
ISR · 30-min revalidate · wire + notebook
Repo · github.com/BretKerrAI/founderfile
Branch · hero-redesign-library
Analytics · Google Tag Manager
Apex · contextjamming.com
Runtime · Node 24
Build tool · Turbopack
       human intent
            │
            ▼
   ┌────────────────────┐         ┌─────────────────┐
   │  Claude Code 4.7   │  ◄────► │ Claude Opus 4.6 │      ← auditor loop
   │    (orchestrator)  │         │     (auditor)   │
   └─────────┬──────────┘         └─────────────────┘
             │  ◄───────────┐
             ▼              │
       ┌──────────┐    ┌────┴───────┐
       │  Vercel  │    │ Gemini 3.1 │          ← adversarial loop
       │  (edge)  │    │    Pro     │
       └─────┬────┘    └────────────┘
             │
             ▼
       contextjamming.com
             │
             ▼
       ┌──────────────┐
       │   Git push   │         ← audit trail
       └──────────────┘
Assembled on Mac in Terminal · Filed from Franklin, MA · Context Jamming · ACRA Insight LLC · MIT License · FounderFile.ai · RelationalIntelligence.xyz · Commission a Dispatch →