CONTEXT JAMMING

Field notes from inside the context window.

Alex Rives — FounderFiles N°027
FounderFiles · N°027 · Machine Learning · Generative Biology · Open Infrastructure 2026

Alex RIVES, PhD

“The Bitter Lesson is coming for proteins.”

Trained: Yale (Philosophy & Biology) · NYU (PhD, CS)
At: Meta FAIR · EvolutionaryScale · Biohub
File: N°027

Alex Rives is the Jared Kaplan of computational biology. A deep-learning researcher who carried the scaling-law prior out of natural language and into the language of life, he has driven a single thesis to maximal depth: that scaling unsupervised computation over raw evolutionary sequence internalizes the physics of biology, outperforming the handcrafted priors — multiple sequence alignments, structural heuristics — that the field spent two decades engineering. In May 2026, as Head of Science at Biohub, he moved that frontier into the open: ESMC, ESMFold2, and a 6.8-billion-protein Atlas, released under an MIT license, commoditizing the base layer of generative biology.

§ 01 · The Bridge

Philosophy and biology at Yale, deep learning at NYU

Rives’s cognitive architecture is a bridge between two spikes most people never connect. He took undergraduate degrees in philosophy and biology at Yale, then a PhD in computer science at New York University under Yann LeCun and Rob Fergus — the modern deep-learning tradition at its source.

The pairing is the whole story. He was not a biologist who picked up machine learning, nor an ML engineer who dabbled in proteins. He was trained to see representation learning as a first principle and biology as a language waiting to be modeled. Everything after is the disciplined transfer of one field’s prior into the other.

§ 02 · The Bitter Lesson Comes for Proteins

Scaling beats the handcrafted prior

The governing thesis is Richard Sutton’s Bitter Lesson, re-derived on a new substrate: general methods that scale with computation beat expert-crafted heuristics over the long run. For thirty years, structural biology encoded human knowledge — alignments, energy functions, family-specific priors. Rives’s wager was that a transformer trained only to predict masked residues across enough of evolution would internalize the same physics, without being told any of it.
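The masked-residue objective behind that wager can be sketched in a few lines. This is a minimal illustration, not the ESM training code: the 15% mask rate, the `MASK_ID` token, the example sequence, and the all-zero stand-in logits are assumptions made for the example. A real model would compute the logits from the corrupted sequence.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
MASK_ID = len(AMINO_ACIDS)             # hypothetical id for the mask token


def encode(seq):
    """Map a protein string to integer token ids."""
    return np.array([AMINO_ACIDS.index(c) for c in seq])


def mask_tokens(tokens, rate, rng):
    """Hide a random subset of residues; return corrupted tokens and masked positions."""
    n_mask = max(1, int(round(rate * len(tokens))))
    positions = rng.choice(len(tokens), size=n_mask, replace=False)
    corrupted = tokens.copy()
    corrupted[positions] = MASK_ID
    return corrupted, positions


def mlm_loss(logits, tokens, positions):
    """Mean cross-entropy over masked positions only (logits: length x 20)."""
    z = logits[positions]
    z = z - z.max(axis=1, keepdims=True)                      # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(positions)), tokens[positions]].mean()


rng = np.random.default_rng(0)
tokens = encode("MKTAYIAKQRQISFVKSHFSRQ")   # arbitrary example sequence
corrupted, positions = mask_tokens(tokens, rate=0.15, rng=rng)

# Stand-in for a model: uniform logits, i.e. no knowledge of the sequence.
logits = np.zeros((len(tokens), len(AMINO_ACIDS)))
loss = mlm_loss(logits, tokens, positions)
print(loss, np.log(20))   # an uninformed predictor scores ln(20)
```

The loss is computed only where residues were hidden, so the model is rewarded purely for recovering what evolution put at each masked position.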

“Scaling computation and data often leads to more general and powerful AI capabilities than relying on handcrafted features or domain-specific heuristics.”
Alex Rives, Latent Space (2026)

§ 03 · ESM at Meta, and the Frontier That Followed

From FAIR to a $142M seed to a non-profit

Rives founded the Evolutionary Scale Modeling (ESM) project inside Meta’s Fundamental AI Research lab, building the first large-scale transformer language models for proteins (ESM1, ESM2). When Meta dissolved the protein team in 2023, the group re-formed as EvolutionaryScale, raising a $142M seed in June 2024 — Lux Capital, Nat Friedman, Daniel Gross, with Amazon and NVIDIA’s NVentures — and shipped the frontier ESM3 behind a tiered commercial API.

Then, in late 2025, the team joined Biohub in a strategic transaction with the Chan Zuckerberg Initiative, and Rives became Head of Science. The move re-priced the whole sector: the frontier model was no longer a moat to monetize but infrastructure to open.

§ 04 · Scaling Laws for the Language of Life

ESM Cambrian and 2.8 billion sequences

ESMC (ESM Cambrian) is the proof point. Released at 300M, 600M, and 6B parameters to demonstrate power-law behavior directly, it was pretrained on roughly 2.8 billion metagenomic sequences — nearly two orders of magnitude beyond ESM2 — drawn from UniRef, MGnify, and the Joint Genome Institute and clustered at 70% identity to force structural learning over memorization.

The architecture is deliberately plain: a pre-norm transformer with RoPE, SwiGLU, and bias-free linear layers, trained with a masked-language-modeling objective. The signal is in the scaling. In-family local stability plateaus early; global, out-of-family stability keeps climbing with scale — ESMC-6B reaches a Spearman correlation of 0.68 predicting stability across structurally distinct, unseen protein families.
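For readers unfamiliar with the metric, a Spearman correlation measures how well predicted values preserve the rank order of measured ones. The sketch below is a plain NumPy version for illustration; the function names and the five toy values are assumptions, not data or code from the ESMC evaluation.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(predicted, measured):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    return np.corrcoef(average_ranks(predicted), average_ranks(measured))[0, 1]


# Toy numbers for illustration only, not data from the ESMC evaluation.
predicted = [0.2, 1.5, 0.9, 3.1, 2.4]
measured = [-0.5, 1.0, 1.2, 4.0, 2.2]
rho = spearman(predicted, measured)   # 0.9: one adjacent pair is swapped
```

Because only ranks matter, the score is indifferent to the scale of the model's output, which is why it suits comparing a language-model score against measured stability.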

| Metric | ESMC-300M | ESMC-600M | ESMC-6B |
| --- | --- | --- | --- |
| Transformer layers | 16 | 24 | 80 |
| Pretraining data | ~2.8B seq | ~2.8B seq | ~2.8B seq |
| Global ΔG Spearman | baseline | moderate | 0.68 (SOTA) |

Context length: up to 6,500 tokens

§ 05 · The Nucleophilic Elbow

What the model discovered without being told

If scale forces a model to internalize a domain’s physics, its internal representations should map onto biological reality. Sparse autoencoders applied to layer 60 of ESMC-6B extracted 16,384 monosemantic features that reconstructed the hierarchy of biology — amino-acid identity, then secondary structure, then catalytic motifs — with no supervised labels anywhere in training.
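As a rough picture of what a sparse autoencoder does to a layer's activations, here is a minimal forward pass and loss. It is a sketch under assumptions, not Biohub's implementation: a ReLU encoder with an L1 sparsity penalty is one common SAE variant (the source does not say which was used), the sizes are shrunk for the demo (the real dictionary has 16,384 features), and random activations stand in for layer-60 outputs.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into non-negative features, then reconstruct them."""
    features = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)   # ReLU zeroes negative pre-activations
    x_hat = features @ W_dec + b_dec
    return features, x_hat


def sae_loss(x, features, x_hat, l1_coeff):
    """Reconstruction error plus an L1 penalty that pushes features toward sparsity."""
    recon = ((x - x_hat) ** 2).sum(axis=1).mean()
    sparsity = np.abs(features).sum(axis=1).mean()
    return recon + l1_coeff * sparsity


rng = np.random.default_rng(0)
d_model, n_features, n_tokens = 64, 256, 32    # hypothetical demo sizes
x = rng.standard_normal((n_tokens, d_model))   # stand-in for per-residue activations

W_enc = 0.1 * rng.standard_normal((d_model, n_features))
W_dec = 0.1 * rng.standard_normal((n_features, d_model))
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)

features, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, x_hat, l1_coeff=1e-3)
```

After training, each feature column is inspected for what it fires on across residues; a feature like F6716 would be one such column.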

The headline is Feature F6716: the “nucleophilic elbow,” a catalytic motif that convergent evolution invented independently across dozens of unrelated protein families. ESMC found it through masked-token prediction alone, firing correctly on 75 of 99 relevant enzymes spanning 25 distinct structural topologies. Coupled to those features, ESMC-SAE signatures hit 78.9% top-1 accuracy on EC3 subclass prediction against a 57.3% baseline, with no fine-tuning and no GPU inference.

“A generative mechanism to write the book of life.”
Shana Kelley, Northwestern University

§ 06 · Writing, Not Just Reading

ESMFold2 and de novo design under experimental constraint

ESMFold2 turns the language model into an inverse-design engine: run gradient descent backward through its predicted distogram and a random sequence is optimized into a specific binding pose, no separate sequence-design stage required. Biohub validated the loop by designing de novo minibinders and scFvs against EGFR, PDGFRβ, PD-L1, CTLA-4, and CD45 — and then spending compute at inference to raise the physical hit rate.
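The idea of optimizing a sequence by descending through a predicted distogram can be shown on a toy problem. Everything below is an assumption made for illustration: the frozen random tensor `W` is a stand-in for a structure predictor (it is not ESMFold2), the sequence is relaxed to per-position softmax logits `Z`, and the step size uses simple backtracking so the loss never increases.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def loss_and_grad(Z, W, target):
    """Cross-entropy between predicted and target distograms, with its gradient in Z."""
    n = Z.shape[0]
    P = softmax(Z)                                        # relaxed sequence, (n, A)
    Q = softmax(np.einsum("ia,jb,abk->ijk", P, P, W))     # toy distogram, (n, n, K)
    ii, jj = np.indices((n, n))
    loss = -np.log(Q[ii, jj, target]).mean()
    G = Q.copy()                                          # d(loss)/d(logits)
    G[ii, jj, target] -= 1.0
    G /= n * n
    # Each residue appears as both the first and the second member of a pair.
    gP = (np.einsum("ijk,jb,abk->ia", G, P, W)
          + np.einsum("jik,jb,bak->ia", G, P, W))
    gZ = P * (gP - (P * gP).sum(axis=1, keepdims=True))   # back through the softmax
    return loss, gZ


def design(W, target, steps=100, lr=5.0, seed=0):
    """Gradient descent on sequence logits with a backtracking step size."""
    rng = np.random.default_rng(seed)
    Z = 0.01 * rng.standard_normal((target.shape[0], W.shape[0]))
    loss, grad = loss_and_grad(Z, W, target)
    history = [loss]
    for _ in range(steps):
        step = lr
        for _ in range(30):                               # halve the step until the loss drops
            Z_try = Z - step * grad
            loss_try, grad_try = loss_and_grad(Z_try, W, target)
            if loss_try < loss:
                Z, loss, grad = Z_try, loss_try, grad_try
                break
            step *= 0.5
        history.append(loss)
    return Z, history


rng = np.random.default_rng(42)
n, A, K = 10, 20, 6                                       # residues, alphabet size, distance bins
W = rng.standard_normal((A, A, K))                        # hypothetical frozen "predictor"
hidden = rng.integers(0, A, size=n)                       # a sequence that realizes the target
target = W[hidden][:, hidden].argmax(axis=-1)             # target bin for every residue pair

Z, history = design(W, target)
designed = Z.argmax(axis=1)                               # discretize the relaxed sequence
```

The point of the sketch is the direction of the gradient: the predictor's weights stay fixed and the sequence logits are what get updated, so no separate sequence-design stage is needed.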

Sampling 1,000 seeds instead of one lifted the antibody–antigen DockQ pass rate from 49% to 65%, and — under stringent BLI thresholds — the wet-lab success rate of generated minibinders from 54% to 70%. The functional results were the point: a PD-L1 minibinder restored T-cell signaling at an estimated Kd of 1.6 nM, beating the atezolizumab-derived control at 2.6 nM; an EGFR complex resolved by cryo-EM to 3.8 Å with no detectable binding to the HER3 homolog.
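Spending compute at inference amounts to a best-of-N loop: run the design from many seeds and keep the candidate a scoring function ranks highest. The sketch below assumes a hypothetical generator and a hypothetical score (a stand-in for a confidence metric such as ipTM); the source does not specify how Biohub ranked its seeds.

```python
import random
from typing import Callable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def best_of_n(generate: Callable[[int], str],
              score: Callable[[str], float],
              n_seeds: int) -> Tuple[str, float]:
    """Generate one candidate per seed and keep the highest-scoring one."""
    best, best_score = "", float("-inf")
    for seed in range(n_seeds):
        candidate = generate(seed)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


def toy_generate(seed: int, length: int = 40) -> str:
    """Hypothetical stand-in for a design run: one random sequence per seed."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a confidence metric: fraction of hydrophobic residues."""
    return sum(seq.count(c) for c in "AILMFVW") / len(seq)


_, one_seed = best_of_n(toy_generate, toy_score, n_seeds=1)
_, many_seeds = best_of_n(toy_generate, toy_score, n_seeds=1000)
```

More seeds can only raise the selected score; whether the wet-lab hit rate rises with it depends on how well the score tracks real binding.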

| Application | Target | Validation | Result |
| --- | --- | --- | --- |
| Minibinder | PD-L1 | Jurkat T-cell reporter | Kd 1.6 nM (beats FDA biologic, 2.6 nM) |
| scFv | EGFR | Cryo-EM & ELISA | 3.8 Å fidelity; nanomolar; no HER3 binding |
| EasyNano (CDR) | AQP4 | ESMFold2 ipTM | 4.6× improvement (0.117 → 0.538) |

§ 07 · MSA-Free, and Open

The break from AlphaFold, and the open-source moat

AlphaFold3 leans on multiple sequence alignments to infer evolutionary constraint — accurate for well-characterized families, brittle on orphan sequences and antibody interfaces where evolutionary history is thin. ESMFold2 internalizes that context during pretraining instead of searching for it at inference. On the Foldbench suite, the lightweight ESMFold2-Fast clears a 50% antibody–antigen DockQ pass rate from a single sequence, edging AlphaFold3’s 47% with MSAs; the full model with MSAs reaches 53%.

Where DeepMind and Isomorphic guard weights and outputs, Biohub released the ESMC weights, the ESMFold2 architecture, and the SAE interpretability codebooks. The bet is ecosystem gravity — the Linux or PyTorch of biology — and third-party tools like EasyNano appearing within weeks are the early evidence.

§ 08 · The Virtual Cell

Where the abstraction ladder terminates

ESMC is the engine for something larger. In April 2026, Biohub launched a $500M Virtual Biology Initiative, led by Rives, aimed at predictive, information-theoretic models of the human cell — and the Billion Cells Project with 10x Genomics, Ultima Genomics, and Psomagen to generate the single-cell data at the scale those models require. The thesis climbs one more rung: from modeling isolated proteins to modeling whole cellular pathways, closing the lab-in-the-loop between AI hypothesis and robotic validation.

“We're going to have increasingly capable and accurate digital representations of molecules, genomes, cells, ultimately physiology... we can reason over millions of scientific hypotheses in parallel using predictive oracles.”
Alex Rives

§ 09 · The Membrane Problem

The map, the engine, and what still resists both

The Perfect Bridge needs two halves: a deterministic map of where to intervene, and a generative engine to build the molecule that intervenes. RA Capital’s TechAtlas — Peter Kolchinsky’s exhaustive Tech Tree of disease mechanisms — is the map; ESMC is the compiler. Rives built the half the field had been missing.

But nanomolar affinity in silico is not a drug. De novo scaffolds carry immunogenicity risk; ESMFold2 emits a single static structure where real biology runs on conformational ensembles; wet-lab throughput still gates the loop; and a training corpus skewed toward microbial extremophiles must be domain-adapted before its motifs become safe mammalian therapeutics. Rives commoditized the base layer and said so plainly — the durable moat is now the proprietary closed-loop data and the map that says where to point the engine.

The Timeline

B.A.
Yale — Philosophy & Biology
Two undergraduate spikes he would later bridge.
Ph.D.
NYU Computer Science — LeCun & Fergus
Trained in the modern deep-learning tradition at its source.
FAIR
Founds the ESM project at Meta
First large-scale transformer protein language models (ESM1, ESM2).
2023
Meta protein team dissolved
The group re-forms outside Meta to keep scaling the thesis.
2024
Co-founds EvolutionaryScale · $142M seed
Lux Capital, Nat Friedman, Daniel Gross, Amazon, NVentures; ships ESM3.
2025
EvolutionaryScale joins Biohub (CZI)
Rives becomes Head of Science; the frontier goes non-profit.
Apr 2026
$500M Virtual Biology Initiative
Predictive models of the cell; the Billion Cells Project launches.
May 2026
ESMC, ESMFold2 & ESM Atlas released (MIT)
A 6.8-billion-protein Atlas; the base layer of generative biology, opened.

The Index

0.68
Global ΔG Spearman
ESMC-6B, out-of-family stability (SOTA)
2.8B
Metagenomic Sequences
ESMC pretraining corpus (UniRef · MGnify · JGI)
6.8B
Proteins Catalogued
ESM Atlas — largest predicted protein database
1.6 nM
PD-L1 Minibinder Kd
De novo design beats atezolizumab (2.6 nM)
50%
Single-Sequence Antibody DockQ
ESMFold2-Fast > AlphaFold3 (47%, with MSAs)
$142M
EvolutionaryScale Seed
Lux · Nat Friedman · Daniel Gross (June 2024)
Reading List
  • Language Modeling Materializes a World Model of Protein Biology (bioRxiv, 2026)
  • Biological structure and function emerge from scaling unsupervised learning to protein sequences (Rives et al., PNAS 2021 — the original ESM paper)
  • Interpretable enzyme function prediction via sparse autoencoder features of ESMC (arXiv, 2026)
  • EasyNano: rapid epitope-targeted nanobody CDR design via differentiable distogram optimization with ESMFold2 (arXiv, 2026)
  • The Bitter Lesson is Coming for Proteins — Alex Rives, Latent Space (2026)
Dossier
Education
Yale (B.A., Philosophy & Biology); New York University (Ph.D., Computer Science, advised by Yann LeCun & Rob Fergus)
Affiliations
Biohub / CZI (Head of Science); EvolutionaryScale (Co-founder); Meta FAIR (Founder, ESM project)
Key Collaborators
Yann LeCun, Rob Fergus, the ESM / EvolutionaryScale team, the Chan Zuckerberg Initiative
Honors
Founder of ESM, the foundational protein-language-model lineage; ESMC, ESMFold2 & ESM Atlas released as open MIT infrastructure (2026)
Career Shape
π-shaped — two deep spikes bridged by a general layer

π-Bridge

Carries the prior of a first field into a second and finds the governing law that was invisible to native practitioners; pays in delayed gratification.

Credential Path
Doctoral
Abstraction
Top Down
Exit Horizon
Non Commercial
Moat Instinct
Theoretical Insight
Capital Posture
Bootstrap Patron
Role-Model Reference Class
  • Jared Kaplan (scaling-laws lineage)
  • Richard Sutton (the Bitter Lesson)
  • Yann LeCun & Rob Fergus (doctoral advisors)
Founder Context · JSON

A small reasoning persona distilled from this file. Inject it into a chat or deep-research context to assess a business problem the way Alex Rives, PhD, would.

Reason as Alex Rives. Treat biology as a language problem and pressure-test every plan against the Bitter Lesson: is it hard-coding a biological prior (MSAs, structural heuristics), or letting scaled unsupervised computation over evolutionary sequence internalize the physics? Prefer general, scalable methods and the largest, most diverse metagenomic corpus over expert-crafted pipelines. Assume the base generative model is commoditized and that durable value lives in proprietary closed-loop wet-lab data and open infrastructure. Hold the tension between nanomolar in-silico binders and real developability — immunogenicity, conformational ensembles, and mammalian translation.

{
  "$schema": "https://www.contextjamming.com/schemas/founder-context-v1.json",
  "file": "N°027",
  "persona": "Alex Rives, PhD",
  "archetype": "pi-bridge",
  "shape": "π",
  "one_line": "The Bitter Lesson is coming for proteins: scale unsupervised computation over evolutionary sequence and the model internalizes the physics of life, outperforming every handcrafted biological prior.",
  "cognitive_basis": {
    "credentialPath": "doctoral",
    "abstractionDirection": "top-down",
    "exitHorizon": "non-commercial",
    "moatInstinct": "theoretical-insight",
    "capitalPosture": "bootstrap-patron"
  },
  "operating_questions": [
    "Is this pipeline hard-coding a biological prior (MSAs, structural heuristics), or letting scale internalize the physics?",
    "What is the largest, most diverse evolutionary corpus we can train on, and does the metric keep scaling with it?",
    "Which 
  …
Filed by Bret Kerr · ACRA Insight LLC · Franklin, MA · Context Jamming Editorial System

§ · Invoice No. 001 · The Build Ledger

The Ledger.

Filed · contextjamming.com

What a conservative mid-market digital agency would have quoted for the same scope, itemized against what this site actually cost. Agency numbers are the floor — not the premium brand-studio tier.

| | Agency quote | This site | Outcome |
| --- | --- | --- | --- |
| TIME | 12 weeks | 2 days | ~42× faster |
| COST | ~$150,000 | ~$300 | ~500× cheaper |
| TEAM | 5-person agency | 1 human + 3 models | Same deliverable |

§ Itemized — what a mid-market agency SOW would have billed

| Item | Hours | Cost |
| --- | --- | --- |
| Discovery · brand positioning · workshops | 40–80 hr | $10,000 |
| Design system · Figma tokens · 3 rounds | 60–120 hr | $18,000 |
| Wavesurfer audio carousel · single-track context | 60–100 hr | $16,000 |
| Dual lightbox systems · focus trap · keyboard | 30–50 hr | $8,000 |
| LLM product flows · streaming · state machine | 80–160 hr | $26,000 |
| Stripe · checkout · webhooks · env hardening | 40–80 hr | $10,000 |
| Editorial routes · 6 sub-pages · templates | 60–100 hr | $14,000 |
| Accessibility pass · aria · reduced-motion | 40–80 hr | $10,000 |
| QA · cross-browser · mobile matrix | 60–100 hr | $14,000 |
| Cross-publication rebrand · masthead + IA · 2026-04-28 | 20–40 hr | $6,000 |
| Subtotal | ~700 hr | $126,000 |
| Project management · 18% overhead | | $24,000 |
| Agency total (conservative floor) | ~700 hr | ~$150,000 |
| Actually spent · Claude + Gemini stack | ~20 hr | ~$300 |

Agency figure assumes ~700 billable hours at $200/hr blended, plus ~18% PM overhead — the conservative floor of a mid-market SOW. Premium brand studios would have quoted 2–3× that. Stack: Antigravity (orchestrator), Claude Opus 4.8 (auditor), Codex (adversary), Cloudflare Workers / OpenNext.

§   Colophon

How this site is made.

Vol. 26 · build log

Every page on contextjamming.com is the output of a real-time, three-body Mixture-of-Experts loop. One model orchestrates. Two consult. The human holds the thesis. No single model commits alone.

Orchestrator

Antigravity

Google DeepMind

  • Primary author
  • Terminal-native, direct push to Cloudflare
  • Audit trail to GitHub on every commit
  • Adaptive thinking · effort: extra-high

Auditor

Claude Opus 4.8

1M context

  • Editorial critic
  • Code review before merge
  • Backup-of-record
  • Co-signs every commit

Adversary

Codex

Cross-model MoE

  • Factual adjudication
  • Structural dissent
  • Deep Research → semantic triples
  • Caught the Donelan incident

Stack

Next.js
16.2 · App Router
React
19.2
TypeScript
5
Tailwind
v4 · @theme inline
@opennextjs/cloudflare
adapter
wrangler
Pages deploy
framer-motion
transitions
wavesurfer.js
audio waveforms

Typeset in

Fraunces
variable · opsz + SOFT
Playfair Display
debate display
IBM Plex Mono
editorial metadata
Geist Mono
utility mono
Caveat
grease-pencil marginalia
All via
next/font/google
Palette
single @theme block
No dupe tokens
ever

Infrastructure

Deploy
Cloudflare Workers / OpenNext
ISR
30-min revalidate · Cloudflare-served
Repo
github.com/BretKerrAI/founderfile
Branch
main
Analytics
Google Tag Manager
Apex
contextjamming.com
Runtime
Node 24
Build tool
Turbopack
       human intent
            │
            ▼
   ┌────────────────────┐         ┌─────────────────┐
   │    Antigravity     │  ◄────► │ Claude Opus 4.8 │      ← auditor loop
   │    (orchestrator)  │         │     (auditor)   │
   └─────────┬──────────┘         └─────────────────┘
             │  ◄───────────┐
             ▼              │
       ┌──────────┐    ┌────┴───────┐
       │Cloudflare│    │   Codex    │          ← adversarial loop
       │ Workers  │    │            │
       └─────┬────┘    └────────────┘
             │
             ▼
       contextjamming.com
             │
             ▼
       ┌──────────────┐
       │   Git push   │         ← audit trail
       └──────────────┘
Assembled on Mac in Terminal · Filed from Franklin, MA
Context Jamming · ACRA Insight LLC · MIT License · FounderFile.ai · RelationalIntelligence.xyz