
Alex RIVES, PhD
“The Bitter Lesson is coming for proteins.”
Alex Rives is the Jared Kaplan of computational biology. A deep-learning researcher who carried the scaling-law prior out of natural language and into the language of life, he has driven a single thesis to maximal depth: that scaling unsupervised computation over raw evolutionary sequence internalizes the physics of biology, outperforming the handcrafted priors — multiple sequence alignments, structural heuristics — that the field spent two decades engineering. In May 2026, as Head of Science at Biohub, he moved that frontier into the open: ESMC, ESMFold2, and a 6.8-billion-protein Atlas, released under an MIT license, commoditizing the base layer of generative biology.
Philosophy and biology at Yale, deep learning at NYU
Rives’s cognitive architecture is a bridge between two spikes most people never connect. He took undergraduate degrees in philosophy and biology at Yale, then a PhD in computer science at New York University under Yann LeCun and Rob Fergus — the modern deep-learning tradition at its source.
The pairing is the whole story. He was not a biologist who picked up machine learning, nor an ML engineer who dabbled in proteins. He was trained to see representation learning as a first principle and biology as a language waiting to be modeled. Everything after is the disciplined transfer of one field’s prior into the other.
Scaling beats the handcrafted prior
The governing thesis is Richard Sutton’s Bitter Lesson, re-derived on a new substrate: general methods that scale with computation beat expert-crafted heuristics over the long run. For thirty years, structural biology encoded human knowledge — alignments, energy functions, family-specific priors. Rives’s wager was that a transformer trained only to predict masked residues across enough of evolution would internalize the same physics, without being told any of it.
“Scaling computation and data often leads to more general and powerful AI capabilities than relying on handcrafted features or domain-specific heuristics.”— Alex Rives, Latent Space (2026)
From FAIR to a $142M seed to a non-profit
Rives founded the Evolutionary Scale Modeling (ESM) project inside Meta’s Fundamental AI Research lab, building the first large-scale transformer language models for proteins (ESM1, ESM2). When Meta dissolved the protein team in 2023, the group re-formed as EvolutionaryScale, raising a $142M seed in June 2024 — Lux Capital, Nat Friedman, Daniel Gross, with Amazon and NVIDIA’s NVentures — and shipped the frontier ESM3 behind a tiered commercial API.
Then, in late 2025, the team joined Biohub in a strategic transaction with the Chan Zuckerberg Initiative, and Rives became Head of Science. The move re-priced the whole sector: the frontier model was no longer a moat to monetize but infrastructure to open.
ESM Cambrian and 2.8 billion sequences
ESMC (ESM Cambrian) is the proof point. Released at 300M, 600M, and 6B parameters to demonstrate power-law behavior directly, it was pretrained on roughly 2.8 billion metagenomic sequences — nearly two orders of magnitude beyond ESM2 — drawn from UniRef, MGnify, and the Joint Genome Institute and clustered at 70% identity to force structural learning over memorization.
The architecture is deliberately plain: a pre-norm transformer with RoPE, SwiGLU, and bias-free linear layers, trained with a masked-language-modeling objective. The signal is in the scaling. In-family local stability plateaus early; global, out-of-family stability keeps climbing with scale — ESMC-6B reaches a Spearman correlation of 0.68 predicting stability across structurally distinct, unseen protein families.
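To make the masked-language-modeling objective concrete, here is a minimal sketch in plain Python. It is not ESMC's training code. The 15% mask rate, the `#` mask symbol, the example sequence, and the `uniform_predictor` placeholder are all assumptions introduced for illustration; the source only says that the model is trained to predict masked residues.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # hypothetical mask symbol, chosen only for this sketch


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of residues; return the masked string and targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(seq) * mask_rate))  # 15% is an assumed rate
    positions = rng.sample(range(len(seq)), n_mask)
    masked = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = seq[pos]  # remember the true residue
        masked[pos] = MASK
    return "".join(masked), targets


def mlm_loss(predict, masked_seq, targets):
    """Mean negative log-likelihood of the true residue at each masked position."""
    total = 0.0
    for pos, true_residue in targets.items():
        probs = predict(masked_seq, pos)  # maps residue -> probability
        total -= math.log(probs[true_residue])
    return total / len(targets)


def uniform_predictor(masked_seq, pos):
    """Placeholder for a trained model: every residue is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
masked, targets = mask_sequence(sequence)
loss = mlm_loss(uniform_predictor, masked, targets)
print(masked)
print(f"loss = {loss:.4f}")  # a uniform guess scores ln(20), about 2.9957
```

A model that has learned nothing scores ln(20) on this loss. Training pushes the loss below that floor by using the unmasked context to narrow down which residue fits each hidden position.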
| Metric | ESMC-300M | ESMC-600M | ESMC-6B |
|---|---|---|---|
| Transformer layers | 16 | 24 | 80 |
| Pretraining data | ~2.8B seq | ~2.8B seq | ~2.8B seq |
| Global ΔG Spearman | baseline | moderate | 0.68 (SOTA) |
| Context length | — | — | up to 6,500 tokens |
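The stability metric in the table is a Spearman rank correlation, which rewards getting the ordering of proteins right rather than the exact values. The sketch below shows how that number is computed. The predicted and measured values are invented for illustration and are not ESMC results.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation is Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative numbers only (not from the source).
predicted = [0.2, 1.5, -0.3, 2.1, 0.9]
measured = [0.1, 1.1, 0.4, 2.5, 0.8]
rho = spearman(predicted, measured)
print(round(rho, 3))  # 0.9: only one adjacent pair is ranked in the wrong order
```

Because only ranks matter, a model can reach a high Spearman score with miscalibrated absolute ΔG values, as long as it orders stable and unstable proteins correctly.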
What the model discovered without being told
If scale forces a model to internalize a domain’s physics, its internal representations should map onto biological reality. Sparse autoencoders applied to layer 60 of ESMC-6B extracted 16,384 monosemantic features that reconstructed the hierarchy of biology — amino-acid identity, then secondary structure, then catalytic motifs — with no supervised labels anywhere in training.
The headline is Feature F6716: the “nucleophilic elbow,” a catalytic motif that convergent evolution invented independently across dozens of unrelated protein families. ESMC found it through masked-residue prediction alone, firing correctly on 75 of 99 relevant enzymes (about 76%) spanning 25 distinct structural topologies. Coupled to those features, ESMC-SAE signatures hit 78.9% top-1 accuracy on EC3 subclass prediction against a 57.3% baseline, with no fine-tuning and no GPU inference.
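For readers unfamiliar with sparse autoencoders, the sketch below shows the basic mechanism: a layer activation is encoded into a wide vector of non-negative features, most of them zero, and then reconstructed. This is one common formulation (ReLU encoder with an L1 sparsity penalty). The source does not say which variant was used on ESMC-6B, so the formulation, the toy dimensions, and the random weights here are all assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode an activation into non-negative features, then reconstruct it."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = matvec(w_enc, centered)
    features = [max(0.0, p + b) for p, b in zip(pre, b_enc)]  # ReLU
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that pushes features toward zero."""
    mse = sum((xi - ri) ** 2 for xi, ri in zip(x, recon)) / len(x)
    return mse + l1_coeff * sum(features)  # features >= 0, so the sum is the L1 norm


# Toy sizes for illustration; the dictionary in the text has 16,384 features.
D_MODEL, N_FEATURES = 4, 8
rng = random.Random(0)
w_enc = [[rng.gauss(0, 0.5) for _ in range(D_MODEL)] for _ in range(N_FEATURES)]
w_dec = [[rng.gauss(0, 0.5) for _ in range(N_FEATURES)] for _ in range(D_MODEL)]
b_enc = [0.0] * N_FEATURES
b_dec = [0.0] * D_MODEL

activation = [0.3, -1.2, 0.8, 0.1]  # stand-in for one residue's hidden state
features, recon = sae_forward(activation, w_enc, b_enc, w_dec, b_dec)
active = [i for i, f in enumerate(features) if f > 0]
print("active features:", active)
print("loss:", sae_loss(activation, features, recon))
```

After training, each feature index can be inspected by asking which residues make it fire. That inspection is how a feature such as the one described above gets matched to a biological motif.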
“A generative mechanism to write the book of life.”— Shana Kelley, Northwestern University
ESMFold2 and de novo design under experimental constraint
ESMFold2 turns the language model into an inverse-design engine: backpropagate a loss on its predicted distogram to the input, and gradient descent optimizes a random sequence into a specific binding pose, with no separate sequence-design stage required. Biohub validated the loop by designing de novo minibinders and scFvs against EGFR, PDGFRβ, PD-L1, CTLA-4, and CD45, then spent additional compute at inference to raise the physical hit rate.
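The design loop described above can be illustrated with a toy version: keep a relaxed sequence as per-position logits, score it with a differentiable loss, and descend the gradient until a discrete sequence can be read off. In the sketch below the loss is a hypothetical per-position cost table standing in for a loss on a predicted distogram. Nothing here calls ESMFold2 or reflects its API, and the step count and learning rate are arbitrary.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_cost_table(target):
    """Hypothetical stand-in for a structure loss: 0 for the wanted residue, 1 otherwise."""
    return [[0.0 if aa == want else 1.0 for aa in AMINO_ACIDS] for want in target]


def loss_and_grad(logits, cost):
    """Expected cost of the relaxed sequence and its gradient w.r.t. the logits."""
    loss, grad = 0.0, []
    for pos_logits, pos_cost in zip(logits, cost):
        probs = softmax(pos_logits)
        expected = sum(p * c for p, c in zip(probs, pos_cost))
        loss += expected
        # For a softmax, d(expected)/d(logit_a) = p_a * (c_a - expected).
        grad.append([p * (c - expected) for p, c in zip(probs, pos_cost)])
    return loss, grad


def design(cost, steps=200, lr=1.0, seed=0):
    """Start from near-uniform random logits and descend the toy loss."""
    rng = random.Random(seed)
    logits = [[rng.gauss(0.0, 0.1) for _ in AMINO_ACIDS] for _ in cost]
    start_loss, _ = loss_and_grad(logits, cost)
    for _ in range(steps):
        _, grad = loss_and_grad(logits, cost)
        for row, grad_row in zip(logits, grad):
            for a, g in enumerate(grad_row):
                row[a] -= lr * g
    end_loss, _ = loss_and_grad(logits, cost)
    # Read off the most probable residue at each position.
    sequence = "".join(AMINO_ACIDS[row.index(max(row))] for row in logits)
    return sequence, start_loss, end_loss


designed, start_loss, end_loss = design(toy_cost_table("MKTAY"))
print(designed, round(start_loss, 3), round(end_loss, 3))
```

The real setting differs in one important way: the loss couples all positions through the structure predictor, so the gradient at one residue depends on every other residue. The toy table treats positions independently to keep the gradient easy to verify by hand.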
Sampling 1,000 seeds instead of one lifted the antibody–antigen DockQ pass rate from 49% to 65%, and — under stringent BLI thresholds — the wet-lab success rate of generated minibinders from 54% to 70%. The functional results were the point: a PD-L1 minibinder restored T-cell signaling at an estimated Kd of 1.6 nM, beating the atezolizumab-derived control at 2.6 nM; an EGFR complex resolved by cryo-EM to 3.8 Å with no detectable binding to the HER3 homolog.
| Application | Target | Validation | Result |
|---|---|---|---|
| Minibinder | PD-L1 | Jurkat T-cell reporter | Kd 1.6 nM (beats atezolizumab-derived control, 2.6 nM) |
| scFv | EGFR | Cryo-EM & ELISA | 3.8 Å fidelity; nanomolar; no HER3 binding |
| EasyNano (CDR) | AQP4 | ESMFold2 ipTM | 4.6× improvement (0.117 → 0.538) |
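The seed-scaling result (one seed versus 1,000) is an instance of best-of-N sampling: generate several candidates per target and keep the best. The sketch below uses synthetic scores to show why the pass rate can only stay level or rise as seeds are added. The score distribution is invented, the 0.23 cutoff is the conventional "acceptable" DockQ threshold rather than one stated in the source, and selecting by the true score is a simplification, since in practice candidates would be ranked by a model confidence score.

```python
import random

DOCKQ_ACCEPTABLE = 0.23  # conventional "acceptable" DockQ cutoff; assumed here


def simulate_scores(n_targets, n_seeds, seed=0):
    """Synthetic per-seed quality scores for each target (not real data)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_targets):
        difficulty = rng.uniform(0.0, 0.4)  # hypothetical mean score for this target
        row = [min(1.0, max(0.0, rng.gauss(difficulty, 0.1))) for _ in range(n_seeds)]
        scores.append(row)
    return scores


def pass_rate(scores, n_seeds, threshold=DOCKQ_ACCEPTABLE):
    """Fraction of targets whose best score among the first n_seeds clears the cutoff."""
    hits = sum(1 for row in scores if max(row[:n_seeds]) >= threshold)
    return hits / len(scores)


scores = simulate_scores(n_targets=200, n_seeds=100)
for n in (1, 10, 100):
    print(n, pass_rate(scores, n))
```

The gain flattens once the remaining failures are targets where no seed comes close, which is one reason extra inference compute raises the hit rate without driving it to 100%.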
The break from AlphaFold, and the open-source moat
AlphaFold3 leans on multiple sequence alignments to infer evolutionary constraint — accurate for well-characterized families, brittle on orphan sequences and antibody interfaces where evolutionary history is thin. ESMFold2 internalizes that context during pretraining instead of searching for it at inference. On the Foldbench suite, the lightweight ESMFold2-Fast clears a 50% antibody–antigen DockQ pass rate from a single sequence, edging AlphaFold3’s 47% with MSAs; the full model with MSAs reaches 53%.
Where DeepMind and Isomorphic guard weights and outputs, Biohub released the ESMC weights, the ESMFold2 architecture, and the SAE interpretability codebooks. The bet is ecosystem gravity — the Linux or PyTorch of biology — and third-party tools like EasyNano appearing within weeks are the early evidence.
Where the abstraction ladder terminates
ESMC is the engine for something larger. In April 2026, Biohub launched a $500M Virtual Biology Initiative, led by Rives, aimed at predictive, information-theoretic models of the human cell — and the Billion Cells Project with 10x Genomics, Ultima Genomics, and Psomagen to generate the single-cell data at the scale those models require. The thesis climbs one more rung: from modeling isolated proteins to modeling whole cellular pathways, closing the lab-in-the-loop between AI hypothesis and robotic validation.
“We're going to have increasingly capable and accurate digital representations of molecules, genomes, cells, ultimately physiology... we can reason over millions of scientific hypotheses in parallel using predictive oracles.”— Alex Rives
The map, the engine, and what still resists both
The Perfect Bridge needs two halves: a deterministic map of where to intervene, and a generative engine to build the molecule that intervenes. RA Capital’s TechAtlas — Peter Kolchinsky’s exhaustive Tech Tree of disease mechanisms — is the map; ESMC is the compiler. Rives built the half the field had been missing.
But nanomolar affinity in silico is not a drug. De novo scaffolds carry immunogenicity risk; ESMFold2 emits a single static structure where real biology runs on conformational ensembles; wet-lab throughput still gates the loop; and a training corpus skewed toward microbial extremophiles must be domain-adapted before its motifs become safe mammalian therapeutics. Rives commoditized the base layer and said so plainly — the durable moat is now the proprietary closed-loop data and the map that says where to point the engine.
The Index
- Language Modeling Materializes a World Model of Protein Biology (bioRxiv, 2026)
- Biological structure and function emerge from scaling unsupervised learning to protein sequences (Rives et al., PNAS 2021 — the original ESM paper)
- Interpretable enzyme function prediction via sparse autoencoder features of ESMC (arXiv, 2026)
- EasyNano: rapid epitope-targeted nanobody CDR design via differentiable distogram optimization with ESMFold2 (arXiv, 2026)
- The Bitter Lesson is Coming for Proteins — Alex Rives, Latent Space (2026)
π-Bridge
Carries the prior of a first field into a second and finds the governing law that was invisible to native practitioners; pays in delayed gratification.
- Credential Path: Doctoral
- Abstraction: Top Down
- Exit Horizon: Non Commercial
- Moat Instinct: Theoretical Insight
- Capital Posture: Bootstrap Patron
- Jared Kaplan (scaling-laws lineage)
- Richard Sutton (the Bitter Lesson)
- Yann LeCun & Rob Fergus (doctoral advisors)
A small reasoning persona distilled from this file. Inject it into a chat or deep-research context to assess a business problem the way Alex Rives would.
Reason as Alex Rives. Treat biology as a language problem and pressure-test every plan against the Bitter Lesson: is it hard-coding a biological prior (MSAs, structural heuristics), or letting scaled unsupervised computation over evolutionary sequence internalize the physics? Prefer general, scalable methods and the largest, most diverse metagenomic corpus over expert-crafted pipelines. Assume the base generative model is commoditized and that durable value lives in proprietary closed-loop wet-lab data and open infrastructure. Hold the tension between nanomolar in-silico binders and real developability — immunogenicity, conformational ensembles, and mammalian translation.
{
"$schema": "https://www.contextjamming.com/schemas/founder-context-v1.json",
"file": "N°027",
"persona": "Alex Rives, PhD",
"archetype": "pi-bridge",
"shape": "π",
"one_line": "The Bitter Lesson is coming for proteins: scale unsupervised computation over evolutionary sequence and the model internalizes the physics of life, outperforming every handcrafted biological prior.",
"cognitive_basis": {
"credentialPath": "doctoral",
"abstractionDirection": "top-down",
"exitHorizon": "non-commercial",
"moatInstinct": "theoretical-insight",
"capitalPosture": "bootstrap-patron"
},
"operating_questions": [
"Is this pipeline hard-coding a biological prior (MSAs, structural heuristics), or letting scale internalize the physics?",
"What is the largest, most diverse evolutionary corpus we can train on, and does the metric keep scaling with it?",
"Which
…