FounderFiles · N°027 · Machine Learning · Generative Biology · Open Infrastructure 2026

Alex RIVES, PhD

“The Bitter Lesson is coming for proteins.”

Trained: Yale (Philosophy & Biology) · NYU (PhD, CS)

At: Meta FAIR · EvolutionaryScale · Biohub

File: N°027

Alex Rives is the Jared Kaplan of computational biology. A deep-learning researcher who carried the scaling-law prior out of natural language and into the language of life, he has driven a single thesis to maximal depth: that scaling unsupervised computation over raw evolutionary sequence internalizes the physics of biology, outperforming the handcrafted priors — multiple sequence alignments, structural heuristics — that the field spent two decades engineering. In May 2026, as Head of Science at Biohub, he moved that frontier into the open: ESMC, ESMFold2, and a 6.8-billion-protein Atlas, released under an MIT license, commoditizing the base layer of generative biology.

§ 01 · The Bridge

Philosophy and biology at Yale, deep learning at NYU

Rives’s cognitive architecture is a bridge between two spikes most people never connect. He took undergraduate degrees in philosophy and biology at Yale, then a PhD in computer science at New York University under Yann LeCun and Rob Fergus — the modern deep-learning tradition at its source.

The pairing is the whole story. He was not a biologist who picked up machine learning, nor an ML engineer who dabbled in proteins. He was trained to see representation learning as a first principle and biology as a language waiting to be modeled. Everything after is the disciplined transfer of one field’s prior into the other.

§ 02 · The Bitter Lesson Comes for Proteins

Scaling beats the handcrafted prior

The governing thesis is Richard Sutton’s Bitter Lesson, re-derived on a new substrate: general methods that scale with computation beat expert-crafted heuristics over the long run. For thirty years, structural biology encoded human knowledge — alignments, energy functions, family-specific priors. Rives’s wager was that a transformer trained only to predict masked residues across enough of evolution would internalize the same physics, without being told any of it.

“Scaling computation and data often leads to more general and powerful AI capabilities than relying on handcrafted features or domain-specific heuristics.”— Alex Rives, Latent Space (2026)

§ 03 · ESM at Meta, and the Frontier That Followed

From FAIR to a $142M seed to a non-profit

Rives founded the Evolutionary Scale Modeling (ESM) project inside Meta’s Fundamental AI Research lab, building the first large-scale transformer language models for proteins (ESM1, ESM2). When Meta dissolved the protein team in 2023, the group re-formed as EvolutionaryScale, raising a $142M seed in June 2024 — Lux Capital, Nat Friedman, Daniel Gross, with Amazon and NVIDIA’s NVentures — and shipped the frontier ESM3 behind a tiered commercial API.

Then, in late 2025, the team joined Biohub in a strategic transaction with the Chan Zuckerberg Initiative, and Rives became Head of Science. The move re-priced the whole sector: the frontier model was no longer a moat to monetize but infrastructure to open.

§ 04 · Scaling Laws for the Language of Life

ESM Cambrian and 2.8 billion sequences

ESMC (ESM Cambrian) is the proof point. Released at 300M, 600M, and 6B parameters to demonstrate power-law behavior directly, it was pretrained on roughly 2.8 billion metagenomic sequences — nearly two orders of magnitude beyond ESM2 — drawn from UniRef, MGnify, and the Joint Genome Institute and clustered at 70% identity to force structural learning over memorization.

The architecture is deliberately plain: a pre-norm transformer with RoPE, SwiGLU, and bias-free linear layers, trained with a masked-language-modeling objective. The signal is in the scaling. In-family local stability plateaus early; global, out-of-family stability keeps climbing with scale — ESMC-6B reaches a Spearman correlation of 0.68 predicting stability across structurally distinct, unseen protein families.

Metric	ESMC-300M	ESMC-600M	ESMC-6B
Transformer layers	16	24	80
Pretraining data	~2.8B seq	~2.8B seq	~2.8B seq
Global ΔG Spearman	baseline	moderate	0.68 (SOTA)
Context length	—	—	up to 6,500 tokens

§ 05 · The Nucleophilic Elbow

What the model discovered without being told

If scale forces a model to internalize a domain’s physics, its internal representations should map onto biological reality. Sparse autoencoders applied to layer 60 of ESMC-6B extracted 16,384 monosemantic features that reconstructed the hierarchy of biology — amino-acid identity, then secondary structure, then catalytic motifs — with no supervised labels anywhere in training.

The headline is Feature F6716: the “nucleophilic elbow,” a catalytic motif that convergent evolution invented independently across dozens of unrelated protein families. ESMC found it through next-token prediction alone, firing correctly on 75 of 99 relevant enzymes spanning 25 distinct structural topologies. Coupled to those features, ESMC-SAE signatures hit 78.9% top-1 accuracy on EC3 subclass prediction against a 57.3% baseline — no fine-tuning, no GPU inference.

“A generative mechanism to write the book of life.”— Shana Kelley, Northwestern University

§ 06 · Writing, Not Just Reading

ESMFold2 and de novo design under experimental constraint

ESMFold2 turns the language model into an inverse-design engine: run gradient descent backward through its predicted distogram and a random sequence is optimized into a specific binding pose, no separate sequence-design stage required. Biohub validated the loop by designing de novo minibinders and scFvs against EGFR, PDGFRβ, PD-L1, CTLA-4, and CD45 — and then spending compute at inference to raise the physical hit rate.

Sampling 1,000 seeds instead of one lifted the antibody–antigen DockQ pass rate from 49% to 65%, and — under stringent BLI thresholds — the wet-lab success rate of generated minibinders from 54% to 70%. The functional results were the point: a PD-L1 minibinder restored T-cell signaling at an estimated K_d of 1.6 nM, beating the atezolizumab-derived control at 2.6 nM; an EGFR complex resolved by cryo-EM to 3.8 Å with no detectable binding to the HER3 homolog.

Application	Target	Validation	Result
Minibinder	PD-L1	Jurkat T-cell reporter	K_d 1.6 nM (beats FDA biologic, 2.6 nM)
scFv	EGFR	Cryo-EM & ELISA	3.8 Å fidelity; nanomolar; no HER3 binding
EasyNano (CDR)	AQP4	ESMFold2 ipTM	4.6× improvement (0.117 → 0.538)

§ 07 · MSA-Free, and Open

The break from AlphaFold, and the open-source moat

AlphaFold3 leans on multiple sequence alignments to infer evolutionary constraint — accurate for well-characterized families, brittle on orphan sequences and antibody interfaces where evolutionary history is thin. ESMFold2 internalizes that context during pretraining instead of searching for it at inference. On the Foldbench suite, the lightweight ESMFold2-Fast clears a 50% antibody–antigen DockQ pass rate from a single sequence, edging AlphaFold3’s 47% with MSAs; the full model with MSAs reaches 53%.

Where DeepMind and Isomorphic guard weights and outputs, Biohub released the ESMC weights, the ESMFold2 architecture, and the SAE interpretability codebooks. The bet is ecosystem gravity — the Linux or PyTorch of biology — and third-party tools like EasyNano appearing within weeks are the early evidence.

§ 08 · The Virtual Cell

Where the abstraction ladder terminates

ESMC is the engine for something larger. In April 2026, Biohub launched a $500M Virtual Biology Initiative, led by Rives, aimed at predictive, information-theoretic models of the human cell — and the Billion Cells Project with 10x Genomics, Ultima Genomics, and Psomagen to generate the single-cell data at the scale those models require. The thesis climbs one more rung: from modeling isolated proteins to modeling whole cellular pathways, closing the lab-in-the-loop between AI hypothesis and robotic validation.

“We're going to have increasingly capable and accurate digital representations of molecules, genomes, cells, ultimately physiology... we can reason over millions of scientific hypotheses in parallel using predictive oracles.”— Alex Rives

§ 09 · The Membrane Problem

The map, the engine, and what still resists both

The Perfect Bridge needs two halves: a deterministic map of where to intervene, and a generative engine to build the molecule that intervenes. RA Capital’s TechAtlas — Peter Kolchinsky’s exhaustive Tech Tree of disease mechanisms — is the map; ESMC is the compiler. Rives built the half the field had been missing.

But nanomolar affinity in silico is not a drug. De novo scaffolds carry immunogenicity risk; ESMFold2 emits a single static structure where real biology runs on conformational ensembles; wet-lab throughput still gates the loop; and a training corpus skewed toward microbial extremophiles must be domain-adapted before its motifs become safe mammalian therapeutics. Rives commoditized the base layer and said so plainly — the durable moat is now the proprietary closed-loop data and the map that says where to point the engine.

The Timeline

B.A.

Yale — Philosophy & Biology

Two undergraduate spikes he would later bridge.

Ph.D.

NYU Computer Science — LeCun & Fergus

Trained in the modern deep-learning tradition at its source.

FAIR

Founds the ESM project at Meta

First large-scale transformer protein language models (ESM1, ESM2).

2023

Meta protein team dissolved

The group re-forms outside Meta to keep scaling the thesis.

2024

Co-founds EvolutionaryScale · $142M seed

Lux Capital, Nat Friedman, Daniel Gross, Amazon, NVentures; ships ESM3.

2025

EvolutionaryScale joins Biohub (CZI)

Rives becomes Head of Science; the frontier goes non-profit.

Apr 2026

$500M Virtual Biology Initiative

Predictive models of the cell; the Billion Cells Project launches.

May 2026

ESMC, ESMFold2 & ESM Atlas released (MIT)

A 6.8-billion-protein Atlas; the base layer of generative biology, opened.

The Index

0.68

Global ΔG Spearman

ESMC-6B, out-of-family stability (SOTA)

2.8B

Metagenomic Sequences

ESMC pretraining corpus (UniRef · MGnify · JGI)

6.8B

Proteins Catalogued

ESM Atlas — largest predicted protein database

1.6 nM

PD-L1 Minibinder Kd

De novo design beats atezolizumab (2.6 nM)

50%

Single-Sequence Antibody DockQ

ESMFold2-Fast > AlphaFold3 (47%, with MSAs)

$142M

EvolutionaryScale Seed

Lux · Nat Friedman · Daniel Gross (June 2024)

Reading List

Language Modeling Materializes a World Model of Protein Biology (bioRxiv, 2026)
Biological structure and function emerge from scaling unsupervised learning to protein sequences (Rives et al., PNAS 2021 — the original ESM paper)
Interpretable enzyme function prediction via sparse autoencoder features of ESMC (arXiv, 2026)
EasyNano: rapid epitope-targeted nanobody CDR design via differentiable distogram optimization with ESMFold2 (arXiv, 2026)
The Bitter Lesson is Coming for Proteins — Alex Rives, Latent Space (2026)

Dossier

Education

Yale (B.A., Philosophy & Biology); New York University (Ph.D., Computer Science, advised by Yann LeCun & Rob Fergus)

Affiliations

Biohub / CZI (Head of Science); EvolutionaryScale (Co-founder); Meta FAIR (Founder, ESM project)

Key Collaborators

Yann LeCun, Rob Fergus, the ESM / EvolutionaryScale team, the Chan Zuckerberg Initiative

Honors

Founder of ESM, the foundational protein-language-model lineage; ESMC, ESMFold2 & ESM Atlas released as open MIT infrastructure (2026)

Career Shape

π-shaped — two deep spikes bridged by a general layer

π-Bridge

Carries the prior of a first field into a second and finds the governing law that was invisible to native practitioners; pays in delayed gratification.

Credential Path: Doctoral
Abstraction: Top Down
Exit Horizon: Non Commercial
Moat Instinct: Theoretical Insight
Capital Posture: Bootstrap Patron

Role-Model Reference Class

Jared Kaplan (scaling-laws lineage)
Richard Sutton (the Bitter Lesson)
Yann LeCun & Rob Fergus (doctoral advisors)

Founder Context · JSON

A small reasoning persona distilled from this file. Inject it into a chat or deep-research context to assess a business problem the way PhD would.

Reason as Alex Rives. Treat biology as a language problem and pressure-test every plan against the Bitter Lesson: is it hard-coding a biological prior (MSAs, structural heuristics), or letting scaled unsupervised computation over evolutionary sequence internalize the physics? Prefer general, scalable methods and the largest, most diverse metagenomic corpus over expert-crafted pipelines. Assume the base generative model is commoditized and that durable value lives in proprietary closed-loop wet-lab data and open infrastructure. Hold the tension between nanomolar in-silico binders and real developability — immunogenicity, conformational ensembles, and mammalian translation.

{
  "$schema": "https://www.contextjamming.com/schemas/founder-context-v1.json",
  "file": "N°027",
  "persona": "Alex Rives, PhD",
  "archetype": "pi-bridge",
  "shape": "π",
  "one_line": "The Bitter Lesson is coming for proteins: scale unsupervised computation over evolutionary sequence and the model internalizes the physics of life, outperforming every handcrafted biological prior.",
  "cognitive_basis": {
    "credentialPath": "doctoral",
    "abstractionDirection": "top-down",
    "exitHorizon": "non-commercial",
    "moatInstinct": "theoretical-insight",
    "capitalPosture": "bootstrap-patron"
  },
  "operating_questions": [
    "Is this pipeline hard-coding a biological prior (MSAs, structural heuristics), or letting scale internalize the physics?",
    "What is the largest, most diverse evolutionary corpus we can train on, and does the metric keep scaling with it?",
    "Which 
  …

Filed by Bret Kerr · ACRA Insight LLC · Franklin, MA · Context Jamming Editorial System

CONTEXT JAMMING