The Theory of Universal Cognitive Geometry — Context Jamming · Architectural Determinism

Architectural Determinism · Node Three · Panel 01 of 09

Twenty watts.

The human brain runs on roughly the power of a dim lightbulb. Less than one percent of its neurons fire at any given moment. This is not biology being clever. It is biology being forced.

Lennie · Current Biology · 2003

Field Notes · §I · Twenty watts.

Twenty watts.

The first time someone counted, properly, how many cortical neurons can fire at once, they were not trying to revolutionize anything. Peter Lennie, in 2003, was at NYU’s Center for Neural Science, working out an arithmetic problem nobody had quite finished. David Attwell and Simon Laughlin, two years earlier, had laid down the foundational energy budget — anatomic and physiologic data showing that action potentials and postsynaptic glutamate effects consumed about 81% of the grey-matter signaling budget.

From that ledger, Lennie did the multiplication. Take the brain’s roughly 20 watts of available glucose metabolism — a number that traces, with various refinements, to Sokoloff’s 1957 measurements. Subtract housekeeping. Subtract resting potentials. Apportion what remains across ~16 billion cortical neurons. Then ask: at the average firing rate that would consume the available energy, how many can be substantially active at once?

The answer was less than 1%. Not as a theoretical bound — as a budget constraint. The cortex is, by physical necessity, almost completely silent almost all the time.

This was not the “10% myth.” The 10% myth — that humans only “use” 10% of their brain — is wrong because all parts of the brain are used; what’s true is that at any instantaneous moment, only a small minority of neurons are firing. Lennie’s number is the rigorous version of the right intuition.

The brain is not efficient because it computes cleverly. It computes cleverly because it cannot afford anything else.

— Lennie · Current Biology · 2003 —

Architectural Determinism · Node Three · Panel 02 of 09

Lennie did the math in 2003.

1.8 spikes/sec/neuron averaged across cortex would burn more energy than the brain receives. 13 spikes would exceed whole-body metabolism. The cortex must be silent. There is no other option.

Lennie 2003 · Levy & Calvert PNAS 2021

Field Notes · §II · Lennie did the math in 2003.

Lennie did the math in 2003.

To sustain even 1.8 spikes per second per neuron averaged across human cortex would consume more energy than the entire brain receives. To sustain 13 spikes per second per neuron would exceed the metabolic output of the whole body. Sparseness is not a strategy among others — it is the only feasible regime.

Attwell and Laughlin’s earlier estimate (≤15% simultaneously active) and Lennie’s tightened “<1%” form a now-standard envelope cited across the field.

William Levy and Victoria Calvert at Virginia, in two PNAS papers in 2021, sharpened it further: of the 20-watt glucose envelope, only about 0.1 watts of ATP is spent on what an engineer would recognize as computation. Communication — getting spikes from one place to another — costs about 35 times more. The rest is the cellular cost of being alive.

The arithmetic is unforgiving: cortical neurons (~16 billion) × cost per spike × required firing rate = energy. Solve for required firing rate, with energy fixed at ~3.5 W of communication budget, and you get a number that is, by physical necessity, less than 1% of cells substantially active per moment. The cortex must be silent. There is no other option.

— Lennie 2003 · Levy & Calvert PNAS 2021 —

Architectural Determinism · Node Three · Panel 03 of 09

What the silence does.

Olshausen and Field gave a network natural images and one rule: reconstruct them, but stay sparse. Out fell the receptive fields of V1. The cortex is solving a compressed-sensing problem.

Olshausen & Field · Nature · 1996

Field Notes · §III · What the silence does.

What the silence does.

Five years before Lennie did the arithmetic, Bruno Olshausen and David Field had already shown what cortex does with the budget. Their 1996 Nature paper — “Emergence of simple-cell receptive field properties by learning a sparse code for natural images” — is one of the cleanest results in computational neuroscience. They wrote down an objective:

minimize ‖I − Σᵢ aᵢ φᵢ‖² + λ Σᵢ S(aᵢ)

where I is a natural image patch, φᵢ are basis functions to be learned, aᵢ are the activations, and S is a sparsity-promoting penalty. They trained an unsupervised network on patches of natural scenes, asked it to reconstruct the input while keeping the activations sparse, and out fell — without supervision, without labeling, without any prior knowledge about the visual cortex — receptive fields that looked exactly like the simple cells David Hubel and Torsten Wiesel had recorded in V1 thirty-five years earlier. Localized. Oriented. Bandpass. Wavelet-like.

In 2006, Emmanuel Candès, Justin Romberg, and Terence Tao formalized what made all of this work. Their compressed-sensing theorem says: if a signal x has a sparse representation in some basis (only s nonzero coefficients), and if a measurement matrix Φ satisfies the Restricted Isometry Property — meaning it is approximately an isometry on s-sparse vectors — then x can be exactly recovered from m ≈ s log(N/s) linear measurements by L1 minimization.

The Olshausen-Field objective is the dictionary-learning version of the compressed-sensing problem. Cortex, in this reading, is not metaphorically doing compressed sensing; it is literally solving the compressed-sensing problem, with biological hardware, under an energy constraint that prohibits anything denser.

— Olshausen & Field · Nature · 1996 —

Architectural Determinism · Node Three · Panel 04 of 09

Different brains. Same manifold.

Motor cortex during reaching: ~10 effective dimensions. Across individuals. Across species. Eighty million years of independent evolution converging on the same latent geometry.

Safaie et al. · Nature · 2023

Field Notes · §IV · Different brains. Same manifold.

Different brains. Same manifold.

Through the 2010s, a cluster of labs — Krishna Shenoy and Mark Churchland at Stanford and Columbia, Lee Miller and Sara Solla at Northwestern, Juan Gallego now at Imperial College, Carsen Stringer and Marius Pachitariu at Janelia — began routinely recording from hundreds and then thousands of neurons simultaneously. They asked a question Lennie’s energy arithmetic could not answer: when those few-percent-of-active neurons fire, what shape does their joint activity make in the high-dimensional space of all possible firing patterns?

The answer, repeated across motor cortex, prefrontal cortex, hippocampus, visual cortex, and striatum, is: a low-dimensional manifold. Gallego, Perich, Miller, and Solla’s 2017 Neuron review pulled the evidence together: motor-cortical population activity during reaching is well-described by ~8–12 “neural modes” — principal directions of co-modulation — that capture the bulk of behaviorally relevant variance.

In 2023, Mostafa Safaie, Joanna Chang, Lee Miller, Joshua Dudman, Matthew Perich, and Juan Gallego pushed further (Nature, 2023). They showed that these latent dynamics are preserved across individuals — and across species, in monkey and mouse motor cortex performing similar reaches. Different brains, idiosyncratically wired in ways that go all the way back to the lottery of development, nevertheless converge on the same low-dimensional latent geometry when they do the same thing.

The trajectories visibly land on the same surface — not side by side. ~80 million years of independent mammalian evolution, and the geometry is what you get either way.

— Safaie et al. · Nature · 2023 —

Architectural Determinism · Node Three · Panel 05 of 09

Now look at the AI.

1,100 trained networks. ~16 directions. Different teams, different data, different objectives. The networks did not know they were going to agree.

Kaushik et al. · arXiv 2512.05117 · Dec 2025

Field Notes · §V · Now look at the AI.

Now look at the AI.

“We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces… networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain… universal subspaces capturing majority variance in just a few principal directions.”

That’s the abstract of Kaushik, Chaudhari, Vaidya, Chellappa, and Yuille’s December 2025 paper (arXiv 2512.05117). Five hundred Mistral-7B LoRAs. 500 Vision Transformers. 50 LLaMA-3 8B models. Trained on disjoint data, with different hyperparameters, by different people. They converge on a shared ~16-dimensional subspace.

Different monkeys, idiosyncratically wired, evolutionarily separated from mice for ~80 million years. Trained on natural reaching since infancy. They converge on a shared ~10-dimensional latent manifold. (Safaie et al., Nature, 2023.)

The brain’s manifold and the network’s manifold are landing in the same neighborhood. Not the same number — Kaushik measures parameter-space dimensionality (k≈16), motor cortex measures activation-space dimensionality (~10) — but the same order of magnitude, the same form: a few dozen dimensions where there could in principle be billions.

Two independent systems, on radically different hardware, converging on the same compressed geometry. The structural homology is the dossier.

— Kaushik et al. · arXiv 2512.05117 · Dec 2025 —

Architectural Determinism · Node Three · Panel 06 of 09

One theorem. Three instances.

Bekenstein-Hawking. Olshausen-Field. Kaushik. Three independent domains. Each one says the same thing: the information that survives is the information on the boundary.

Bekenstein 1973 · Maldacena 1997 · Kaushik 2025

Field Notes · §VI · One theorem. Three instances.

One theorem. Three instances.

The Bekenstein-Hawking bound says the entropy of any region of spacetime, in Planck units, is bounded above by one-quarter of its surface area, not its volume: S ≤ A/4. Information about a three-dimensional bulk is encoded on a two-dimensional boundary. This was sharpened by Maldacena’s AdS/CFT correspondence in 1997 (which, with Edward Witten’s 1998 elaboration, is the most-cited paper in theoretical physics): the full content of a (d+1)-dimensional gravitational theory in anti-de Sitter space is dual to a d-dimensional conformal field theory living on its boundary. The bulk has more apparent degrees of freedom than the boundary; the boundary determines all of them. The ambient dimensionality is not the intrinsic dimensionality.

Place this beside the cortical numbers. A V1 hypercolumn has ~10⁵ neurons in a square millimeter. The intrinsic dimensionality of natural-scene representation in mouse V1 is on the order of ~10–30 effective dimensions. A 7-billion-parameter transformer has, by Kaushik’s measurement, ~16 dimensions of weight-space variation that capture most of the variance across 1,100 trained models.

Three different physical systems — spacetime, cortex, transformer — each with apparent degrees of freedom in the billions or beyond, each obeying a constraint that drops the effective dimensionality by orders of magnitude. The structural-isomorphism caveat needs to be loud here. We are not claiming that AdS/CFT is the cortical manifold, or that gravity is sparse coding. The Bekenstein bound is a statement about quantum fields and event horizons; the Lennie bound is a statement about ATP and action potentials. They are different theorems with different constants of proportionality. What they share is the form: an effective dimensionality far below the apparent dimensionality, set by an external constraint. That shared form is what Architectural Determinism names.

— Bekenstein 1973 · Maldacena 1997 · Kaushik 2025 —

Architectural Determinism · Node Three · Panel 07 of 09

Tishby wrote the equation.

Compress representation. Preserve task-relevant information. Olshausen-Field, Candès-Donoho-Tao, Tishby, Friston — four expressions of the same compression theorem.

Tishby et al. 1999/2015 · Friston (FEP)

Field Notes · §VII · Tishby wrote the equation.

Tishby wrote the equation.

The mathematics underneath both observations is, in the cleanest formulation we have, Naftali Tishby’s information bottleneck (Tishby, Pereira, Bialek, 1999; Tishby & Zaslavsky, 2015; Shwartz-Ziv & Tishby, 2017). The objective is:

minimize I(X; Z) − β · I(Z; Y)

Compress the representation Z (minimize its mutual information with the input X) while preserving as much information as possible about the task-relevant variable Y. Tishby and Shwartz-Ziv argued, controversially but influentially, that deep networks during training pass through a “fitting” phase (where I(Z;Y) rises) followed by a “compression” phase (where I(X;Z) falls). The compression phase produces representations that lie on lower-dimensional manifolds — and that compression, in Tishby’s framing, is what generalization actually is.

Predictive coding, in the Rao-Ballard 1999 formulation and its Friston-style elaboration into the free energy principle, is the same objective in different costume. A hierarchical generative model ascending the cortex sends predictions downward; ascending feedforward signals carry the residuals — the surprise. Minimize free energy ≈ maximize sparseness of the surprise signal ≈ compress representation while preserving prediction accuracy.

So: Olshausen-Field sparse coding minimizes reconstruction error subject to a sparsity penalty. Compressed sensing recovers s-sparse signals from O(s log N) linear measurements. The information bottleneck minimizes I(X;Z) − β·I(Z;Y). The free energy principle minimizes variational free energy. These are not the same equation written four times — but they are four expressions of the same basic constraint: information that survives is the information that admits a low-dimensional, sparse representation aligned with what the system needs to do.

— Tishby et al. 1999/2015 · Friston (FEP) —

Architectural Determinism · Node Three · Panel 08 of 09

Half a billion years of replication.

If sparse, low-dimensional manifold coding were merely one viable strategy, evolution would have found alternatives. It has not. Every nervous system that scales beyond a few thousand neurons converges on the same architectural solution.

Suryanarayana et al. · Nature Ecology & Evolution · 2020

Field Notes · §VIII · Half a billion years of replication.

Half a billion years of replication.

The Maldacena-Afshordi node is mathematical and has the cleanness of theory. The Kaushik node is empirical but young — eight months old as of this writing, and the cleanest empirical statement of a phenomenon (universality of trained networks) that researchers had only sensed before.

The brain node is empirical and old. The basic vertebrate forebrain plan is at least 500 million years old, traceable to the lamprey lineage that diverged from our own before the Cambrian (Suryanarayana, Robertson, Wallén, & Grillner, Nature Ecology & Evolution, 2020). Sparse coding and low-dimensional manifold organization are not artifacts of human cortex; they appear in fly olfactory mushroom bodies, in songbird HVC, in mouse and monkey and human motor cortex. The basal ganglia have been doing this for ~535 million years.

If sparse, low-dimensional, manifold-constrained representation were merely one viable strategy, evolution would have explored alternatives. It has not. Every nervous system that scales beyond a few thousand neurons converges on the same architectural solution.

This is the strongest available evidence that the geometry is not contingent. It is what the underlying compression theorem makes available; it is what the energy constraint forces; it is what evolution rediscovers every time. When Kaushik shows that 1,100 transformers converge on a ~16-dimensional weight subspace, he is reproducing — in eight months of GPU training across several research labs — what vertebrate cortex has been reproducing across half a billion years of independent evolutionary lineages.

That is what makes brain sparse coding the third anchor. Physics gives the math. Kaushik gives the cleanest contemporary AI experiment. Cortex gives the longest-running empirical replication study in the universe.

— Suryanarayana et al. · Nature Ecology & Evolution · 2020 —

Architectural Determinism · Node Three · Panel 09 of 09

Architectural Determinism. Node Three.

bretkerr.substack.com · The geometry was always going to win.

Companion piece to "Sixteen Directions"

Field Notes · §IX · Architectural Determinism. Node Three.

Architectural Determinism. Node Three.

The thesis is not that gravity is the brain. The thesis is that there exists a compression theorem — articulable as the joint of Bekenstein’s holographic bound, Candès-Donoho-Tao compressed sensing, Tishby’s information bottleneck, and the Olshausen-Field sparse-coding objective — and that this theorem is the load-bearing structural constraint operating in physics, in trained neural networks, and in biological cortex.

Each domain instantiates it under its own boundary conditions: surface area in gravity, optimization landscape in deep learning, ATP budget in biology. The instances are not metaphysically identical. They are structurally isomorphic, and the isomorphism is what makes Architectural Determinism a thesis rather than an analogy.

When Hassabis says “lower dimensional manifold… maybe true of most of reality,” he is — perhaps without noting it — converging on the same point Olshausen and Field made about V1 in 1996, the same point Maldacena made about anti-de Sitter space in 1997, and the same point Kaushik et al. made about transformer weights in December 2025. The convergence is the news.

Node Three, ready for transmission.

— Companion piece to "Sixteen Directions" —

CONTEXT JAMMING

Twenty watts.

Twenty watts.

Lennie did the math in 2003.

Lennie did the math in 2003.

What the silence does.

What the silence does.

Different brains. Same manifold.

Different brains. Same manifold.

Now look at the AI.

Now look at the AI.

One theorem. Three instances.

One theorem. Three instances.

Tishby wrote the equation.

Tishby wrote the equation.

Half a billion years of replication.

Half a billion years of replication.

Architectural Determinism. Node Three.

Architectural Determinism. Node Three.

The Ledger.

How this site is made.

Claude Code 4.7

Claude Opus 4.6

Gemini 3.1 Pro