The 100x multiplier in frontier AI no longer comes from buying faster chips in isolation; it comes from the organizational capacity to co-design hardware, low-level systems software, and model architecture as a single integrated system -- a reality proven by DeepSeek's Hopper-native MoE and Gemini's TPU lock-in. Anthropic has operationalized this thesis at scale through disciplined TPU partnerships and the fastest-growing enterprise coding agent in history, reaching $47B ARR with materially lower infrastructure burn than OpenAI, whose Jalapeno ASIC and Public Wealth Fund gambit represent a higher-risk, capital-heavy counter-bet on the same AGI timeline.
In the spring of 2026, while most observers still measured the AGI race by the number of H100s each lab could secure, Anthropic quietly crossed a threshold that inverted the prevailing hierarchy. Its annualized revenue run-rate hit $47 billion in May -- nearly double OpenAI's $24 billion at the same moment -- achieved while spending roughly one-quarter as much on training compute. The gap was not luck or marketing. It was the first large-scale proof that the old model of siloed hardware procurement had hit its mathematical ceiling.
01 The Co-Design Math
For years the industry operated in three separate rooms. Hardware teams designed general-purpose accelerators. Systems engineers wrote CUDA kernels and serving frameworks to talk to them. Researchers designed transformer variants in the abstract. Each group optimized for its own metrics. The result, as semiconductor analyst Dylan Patel of SemiAnalysis has demonstrated, was mathematically guaranteed diminishing returns. A 2x improvement in silicon, a 2x improvement in kernel efficiency, and a 2x improvement in model sparsity produced, at best, an 8x system-level gain: the product of three factors, each capped by what its own team could achieve in isolation. The 100x thesis states that when those three layers are co-designed -- when the model's expert dimensions are shaped to the exact tile sizes of the silicon, when the kernel's memory access patterns are written for the specific interconnect, when the architecture itself is tuned to the memory hierarchy -- the per-layer caps lift, and the gains compound on one another rather than stalling at that product. The compounding is non-linear because each layer removes friction the others would otherwise have to fight.
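The tile-alignment point can be made concrete with a little arithmetic. The sketch below estimates how much of a matrix-multiply unit's work is useful when each dimension has to be padded up to a whole number of tiles. The tile shape and the matrix dimensions are hypothetical values chosen for illustration, not figures from any vendor datasheet or published model.

```python
import math

# Hypothetical (M, N, K) tile shape, assumed for illustration only.
TILE_M, TILE_N, TILE_K = 128, 128, 64


def axis_utilization(dim: int, tile: int) -> float:
    """Useful fraction along one axis after padding up to a multiple of `tile`."""
    padded = math.ceil(dim / tile) * tile
    return dim / padded


def matmul_utilization(m: int, n: int, k: int) -> float:
    """Useful fraction of tile work for an (m x k) @ (k x n) matmul."""
    return (
        axis_utilization(m, TILE_M)
        * axis_utilization(n, TILE_N)
        * axis_utilization(k, TILE_K)
    )


# Dimensions that are exact multiples of the tile shape waste nothing.
aligned = matmul_utilization(256, 2048, 7168)

# Slightly smaller "round" dimensions pad on every axis and waste work.
misaligned = matmul_utilization(200, 2000, 7000)

print(f"aligned utilization:    {aligned:.3f}")
print(f"misaligned utilization: {misaligned:.3f}")
```

Padding is only one source of loss. Memory-access and interconnect mismatches are not modeled here, so this understates the friction the paragraph describes.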
DeepSeek V3 and V4 provided the clearest public demonstration. The researchers explicitly sized their Mixture-of-Experts dimensions and routing patterns to match the matrix-multiply tile geometry and memory hierarchy of NVIDIA's Hopper architecture. The model ran with extraordinary efficiency on Hopper and later Blackwell. When the same weights were dropped onto Google's TPU v6e -- an objectively powerful accelerator -- performance collapsed. The communication patterns and tensor shapes were misaligned with the TPU's network topology. Conversely, Google's Gemini models are co-optimized for TPU interconnects and memory bandwidth; they lose efficiency when ported to NVIDIA silicon. The vaunted CUDA moat is being partially replaced by a deeper, more structural lock-in: the geometric structure of frontier model architectures themselves now binds customers to specific hardware families.
02 The Compute Shortage Is Structural
This is happening against a structural compute shortage, not a cyclical one. The total addressable market of economically useful AI tasks -- what some analysts call dark GDP -- is expanding faster than the industry can build power-dense capacity. Even with 20 gigawatts of new data center capacity coming online in 2026 and over 30 gigawatts in 2027, demand remains unsatisfied. The mismatch is severe enough that terrestrial power constraints are already pushing serious planning toward space-based data centers, projected to dominate new deployments by 2040. NVIDIA's Jensen Huang has responded by deliberately arming neoclouds and specialized labs rather than letting hyperscalers consolidate all capacity, preserving a multi-polar market where performance and time-to-rack matter more than legacy tenant isolation advantages.
03 Software Is Writing Its Own Infrastructure
The middle layer -- systems software -- has become the most dynamic. Triton, originally developed inside OpenAI, abstracts CUDA complexity while preserving explicit control over parallelism and memory. But the real acceleration is coming from AI itself. Systems such as Meta's KernelEvolve and Stanford's AutoKernel use large language models to profile PyTorch models, extract bottlenecks, generate candidate Triton or custom kernels, and subject them to rigorous verification: smoke tests, shape sweeps, numerical-stability checks, and determinism checks. What once took engineers weeks of trial-and-error now takes hours. On KernelBench, models like DeepSeek R1 have moved from 12% to 72% success rates on Level-1 tasks through iterative test-time compute. The industry is building the tools to automate its own infrastructure optimization at superhuman speed.
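The four verification stages named above can be sketched as a small harness. This is a minimal illustration in plain Python, assuming a softmax as the operation under test. It is not the KernelEvolve or AutoKernel pipeline, and `naive_softmax` is a hypothetical stand-in for a generated candidate that looks correct on ordinary inputs.

```python
import math
import random


def reference_softmax(xs):
    """Numerically stable reference: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def naive_softmax(xs):
    """Hypothetical generated candidate that omits the max subtraction."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def run(candidate, xs):
    """Run the candidate, mapping arithmetic failures to None."""
    try:
        return candidate(xs)
    except (OverflowError, ZeroDivisionError, ValueError):
        return None


def matches(expected, actual, tol=1e-9):
    """True if the candidate output is finite and within tol of the reference."""
    return (
        actual is not None
        and len(actual) == len(expected)
        and all(math.isfinite(a) and abs(a - e) <= tol for e, a in zip(expected, actual))
    )


def verify(candidate, reference, sizes=(1, 7, 64, 1000), seed=0):
    """Return a pass/fail report for the four verification stages."""
    rng = random.Random(seed)
    smoke_in = [0.0, 1.0, 2.0]
    sweep = [[rng.uniform(-5.0, 5.0) for _ in range(n)] for n in sizes]
    extreme = [[1000.0, 1000.5, 999.0], [-1000.0, -1000.5, -999.0]]
    repeat_in = sweep[-1]
    return {
        "smoke": matches(reference(smoke_in), run(candidate, smoke_in)),
        "shape_sweep": all(matches(reference(xs), run(candidate, xs)) for xs in sweep),
        "numerical_stability": all(matches(reference(xs), run(candidate, xs)) for xs in extreme),
        "determinism": run(candidate, repeat_in) == run(candidate, repeat_in),
    }


print(verify(naive_softmax, reference_softmax))
```

The naive candidate passes the smoke test and the shape sweep but fails on large-magnitude inputs, which is why a harness cannot stop at the first two stages.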
At the serving layer, SGLang has pushed the throughput-interactivity Pareto frontier with two techniques: RadixCache, which treats the prompt as a stream and maintains an LRU KV-cache across calls, and Ragged Paged Attention. SGLang-Jax brings the same primitives natively to TPU via XLA. Daily automated sweeps on SemiAnalysis's InferenceX platform -- running across roughly 15 chip types and major frameworks -- now show that at high-interactivity operating points, AMD's Instinct MI355X on SGLang can deliver materially lower cost-per-token than NVIDIA's GB300 NVL72 under equivalent precision and without Multi-Token Prediction. The benchmark makes the 100x thesis visible in real time: the winner is not the fastest raw chip, but the organization that aligns model, serving framework, and silicon most tightly.
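The idea of reusing KV state across calls under LRU eviction can be sketched in a few lines. This is a deliberately simplified illustration, not SGLang's RadixCache: it uses a flat dictionary keyed by whole token prefixes instead of a radix tree, so it only matches prefixes that were stored in full. The class name `PrefixKVCache` and the placeholder KV values are assumptions made for the example.

```python
from collections import OrderedDict


class PrefixKVCache:
    """Toy prefix cache: maps a token prefix to placeholder KV state, LRU-evicted."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, most recent last

    def longest_prefix(self, tokens) -> int:
        """Return the length of the longest stored prefix of `tokens` (0 if none)."""
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._entries:
                self._entries.move_to_end(key)  # a hit counts as recent use
                return end
        return 0

    def insert(self, tokens) -> None:
        """Store KV state for `tokens`, evicting least recently used entries."""
        key = tuple(tokens)
        self._entries[key] = f"kv[{len(key)} tokens]"  # stand-in for real tensors
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


cache = PrefixKVCache(capacity=2)
system_prompt = [1, 2, 3]
cache.insert(system_prompt)

# A later request that starts with the same system prompt can skip
# recomputing those tokens and only prefill the new suffix.
request = system_prompt + [4, 5]
reused = cache.longest_prefix(request)
print(f"reused {reused} of {len(request)} prompt tokens")
```

A radix tree additionally shares partial overlaps between stored sequences, which the flat dictionary here cannot do.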
04 Algorithmic Progress Compresses the AGI Timeline
While hardware-software co-design governs physical efficiency, algorithmic progress governs effective compute. Leopold Aschenbrenner's framework quantifies progress in orders of magnitude (OOMs). The jump from GPT-2 (roughly a preschooler) to GPT-4 (a smart high-schooler) required roughly 4.5-6 OOMs of effective compute over four years. Physical compute contributed about 0.5 OOM per year. Algorithmic efficiency contributed another 0.5 OOM per year. Unhobbling -- Chain-of-Thought, RLHF, tool use, agentic scaffolding -- contributed about 2 OOMs. A further 3-6 OOMs of effective compute are now expected by 2027, which the framework treats as enough to reach AGI-level capability. The more profound shift, in this framework, is that models capable of autonomous research work will compress a decade of human algorithmic progress into a single year. Hundreds of millions of AI agents running parallel experiments at silicon speed will trigger an intelligence explosion that moves the bottleneck from algorithms to power generation and gigawatt-scale deployment.
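The OOM accounting above is simple enough to check directly. The sketch below adds up the three contributions as stated in the paragraph and converts the total into a plain multiplier. The variable names are illustrative, and the figures are the article's, not independent estimates.

```python
YEARS = 4  # GPT-2 to GPT-4, per the paragraph above

# Contributions in orders of magnitude (OOMs), as stated in the text.
physical_compute = 0.5 * YEARS       # 0.5 OOM per year
algorithmic_efficiency = 0.5 * YEARS  # 0.5 OOM per year
unhobbling = 2.0                      # one-off estimate, not a per-year rate

total_ooms = physical_compute + algorithmic_efficiency + unhobbling
effective_multiplier = 10 ** total_ooms

print(f"total: {total_ooms} OOMs, or {effective_multiplier:,.0f}x effective compute")

# The projected next step of 3-6 OOMs, expressed as multipliers.
next_low, next_high = 10 ** 3, 10 ** 6
print(f"next step: {next_low:,}x to {next_high:,}x")
```

The three stated contributions sum to 6 OOMs, which sits at the top of the 4.5-6 range quoted for the GPT-2 to GPT-4 jump.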
05 Two Paths to the Same Trillion-Dollar IPO
OpenAI and Anthropic have both concluded that reliance on general-purpose GPUs from hyperscalers is a strategic vulnerability. Their solutions diverge sharply. OpenAI partnered with Broadcom and Celestica to design Jalapeno, a captive inference-only ASIC optimized for the decode phase of autoregressive generation. The chip approaches the EUV reticle limit at about 840 mm², uses a systolic array, and co-packages six to eight HBM3/HBM4 stacks directly on a silicon interposer. Development moved from initial design to tape-out in nine months -- accelerated by OpenAI's own models. The projected payoff is a 50% lower inference cost of ownership. The trade-off is architectural rigidity: if future reasoning models move beyond the specific attention patterns hardcoded into Jalapeno, the chip becomes a sunk-cost anchor. Training remains entirely on NVIDIA GPUs.
Anthropic took the opposite bet. Rather than assume tape-out risk, it secured approximately 3.5 gigawatts of next-generation TPU capacity through an expanded Google and Broadcom partnership expected online starting 2027. A complex $35-45 billion financing structure has Google backstopping lease payments across five U.S. data centers. Anthropic can match workloads to the silicon best suited for them without nine-month hardware cycles, keeping capital focused on model scaling and commercial distribution. Broadcom sits at the center of both strategies -- co-developing Google's TPU v7, v8ax, and v9 roadmap, Meta's MTIA, and OpenAI's Jalapeno -- making it the quiet kingmaker of custom AI silicon.
The financial divergence is even starker. Anthropic moved from about $1 billion ARR at the end of 2024 to $47 billion by May 2026 -- a 47x multiple in under 18 months -- with 80-85% of revenue coming from high-margin enterprise and developer API usage. Its flagship Claude Code terminal-native agent reached $1 billion ARR within six months of launch, $2.5 billion by February 2026, and about $8 billion by May 2026, eventually accounting for an estimated 4% of all public GitHub commits globally. Anthropic's training costs are projected to peak at roughly $30 billion in 2028; OpenAI's compute spend is projected at $121 billion in the same year. Anthropic is already on the verge of its first profitable quarter. OpenAI remains deeply unprofitable on broken unit economics that Jalapeno is intended to repair.


