Stupid LLM Tricks™
Context Jamming  /  Vol. 26  ·  Dispatch №004


DISPATCHES · ISSUE №004 · THE DETERMINISM TEST · LIVE

Two Labs. One Prompt. Different Architectures, Different Answers.

Gemini 3.1 Pro and Claude Opus 4.7 were handed the same prompt on AI interpretability. Gemini produced a 43-source synthesis in taxonomic voice. Claude spent 48 minutes, consulted 2,501 sources, and named the Architectural Determinism thesis by name. A live, side-by-side test of whether model architecture determines the shape of the answer.

Bret Kerr · ACRA Insight · Franklin, MA · 18 April 2026 · 14 MIN · META-ANALYSIS + PRIMARY SOURCES

Two labs. One prompt. Two different architectures. Two different answers. On April 16th, Uri Maoz published a piece in MIT Technology Review arguing that “humans in the loop” in AI-enabled warfare is a civilizational illusion. The next morning, I fed the article into two deep-research agents — Gemini 3.1 Pro and Claude Opus 4.7 — with an identical prompt asking for a cross-domain interpretability analysis. What came back was not two variations on the same report. What came back was two reports that do not agree about what a report on this topic is.

Gemini produced a taxonomic, top-down synthesis: a 43-source bibliography, a clean four-domain comparison table, numbered sections, authoritative prose. Claude spent forty-eight minutes and one second consulting 2,501 sources, and produced an argumentative, self-critical, inline-cited document that disagreed with its own premise in the first paragraph, named the DeepMind pivot away from sparse autoencoders as the most important underreported event in the field, and — most strikingly — named the Architectural Determinism thesis by name, treating this very experiment as a real-world test of whether a lab’s founding physics structurally shapes its deployment decisions.

This dispatch is the meta-analysis of that divergence. The full reports are in the 3-tab viewer below, verbatim, alongside the prompt both models were given. Read them in either order. They are both correct. They are answering different questions from the same text.

§ 01

The methodological split.

Gemini reasons top-down: it imposes a taxonomy, populates it with instances, and presents the taxonomy as the finding. The opening move is definitional (“the intention gap is a structural, cross-domain failure mode”) and the report structure is the argument. This is the shape of a well-produced research-firm deliverable: clean, orderly, hard to argue with sentence-by-sentence, and optimized for the reader who needs a framework they can immediately use.

Claude reasons bottom-up: it opens by undermining Maoz’s own framing (“Maoz’s argument should be read, against its own framing, as a symptom rather than a warfare-specific claim”), and spends the document re-scoping the question to what the primary-source evidence can actually support. It pushes back on interdisciplinarity as a catch-all fix, reformulates it as Galison’s specific “trading zone” institutional form, and flags where Maoz’s own empirical neuroscience cuts against his op-ed’s policy claim. This is the shape of an essay, not a deliverable.

Neither approach is wrong. They are the signatures of two different inductive biases, built into the models by the training choices of two different labs. The question the rest of this dispatch asks is whether those biases are architecturally determined — which is to say, whether they follow from the physics-shaped information-theoretic framing Anthropic inherited from Kaplan’s scaling-law work, and from whatever the analogous deep prior is inside Google DeepMind.

§ 02

The failure modes diverge, too.

A side-by-side failure analysis is where the divergence stops being aesthetic and starts being structural.

Gemini's failure mode: false closure.

The taxonomy flatters itself. When the evidence for a cell in the four-domain table is weak, Gemini still fills the cell — sometimes with a plausible-sounding generalization that the primary source doesn't actually support. The report reads cleanly but occasionally arrives at symmetries that are produced by the table's shape, not the world's.

Claude's failure mode: refusal of closure.

The argument sometimes refuses to land. Claude pushes back on its own framing for long enough that the reader loses the thread. The self-critical register can collapse into the genre where every claim has three caveats and the policy implication is never stated crisply. A decision-maker reading only Claude gets nuance but no clear signal on what to actually do.

Gemini's strength: transfer.

If you need a framework to walk into a meeting with, Gemini hands you the framework. The four-domain table is ready to paste. The three non-obvious findings are numbered. The prose is board-ready. This is a capability Anthropic's model does not reliably produce at the same velocity.

Claude's strength: epistemic register.

When Claude writes "Anthropic's own authors disclose that attribution graphs work for 'about a quarter of the prompts we've tried'" — directly quoting a limitation from the primary source — that's a move Gemini's taxonomic frame cannot make without breaking the frame. The epistemic register is where Claude is uniquely good.

§ 03

43 sources vs. 2,501.

The citation count gap — 43 sources vs. 2,501 sources consulted — is not a quantity-vs.-quality argument. It is evidence of two different conceptions of what a source is for.

Gemini’s 43 sources are bibliographic: they anchor the report’s legitimacy. They appear at the end. Their job is to prove the framework rests on something. The reader is not expected to follow any given citation; the reader is expected to be reassured by the list’s weight.

Claude’s 2,501 sources are argumentative: they are the load-bearing substance of individual sentences. Specific arXiv IDs, specific papers by specific authors, quoted or named at the point of use. The inline citation chips (“[LessWrong]”, “[arXiv]”, “[Transformer Circuits]”) are there because the claim would not otherwise be defensible. A reader who drops any single source finds that paragraph wobble.

§ 04

The two sentences that prove the thesis.

Everything above is texture. Here is the demonstration. One sentence from each report on the same underlying question: what is the mechanism that makes the interpretability gap persist? Both sentences appear verbatim in the report tabs below.

The Gemini sentence is true. It is also the kind of sentence that has been true for the last seventy years of strategic-studies writing about arms races. It is framework-level.

The Claude sentence is specific. It names the paper, the three comparative statics, and the non-obvious one — that more information makes race dynamics worse, because leaders use transparency to shave precaution to the minimum that preserves their lead. This is the operational claim. It tells you what to do: verification regimes that reveal compliance without revealing exploitable capability.

One is the headline. One is the mechanism. They come from the same prompt. The difference is not in what the models know. It is in what they are built to do with what they know.

§ 05

The DeepMind SAE pivot that Gemini didn't surface.

The single most important piece of news in Claude’s report does not appear in Gemini’s: Google DeepMind has quietly deprioritized sparse autoencoders as a primary research direction.

This is sourced to Neel Nanda’s team’s “Pragmatic Vision for Interpretability” post on the Alignment Forum, which quotes an internal admission that “SAEs underperform much simpler & cheaper techniques.” Three frontier labs are visibly making different bets on the same tool at the same moment: Anthropic is still on SAEs-plus-attribution-graphs, OpenAI used Goodfire’s SAEs rather than its own for its 2025 misalignment work, and DeepMind has quietly walked away from SAEs as a primary research direction. The popular narrative of monotonic interpretability progress obscures what is actually a paradigm split with real technical disagreement.

Gemini’s report covers sparse autoencoders cleanly — it cites Gemma Scope 2 as DeepMind’s contribution, frames SAEs as “the most significant technical breakthrough of the 2024–2025 period,” and moves on. It is factually accurate. It is also the view that would be accurate if DeepMind were still committed to the program. The report misses the pivot.

Why does Claude catch it and Gemini miss it? One honest answer: Claude is produced by a lab where interpretability is the founding identity, and its research mode is tuned to surface internal methodological disputes within the interpretability field. Gemini’s research mode is tuned to produce clean summaries of what the literature says, not to weight internal methodological disputes over the surface consensus. On a meta-interpretability prompt, one of those tunings catches the news and the other produces the literature review. That is what architectural determinism means when it leaves the whiteboard.

§ 06

What this means for the rest of us.

Three takeaways, ordered by how much they should change your behavior as a serious user of these tools.

Do not trust a single model for meta-questions.

On questions about the shape of a field — what it is arguing about, where the methodological fault lines are, which pivots are underreported — run the question through more than one model. The divergence is not noise. It is signal about which blind spots each model has. For operational questions with crisp answers, either works. For questions about the field itself, the disagreement is the finding.

Read the register, not just the content.

When a model says "the report proves X," ask whether the register allows the model to say "actually, X is contested, and here are the three researchers pushing back." If the answer is no, you have the headline but not the mechanism. Claude's self-critical register is doing real work that Gemini's taxonomic register structurally cannot do. The inverse is also true.

The divergence itself is a tool.

Two reports on the same prompt, read side by side, let you triangulate the source material in a way neither report alone does. Use the models adversarially. The prompt that produced these two outputs is the optimized research workflow as of April 2026: ask both, read both, let the disagreement tell you where to dig.
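For readers who want to wire this up rather than paste into two chat windows, here is a minimal sketch of the ask-both workflow in Python. It assumes the standard chat endpoints of the Anthropic and Google GenAI SDKs stand in for the deep-research products used above; the model identifiers, file names, and token limit are placeholders, not the exact versions compared in this dispatch.

```python
# Ask-both workflow: one research brief, two models, two reports on disk.
# A sketch only -- the model IDs and file names below are placeholders.
import pathlib

import anthropic            # pip install anthropic
from google import genai    # pip install google-genai

PROMPT = pathlib.Path("research_brief.txt").read_text()

CLAUDE_MODEL = "claude-opus-latest"   # placeholder: substitute the version you are testing
GEMINI_MODEL = "gemini-pro-latest"    # placeholder: substitute the version you are testing


def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()    # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model=CLAUDE_MODEL,
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text


def ask_gemini(prompt: str) -> str:
    client = genai.Client()           # reads GEMINI_API_KEY from the environment
    resp = client.models.generate_content(model=GEMINI_MODEL, contents=prompt)
    return resp.text


if __name__ == "__main__":
    for name, ask in [("claude", ask_claude), ("gemini", ask_gemini)]:
        report = ask(PROMPT)
        pathlib.Path(f"report_{name}.md").write_text(report)
        print(f"{name}: {len(report.split())} words")
    # Read the two files side by side; where they diverge is where to dig
    # into the primary sources.
```

Neither call triggers the multi-hour agentic research modes that produced the 43-source and 2,501-source runs above; the sketch shows the shape of the workflow, not the depth of any single answer.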

§ · The Primary Sources


The Optimized Prompt

The exact research brief fed into both systems. Same words, same attachments, same instructions.


Fed identically into: Gemini 3.1 Pro (Deep Research) · Claude Opus 4.7 (Research) · Date: April 2026 · Anchor text: Uri Maoz, "Why having 'humans in the loop' in an AI war is an illusion," MIT Technology Review, April 16, 2026.


Note: The full prompt will be posted here verbatim. The version below is a structural reconstruction based on what both reports actually covered, pending the original text being pasted in.


The framing

Read the attached MIT Technology Review piece by Uri Maoz on the "humans in the loop" illusion in AI-enabled warfare.

Then produce an exhaustive deep-tier research report that treats the interpretability gap Maoz describes not as a warfare-specific claim, but as a cross-domain civilizational signature — a structural failure mode appearing simultaneously across every frontier of advanced applied science in the 2024–2026 window.

What the report must cover

  1. Four-domain comparison. Build a comparison matrix across (a) AI warfare, (b) synthetic biology / mirror life, (c) AI-assisted software engineering, and (d) robotics foundation models. For each domain, specify the opaque system deployed, what the black box hides, the competitive pressure forcing deployment, the proposed legibility fix, and the current state of that fix as of Q2 2026.

  2. Mechanistic interpretability landscape (2024–2026). Inventory the live research frontiers — Anthropic's Petri, sparse autoencoders, attribution graphs, Transluce's Monitor/Docent/Investigator agents, Goodfire's Ember, DeepMind's Gemma Scope, OpenAI's alignment-auditing work. Name the hard walls: faithfulness (Makelov/Lange/Nanda, SAEBench), identifiability (Zhang et al.), deception-robustness (Apollo Research's scheming evaluations), weak-to-strong generalization bounds. Be candid about the gap between existence proofs and production-grade runtime monitoring.

  3. Maoz's research program. Situate ai-intentions.org and Maoz's LUCID Lab at Chapman. Trace the two-pronged methodology (top-down cognitive neuroscience + bottom-up mechanistic interpretability). Engage with Maoz's own empirical work on the readiness potential, including whether his findings complicate the classical picture his op-ed implicitly rests on.

  4. Competitive pressure as mechanism. Connect to the formal game-theoretic literature: Armstrong/Bostrom/Shulman 2016 "Racing to the Precipice," Schelling's "premium on haste," Cunningham's technical-debt metaphor, Scott Alexander's "Moloch." Include the February–March 2026 Anthropic-Pentagon legal collision (Hegseth supply-chain-risk designation, Judge Lin's injunction, 9th Circuit stay denial) as a live case study.

  5. Escape conditions. Verification, common knowledge of shared catastrophic downside, mutual vulnerability, reversibility. Contrast 1975 Asilomar (all four conditions held) with 2024–2026 frontier AI governance (none fully met).

  6. Interdisciplinarity and discipline boundaries. Distinguish general interdisciplinarity (largely unsuccessful: FHI, MIRI's decision theory, SFI's "failed institutionalization") from Galison's specific "trading zone" institutional form (Macy Conferences, particle physics' pidgins and creoles, Esvelt's mirror-life call-around, the Cooperative AI Foundation, CHAI). Apply the distinction to Maoz's consortium.

  7. Three non-obvious findings to seed future writing. Surface claims that are well-supported by the evidence but underreported — the kind of thing a careful reader would not know from general tech coverage.

  8. Honest answer to the core question. Is the interpretability gap a coincidental shared feature of 2024–2026 tech frontiers, or the defining feature of this moment? Weigh the evidence without predetermination.

Constraints

  • Cite specifically. Name papers, authors, arXiv IDs, dates, publication venues. Use inline citation markers.
  • Don't paper over hard walls. The field is pre-paradigmatic in meaningful ways (Casper, Saphra, Rudin, Williams et al.). Say so.
  • Distinguish existence proofs from mechanistic closure. Anthropic's own authors disclose attribution graphs work for "about a quarter of the prompts we've tried." Quote that register.
  • No hedging for the sake of balance. If the evidence points to "the defining feature of the era," say that; if it only supports "one of several simultaneous pressures," say that.
  • Include a full bibliography organized by domain at the end.

Meta-instruction

Write the report as if your job depended on it being read by Chris Olah, Dario Amodei, Neel Nanda, and Stuart Russell — and on them finding nothing to object to in the framing of their own fields.


(This is the reconstructed frame. The original prompt — with its exact phrasing, any attached PDFs of the source article, and any tuning instructions specific to each model — will replace this file when uploaded. The meta-analysis on this page compares what the two models produced when fed this same request.)

Filed from the determinism test

Bret Kerr

Context Jamming is a dispatch from ACRA Insight LLC on cross-model orchestration, AI safety, and the economics of the new cognitive stack.

GemClaw  ·  The Determinism Test  ·  Live  ·  Primary-Source Open

Subscribe at contextjamming.substack.com