Your deep research, re-injected.
You spent hours feeding prompts to Gemini and Claude. The dense, well-structured output landed in Google Docs — paid for in cognitive labor, then shelved. Thought Molecules turns that Drive folder into a queryable, copy-pasteable corpus your future selves and future models can build on.
Thought Molecules
We call those dense research docs thought molecules: the cognitive building blocks of deep work. Reusable. Recombinable. Citable.
Each one cost real attention to make. Most of them will never be read again — not because they aren't valuable, but because there is no reliable way to surface the exact passage you need at the exact moment you need it.
Point the system at a Google Drive folder. It chunks each document by heading, embeds the chunks with Voyage, and indexes them for hybrid (lexical + semantic) search. When you ask a question in natural language, an agent plans the retrieval, pulls candidates, reranks, verifies, and returns only verbatim passages from your own corpus, each one programmatically confirmed as a substring of its source before it leaves the process.
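As a rough illustration of the "chunks by heading" step, here is a minimal sketch. It assumes each Drive document has already been exported to Markdown with `#`-style headings; the `Chunk` shape and the `chunkByHeading` name are hypothetical and not taken from the project. It also ignores fenced code blocks, so a `#` line inside a fence would be treated as a heading.

```typescript
// Hypothetical chunk shape; the real project's types may differ.
interface Chunk {
  heading: string; // heading text, or "" for text before the first heading
  level: number;   // 1-6 for "#".."######", 0 for the preamble
  text: string;    // body text under the heading, trimmed
}

// Split a Markdown document into one chunk per heading.
function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "", level: 0, text: "" };
  const body: string[] = [];

  // Close the chunk being built and start collecting a fresh body.
  const flush = (): void => {
    current.text = body.join("\n").trim();
    // Drop an empty preamble, but keep headings even if their body is empty.
    if (current.level > 0 || current.text.length > 0) chunks.push(current);
    body.length = 0;
  };

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      const [, hashes = "", title = ""] = match;
      flush();
      current = { heading: title.trim(), level: hashes.length, text: "" };
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each returned chunk would then be embedded and indexed; keeping the heading alongside the text gives the retriever a natural citation label.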
No paraphrase. No invention.
Every passage shipped is programmatically verified as a substring of the actual source chunk. If a phrase isn't in your docs, it doesn't appear in the output.
One fabricated quote breaks the trust forever. So the trust mechanism lives in code, not in vibes.
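The verification gate described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual code: the `Candidate` shape, the `verifyVerbatim` name, and the in-memory `Map` of chunk text are all hypothetical, and the check is an exact, case-sensitive match with no whitespace normalization.

```typescript
// Hypothetical candidate shape: a passage plus the chunk it claims to quote.
interface Candidate {
  chunkId: string;
  passage: string;
}

// Keep only candidates whose passage is an exact substring of the cited chunk.
// Anything that fails (unknown chunk, empty passage, altered wording) is dropped.
function verifyVerbatim(
  candidates: Candidate[],
  chunks: Map<string, string>, // chunkId -> full source text of that chunk
): Candidate[] {
  return candidates.filter(({ chunkId, passage }) => {
    const source = chunks.get(chunkId);
    return source !== undefined && passage.length > 0 && source.includes(passage);
  });
}
```

Because the filter runs after the model has produced its candidates, a paraphrased or invented quote cannot pass: either the characters are in the source chunk or the passage is discarded.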
CLI · MCP · Web UI
CLI: pnpm cj query "..." --verbose --copy. Local-first. Your corpus, your machine, your keys. Re-ingest is incremental and effectively free.
MCP: Wire it into Claude Desktop or Claude Code. It exposes cj_query (verbatim cited markdown) and cj_status. Your research becomes a live tool for the model.
Web UI: pnpm cj serve. A minimal textarea and results pane with a streaming Server-Sent Events trace of every planner decision, tool call, and verifier step.
Front-Matter Provenance
Every research artifact carries a machine-readable YAML front-matter block declaring its exact lineage: the model, the literal prompt, the generation time, the level of human editing, and the parent documents injected into the prompt.
The ingester parses it, stores it alongside the chunks, and makes it available at retrieval time for citation, audit, and (eventually) stronger chain-of-custody guarantees.
| Field | Type | Notes |
|---|---|---|
| schema_version | integer | Always 1. Lets the ingester branch on schema. |
| model | string | e.g. claude-sonnet-4-6, gemini-2.5-pro (lowercase, hyphenated). |
| model_version | string | Version stamp or date the model was current. |
| prompt | string | The literal first user message (multi-line OK with a YAML \| literal block). |
| generated_at | ISO 8601 | UTC timestamp of generation. |
| human_edits | enum | One of none, light, substantive, rewritten, unknown. |
| parent_doc_ids | string[] | Drive file IDs injected into the prompt. |
The ingester never aborts on incomplete front matter: missing fields become unknown.
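One way to get that "never aborts" behavior is to normalize the parsed front matter against the table above and fall back to defaults instead of throwing. The sketch below assumes the YAML has already been parsed into a plain object by some YAML library. The `Provenance` type and the `normalizeProvenance` name are hypothetical. The fallbacks for the non-string fields (`1` for `schema_version`, an empty array for `parent_doc_ids`) are assumptions too, since the source only says that missing fields become unknown.

```typescript
type HumanEdits = "none" | "light" | "substantive" | "rewritten" | "unknown";

// Mirrors the field table; the names come from the table, the type itself is hypothetical.
interface Provenance {
  schema_version: number;
  model: string;
  model_version: string;
  prompt: string;
  generated_at: string; // ISO 8601, UTC
  human_edits: HumanEdits;
  parent_doc_ids: string[];
}

const HUMAN_EDITS: readonly HumanEdits[] = ["none", "light", "substantive", "rewritten", "unknown"];

// Missing, empty, or non-string values become "unknown".
const asString = (v: unknown): string =>
  typeof v === "string" && v.trim() !== "" ? v : "unknown";

// Never throws: any input, including null or a non-object, yields a full record.
function normalizeProvenance(raw: unknown): Provenance {
  const fm = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const parents = Array.isArray(fm.parent_doc_ids)
    ? fm.parent_doc_ids.filter((id: unknown): id is string => typeof id === "string")
    : [];
  return {
    // Assumed default: the table says schema_version is always 1.
    schema_version: typeof fm.schema_version === "number" ? fm.schema_version : 1,
    model: asString(fm.model),
    model_version: asString(fm.model_version),
    prompt: asString(fm.prompt),
    generated_at: asString(fm.generated_at),
    human_edits: HUMAN_EDITS.find((e) => e === fm.human_edits) ?? "unknown",
    parent_doc_ids: parents, // assumed default: empty list when absent
  };
}
```

For example, `normalizeProvenance({ model: "gemini-2.5-pro" })` keeps the model name and fills every other string field with `unknown`, so a document with sparse front matter is still ingested and still citable.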
Imported from BretKerrAI/thought-molecules-rag. The RAG is designed for one user with a serious private research corpus. It is not a hosted SaaS.