The Unix pipe,
for everything.
The Unix pipe (|) assumes everything is text. When you pipe binary media (an image, a video frame, an audio clip), the receiving program has no idea what it is looking at. MMPipe fixes this with a lightweight framing protocol that labels the payload, plus a vision-model filter that returns plain UTF-8, so grep, awk, and sed can consume the result exactly as Thompson intended.
Universal Media Pipe Protocol (UMPP)
Every UMPP stream begins with a signature header block that any downstream program can read instantly:
MMP/1.0
Content-Type: image/png
Content-Length: 84321
<raw binary payload>
The magic bytes MMP/1.0\n let tools distinguish a framed media stream from raw bytes or plain text. In the happy path the Content-Type header names the format, so nothing has to sniff the payload's own signature (though analyze_vision gracefully falls back when you just cat a file).
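To make the framing concrete, here is a minimal Python sketch of writing and reading one envelope. It is not MMPipe's own code: the function names are hypothetical, and it assumes that each header sits on its own line and that a blank line ends the header block, which the header example above does not confirm.

```python
import io

MAGIC = b"MMP/1.0\n"


def encode_frame(payload: bytes, content_type: str) -> bytes:
    """Wrap raw bytes in a UMPP-style envelope (hypothetical helper)."""
    header = (
        f"Content-Type: {content_type}\n"
        f"Content-Length: {len(payload)}\n"
        "\n"  # ASSUMPTION: a blank line terminates the header block
    ).encode("ascii")
    return MAGIC + header + payload


def read_frame(stream):
    """Return (headers, payload) for framed input, or (None, raw_bytes) otherwise.

    `stream` must be a binary stream such as sys.stdin.buffer or io.BytesIO.
    """
    first = stream.read(len(MAGIC))
    if first != MAGIC:
        # Unframed input, for example plain `cat photo.jpg`: return the raw bytes.
        return None, first + stream.read()
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\n", b""):  # blank line or EOF ends the headers
            break
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says exactly how many payload bytes follow.
    payload = stream.read(int(headers["content-length"]))
    return headers, payload
```

Reading the payload by Content-Length rather than to end of input means newlines inside the binary data cannot be mistaken for header lines.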
How it works
| Step | Command | Role |
|---|---|---|
| 01 | mcat | Emits UMPP-framed binary to stdout |
| 02 | \| | Standard Unix pipe, completely unchanged |
| 03 | analyze_vision | Reads frame, calls Gemini, prints UTF-8 text |
| 04 | grep / awk / sed | Consumes plain text exactly as before |
The multimodal complexity is fully contained inside analyze_vision. The boundary between programs stays plain text — the original Unix contract is restored.
Usage
mcat photo.jpg | analyze_vision
mcat diagram.png | analyze_vision "List every label visible in this diagram."
mcat screenshot.png | analyze_vision "What colors appear?" | grep -i red
cat photo.jpg | analyze_vision "What is this?"
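The last command above pipes an unframed file, so the receiver has no Content-Type header to rely on. One plausible way to handle that case is to check well-known file signatures. The sketch below is an assumption about how such a fallback could work, not a description of what analyze_vision actually does, and the function name is hypothetical.

```python
# Well-known leading byte signatures for common image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_content_type(data: bytes) -> str:
    """Guess a media type from the first bytes of an unframed payload."""
    for signature, mime in SIGNATURES:
        if data.startswith(signature):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return "application/octet-stream"  # unknown: treat as opaque bytes
```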
Without GEMINI_API_KEY, analyze_vision runs in mock mode and emits a deterministic placeholder. Perfect for testing pipelines.
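A deterministic placeholder can be as simple as a line derived from the payload itself. The sketch below shows one way to do it. The function name, the default prompt, and the output format are all hypothetical, and the real model call is deliberately left out.

```python
import hashlib
import os


def describe(payload: bytes, prompt: str = "Describe this image.", env=None) -> str:
    """Return one line of text for a media payload (hypothetical helper).

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    if not env.get("GEMINI_API_KEY"):
        # Mock mode: same bytes and prompt always yield the same line.
        digest = hashlib.sha256(payload).hexdigest()[:12]
        return f"[mock] {len(payload)} bytes sha256={digest} prompt={prompt!r}"
    # The real request is out of scope for this sketch.
    raise NotImplementedError("model call not implemented in this sketch")
```

Because the output depends only on the input bytes and the prompt, a pipeline test can assert on exact text without network access.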
Three progressive enhancements
Native macOS file picker (osascript). Emits one MMP/1.0 envelope per selected file with an X-MMP-Source header. Supports multi-select.
Setup: source shell/mmpick.sh once.
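Since the picker emits one envelope per selected file, a downstream reader has to walk several envelopes in one stream. A rough Python sketch of both sides follows. The helper names are hypothetical, and the header order and the blank line that ends each header block are assumptions rather than documented parts of UMPP.

```python
import io

MAGIC = b"MMP/1.0\n"


def envelope(payload: bytes, content_type: str, source: str) -> bytes:
    """Build one envelope that records where the payload came from."""
    headers = (
        f"Content-Type: {content_type}\n"
        f"Content-Length: {len(payload)}\n"
        f"X-MMP-Source: {source}\n"
        "\n"  # ASSUMPTION: a blank line terminates the header block
    )
    return MAGIC + headers.encode("utf-8") + payload


def iter_envelopes(stream):
    """Yield (headers, payload) for each envelope in a binary stream."""
    while True:
        magic = stream.read(len(MAGIC))
        if not magic:
            return  # clean end of stream
        if magic != MAGIC:
            raise ValueError("expected MMP/1.0 magic at envelope boundary")
        headers = {}
        for line in iter(stream.readline, b"\n"):
            if not line:
                raise ValueError("truncated header block")
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length marks where this payload ends and the next envelope begins.
        yield headers, stream.read(int(headers["Content-Length"]))
```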
Non-destructive Kitty Graphics Protocol bridge. Renders an inline thumbnail and passes the original UMPP envelope downstream unchanged.
Works in Kitty; graceful text fallback elsewhere.
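For readers unfamiliar with the Kitty Graphics Protocol, the sketch below builds the escape sequences that transmit and display a PNG inline. It follows the protocol's published chunking rules (base64 payload, at most 4096 bytes per chunk, m=1 on every chunk except the last), but it is not the bridge's actual code: the function name is hypothetical, and it shows the image at native size rather than scaling it to a thumbnail.

```python
import base64


def kitty_png_escape(png_bytes: bytes, chunk_size: int = 4096) -> bytes:
    """Return Kitty graphics escape sequences that display a PNG inline."""
    data = base64.standard_b64encode(png_bytes)
    # chunk_size must be a multiple of 4 so base64 groups are never split.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    out = bytearray()
    for index, chunk in enumerate(chunks):
        more = 0 if index == len(chunks) - 1 else 1
        # a=T: transmit and display; f=100: payload is PNG.
        # Only the first chunk carries the full control data.
        control = f"a=T,f=100,m={more}" if index == 0 else f"m={more}"
        out += b"\x1b_G" + control.encode("ascii") + b";" + chunk + b"\x1b\\"
    return bytes(out)
```

Writing these bytes to the terminal device rather than to stdout is one way to leave the envelope on stdout untouched, though how the bridge separates the two is not stated here.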
Full-screen Textual TUI. File browser + pipeline builder + live output. Keyboard-driven composition of complex media → text pipelines.
Requires Textual: pip install textual
The Sacred Contract restored
All the fancy multimodal reasoning happens inside a single, well-behaved program. Everything on either side of the pipe continues to speak the only language Unix ever promised: plain text.
MMPipe is deliberately dependency-free in the core (Python stdlib only) and ships as a single executable script plus a handful of optional ergonomic bridges. It is the minimal, correct extension that makes the 1970s pipe model work in 2026.
git clone https://github.com/bretkerr/MMPipe.git
cd MMPipe
bash install.sh
source ~/.zshrc # or ~/.bashrc
GEMINI_API_KEY is optional; mock mode works without it. No virtualenvs, no pip for the base experience.
Imported from bretkerr/MMPipe. A pure expression of Unix philosophy applied to the multimodal era.