# moea-labeler > Apache License 2.0 · Copyright 2026 ACRA Insight LLC > > Licensed under the Apache License, Version 2.0 (the "License"); > you may not use this file except in compliance with the License. > You may obtain a copy of the License at > > https://www.apache.org/licenses/LICENSE-2.0 --- ## Skill: moea-labeler **One-line description:** Given a domain and labeling objective, emit an XML deep-research prompt with LABEL-ANCHOR semantic anchors and a downstream JSONL labeling schema with validation gate — Stage 1 of the MoEA (Mixture of Expert Agents) preference-data pipeline. **Version:** 1.0.0 **Author:** Bret Kerr · ACRA Insight LLC **License:** Apache 2.0 **Compatible with:** Claude Code · Cursor · Copilot · Gemini CLI (any skills-compatible agent) --- ## What it does `moea-labeler` composes an XML deep-research prompt optimized for Gemini Deep Research (or equivalent). The prompt embeds **LABEL-ANCHOR** semantic anchors — one per preference dimension — that serve as the deterministic join key between a retrieval-grounded, cited research finding and the preference score assigned to it downstream. The anchor id (`domain.dimension-slug.NN`) travels with every finding through the pipeline, making each human or synthetic preference label **auditable** — traceable to a specific cited source rather than floating as an unsupported preference signal. --- ## Pipeline ``` Domain + labeling objective ↓ [moea-labeler] ← Stage 1 (Claude Code) ↓ XML deep-research prompt w/ LABEL-ANCHORs → Gemini Deep Research ↓ Anchored, cited findings ↓ [preference-labeling handoff] → DPO-ready JSONL ← Stage 2 ↓ Validation gate (κ ≥ 0.6 vs human gold set) → ship or escalate ``` --- ## Invocation ``` /moea-labeler ``` The skill prompts for: 1. **Domain** — e.g. "Clinical Summarization", "Contract-Clause Review", "Code Review", "Financial-Filing QA", or a custom domain. 2. **Labeling objective** — one sentence describing what model behavior this preference data will improve. 3. **Preference dimensions** — a list of dimensions with criteria a domain expert could evaluate in under 30 seconds. 4. **Gold-set fraction** — what percentage of pairs to route to human expert review (default: 10%). 5. **Judge ensemble size** — number of LLM judge families (1 / 3 / 5; default: 3). --- ## Output format ### Stage 1 — XML prompt with LABEL-ANCHORs ```xml {objective} Ground every claim in a retrievable, citable primary source. No source, no claim. Attach the matching LABEL-ANCHOR id to every finding so it ports to the labeling stage. For each anchored finding, emit a one-line rationale a domain expert could audit in under 30 seconds. For each prompt, emit a chosen/rejected pair scored on the anchored dimensions, with the supporting LABEL-ANCHOR ids and sources carried through. JSONL: {"prompt","chosen","rejected","anchors":[...],"sources":[...],"rationale","dimension","tier"} {gold_fraction} 0.60 Any pair whose dimension tier is "escalate", or where ensemble judges disagree, routes to the human gold set. ``` ### Stage 2 — DPO-ready JSONL (preference-labeling handoff) Each row in the output JSONL carries its anchor ids and sources through, making the preference label auditable end-to-end: ```jsonl {"prompt":"...","chosen":"...","rejected":"...","anchors":["domain.dim.01"],"sources":["Primary source URL"],"rationale":"One-line expert-auditable rationale.","dimension":"Dimension Name","tier":"screen|escalate"} ``` --- ## Dimension tiers | Tier | Meaning | Routing | |------|---------|---------| | `screen` | Machine-decidable with high confidence | LLM ensemble only | | `escalate` | Requires domain expert judgment | Routes to human gold set | --- ## Validation gate - **κ ≥ 0.60** Cohen's / Fleiss' kappa against held-out expert set (minimum threshold) - **Position-swap debiasing** applied to all ensemble judges - Any `escalate`-tier pair or ensemble disagreement → human gold set - If κ < 0.60: raise gold fraction (toward 15–20%) or narrow dimension scope --- ## Honest limits 1. **Expert ceiling:** LLM judges agree with SMEs at 64–68% (Szymanski et al., IUI 2025). Do not use for unsupervised high-stakes labeling. 2. **Model collapse risk:** Apply labels to human-authored or retrieval-grounded content only (Shumailov et al., Nature 2024). 3. **Reward hacking:** Multi-model ensembles can worsen DPO safety if not benchmarked first (arXiv 2504.02193). 4. **Regulated domains:** Medical / legal use requires human-in-the-loop by law. Position as pre-screening only. --- ## Compatibility - **Gemini Deep Research** (Stage 1 execution) — paste the XML prompt directly - **Red Hat InstructLab** — MoEA-labeler is a preference-data layer; InstructLab handles instruction/knowledge. They compose. - **IBM Granite via Ollama** — local compression and validation runtime (open source, no data egress) - **Google for Startups / Open Accelerator** — Gemini API credits supported --- ## Live interactive demo [contextjamming.com/MoEA-labeler](https://www.contextjamming.com/MoEA-labeler) — compose, preview, and download without an API key. --- ## License Apache License, Version 2.0. See full text at https://www.apache.org/licenses/LICENSE-2.0