Autoresearch Loop
When the priority stack has scored PRDs with Red tests waiting and an agent has overnight compute budget, the pain-to-proof cycle should run autonomously, without seven separate human triggers.
Why should I care?
Five cards that sell the dream
Same five positions. Different seat. The customer asks "will it run overnight?" The builder asks "will the morning report be trustworthy?"
How do we build this?
Five cards that sell the process
Seven skills span pain-to-proof across two repos but run in isolation. Every step requires a human trigger. Every North Star in the priority index reads 'Queryable: No'. The factory is designed but dormant.
One trigger runs scaffold, activate, validate, measure, story for the top uncommissioned PRDs. Metrics are queryable. Validated outcomes propagate back to frontmatter. One PRD completes a full pain-to-proof cycle per month.
Metrics aren't queryable (prose, not formulas). Validated outcomes don't propagate back to frontmatter. No conductor chains the skills. Trust gap: a bad loop that produces false greens is worse than no loop.
The hard part is making the loop trustworthy enough to run unsupervised. Budget caps, metric-regression halts, and experiment logging are the safety rails; without them the loop runs fast in the wrong direction.
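The safety rails above can be sketched as a conductor loop. This is a hypothetical illustration, not the real orchestrator: the step names, `run_skill` signature, and budget figure are assumptions, and a real skill invocation would replace the placeholder measurement.

```python
import time

# Hypothetical conductor sketch showing the three safety rails:
# a compute-budget cap, a metric-regression halt, and per-step logging.
BUDGET_SECONDS = 6 * 3600          # assumed overnight budget
STEPS = ["scaffold", "activate", "validate", "measure", "story"]

def run_skill(step: str, prd: str) -> float:
    """Stand-in for invoking a skill; returns the North Star scalar."""
    return {"PRD-001": 0.42}.get(prd, 0.0)  # placeholder measurement

def conduct(prds: list[str]) -> list[dict]:
    log, start = [], time.monotonic()
    baseline: dict[str, float] = {}
    for prd in prds:
        for step in STEPS:
            if time.monotonic() - start > BUDGET_SECONDS:
                log.append({"prd": prd, "halt": "budget exhausted"})
                return log
            metric = run_skill(step, prd)
            if step == "measure":
                prior = baseline.get(prd)
                if prior is not None and metric < prior:
                    # Halt rather than ship a false green.
                    log.append({"prd": prd, "halt": "metric regression"})
                    return log
                baseline[prd] = metric
            log.append({"prd": prd, "step": step, "metric": metric})
    return log
```

The design choice is that every halt condition is logged as data, so the morning report can distinguish "finished" from "stopped for a reason".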
Priority (5P)
Readiness (5R)
What Exists
| Component | State | Gap |
|---|---|---|
| pain-signal-extractor skill | Working | Extracts pain from interviews. No chaining to next step. |
| create-prd skill | Working | Creates PRDs with 5P scoring. No queryable metric enforcement. |
| engineering-handoff skill | Working | 9-gate pre-flight. Gate 8 can't check queryable metrics. |
| proof-to-story skill | Working | Writes meta article. Doesn't propagate numbers to frontmatter. |
| score-prds skill | Working | Scores by 5P. No confidence boost for validated PRDs. |
| measure-north-star CLI | Stub | Namespace exists. No scalar measurement implemented. |
| session-experiment-logger | Missing | No overnight aggregation. Results scattered across Comms. |
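Two of the gaps in the table (the measure-north-star stub and the missing frontmatter propagation) reduce to the same idea: store the North Star as a formula over logged counters, evaluate it to a scalar, and write the result back. The sketch below is an assumption about how that could work; the field names, `eval`-based evaluator, and counter schema are all hypothetical, and a real implementation would want a proper expression parser rather than `eval`.

```python
# Hypothetical: a North Star defined as a formula, not prose, so Gate 8
# and score-prds have a number to check. eval is restricted to counter
# names here; a production version would use a real expression parser.

def measure_north_star(counters: dict[str, float], formula: str) -> float:
    """Evaluate a metric formula against logged counters."""
    return eval(formula, {"__builtins__": {}}, dict(counters))

def propagate(frontmatter: dict, value: float) -> dict:
    """Write the validated scalar back so scoring can boost confidence."""
    updated = dict(frontmatter)
    updated["north_star_value"] = round(value, 3)  # assumed field name
    updated["queryable"] = True
    return updated

counters = {"cycles_completed": 3, "prds_commissioned": 4}
value = measure_north_star(counters, "cycles_completed / prds_commissioned")
fm = propagate({"title": "Autoresearch Loop", "queryable": False}, value)
# fm["north_star_value"] == 0.75
```

Once the metric is a formula, "Queryable: No" becomes a mechanical check instead of a judgment call.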
Kill Signal
Loop runs for 90 days with zero PRDs completing a full cycle. Or: metrics are defined but never queried.
Questions
- What compounds faster: building new skills or wiring existing ones into a loop?
- If every overnight session produced a structured experiment log, what would you learn by morning?
- Which dormant algorithm would benefit most from 100 automated iterations?
- Can LOOP-001 + LOOP-003 deliver value without the full orchestrator (LOOP-005)?
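On the structured-experiment-log question: one minimal shape is append-only JSON lines, with the morning report computed as a fold over the file. The record schema and function names below are assumptions for illustration, not an existing logger format.

```python
import io
import json

# Hypothetical session-experiment-logger sketch: each overnight step
# appends one JSON line; the morning report is a summary over that file.

def log_result(stream, prd: str, step: str, ok: bool, metric: float) -> None:
    stream.write(json.dumps(
        {"prd": prd, "step": step, "ok": ok, "metric": metric}) + "\n")

def morning_report(stream) -> dict:
    records = [json.loads(line) for line in stream if line.strip()]
    return {
        "steps_run": len(records),
        "failures": [r for r in records if not r["ok"]],
        "last_metric": records[-1]["metric"] if records else None,
    }

buf = io.StringIO()   # stands in for the real log file
log_result(buf, "PRD-001", "validate", True, 0.42)
log_result(buf, "PRD-001", "measure", False, 0.40)
buf.seek(0)
report = morning_report(buf)
# report["steps_run"] == 2, with one failure flagged for review
```

Because each line is self-describing, results stop being scattered across Comms: one file per session answers "what ran, what failed, what moved" by morning.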