
Autoresearch Loop

When the priority stack holds scored PRDs with Red tests waiting and an agent has overnight compute budget, the pain-to-proof cycle should run autonomously, without seven separate human triggers.

Priority Score: 1,200 (Pain × Demand × Edge × Trend × Conversion)
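The score is a straight product of the five 5P factors scored later on this page. A minimal sketch (the function name is illustrative, not the real scorer):

```python
def priority_score(pain: int, demand: int, edge: int, trend: int, conversion: int) -> int:
    """Multiply the five 5P factors, each scored 1-5. Illustrative helper only."""
    return pain * demand * edge * trend * conversion

# The 5P scores from the Priority section: Pain 4, Demand 4, Edge 5, Trend 5, Convert 3
print(priority_score(4, 4, 5, 5, 3))  # → 1200
```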
Customer Journey

Why should I care?

Five cards that sell the dream

1. Why

Seven skills, zero wiring.

What's the cost of seven skills that don't compound?

The friction: Pain-signal-extractor through proof-to-story exist across two repos. Every step requires a human trigger. The chain never runs.

The desire: One trigger that chains scaffold, activate, validate, measure, story. Overnight compute turns dormant PRDs into proven demand.

The proof: Karpathy's autoresearch validated the pattern at research scale. The gap is three wires, not a new architecture.


Same five positions. Different seat. The customer asks "will it run overnight?" The builder asks "will the morning report be trustworthy?"

Feature Dev Journey

How do we build this?

Five cards that sell the process

1. Job

Activation over creation.

70% wiring, 30% new. What exists already?

7 skills built, 0 wired into a loop. The build contract is 6 rows: 3 skill upgrades, 1 new CLI command, 1 new skill, 1 orchestrator.

Situation

Seven skills span pain-to-proof across two repos but run in isolation. Every step requires a human trigger. Every North Star in the priority index reads 'Queryable: No'. The factory is designed but dormant.

Intention

One trigger runs scaffold, activate, validate, measure, story for the top uncommissioned PRDs. Metrics are queryable. Validated outcomes propagate back to frontmatter. One PRD completes a full pain-to-proof cycle per month.
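The one-trigger chain described above can be sketched as a sequential dispatcher. Stage names come from this page; the function and PRD identifier are illustrative, not real skill APIs:

```python
# Hypothetical orchestrator sketch: one trigger chains the five stages in order.
STAGES = ["scaffold", "activate", "validate", "measure", "story"]

def run_cycle(prd_id: str, stages=STAGES) -> dict:
    """Run each stage in sequence for one PRD, recording the chain."""
    results = {}
    for stage in stages:
        # A real loop would dispatch to the corresponding skill here.
        results[stage] = f"{stage} completed for {prd_id}"
    return results

report = run_cycle("PRD-042")
print(list(report))  # → ['scaffold', 'activate', 'validate', 'measure', 'story']
```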

Obstacle

Metrics aren't queryable (prose, not formulas). Validated outcomes don't propagate back to frontmatter. No conductor chains the skills. Trust gap: a bad loop that produces false greens is worse than no loop.

Hardest Thing

Making the loop trustworthy enough to run unsupervised. Budget caps, metric regression halts, and experiment logging are the safety rails; without them the loop runs fast in the wrong direction.
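The two hard-stop rails named above (budget cap, metric regression halt) reduce to a single guard check. The thresholds and parameter names here are assumptions, not the real config:

```python
def should_halt(spent_usd: float, budget_usd: float,
                metric_now: float, metric_baseline: float,
                regression_tolerance: float = 0.05) -> bool:
    """Halt the overnight loop if the budget is exhausted or the metric regressed."""
    over_budget = spent_usd >= budget_usd
    regressed = metric_now < metric_baseline * (1 - regression_tolerance)
    return over_budget or regressed

print(should_halt(9.0, 10.0, 0.93, 0.90))  # within budget, metric improved → False
print(should_halt(9.0, 10.0, 0.80, 0.90))  # metric regressed past 5% tolerance → True
```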

Priority (5P)

Pain: 4/5
Demand: 4/5
Edge: 5/5
Trend: 5/5
Convert: 3/5

Readiness (5R)

Principles: 4/5
Performance: 2/5
Platform: 3/5
Process: 2/5
Players: 2/5

What Exists

Component | State | Gap
pain-signal-extractor skill | Working | Extracts pain from interviews. No chaining to next step.
create-prd skill | Working | Creates PRDs with 5P scoring. No queryable metric enforcement.
engineering-handoff skill | Working | 9-gate pre-flight. Gate 8 can't check queryable metrics.
proof-to-story skill | Working | Writes meta article. Doesn't propagate numbers to frontmatter.
score-prds skill | Working | Scores by 5P. No confidence boost for validated PRDs.
measure-north-star CLI | Stub | Namespace exists. No scalar measurement implemented.
session-experiment-logger | Missing | No overnight aggregation. Results scattered across Comms.
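The "queryable metric" gap recurs across three rows of the table. A hedged sketch of what queryable could mean, assuming frontmatter carries a formula string over named counters (keys and formula syntax are assumptions):

```python
# Sketch: a queryable North Star is a formula over counters, not prose.
frontmatter = {
    "north_star": "activated_sessions / total_sessions",
    "queryable": True,
}
counters = {"activated_sessions": 42, "total_sessions": 120}

def measure(formula: str, counters: dict) -> float:
    """Evaluate the formula against counters. eval() is fine for a sketch, not production."""
    return eval(formula, {"__builtins__": {}}, counters)

print(round(measure(frontmatter["north_star"], counters), 2))  # → 0.35
```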

Kill Signal

Loop runs for 90 days with zero PRDs completing a full cycle. Or: metrics are defined but never queried.

Questions

What compounds faster — building new skills or wiring existing ones into a loop?

  • If every overnight session produced a structured experiment log, what would you learn by morning?
  • Which dormant algorithm would benefit most from 100 automated iterations?
  • Can LOOP-001 + LOOP-003 deliver value without the full orchestrator (LOOP-005)?
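On the first question above: if every overnight session emitted one structured record, the morning report is a query, not archaeology. The page defines no schema, so every field name below is illustrative:

```python
import json
from datetime import datetime, timezone

# Hypothetical overnight experiment log entry; field names are assumptions.
entry = {
    "prd": "LOOP-001",
    "stage": "validate",
    "started": datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc).isoformat(),
    "tests": {"red_before": 7, "green_after": 6},
    "budget_spent_usd": 1.42,
    "halted": False,
}
print(json.dumps(entry, indent=2))
```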