Value Stories
Six stories across four groups. Each story is a test contract — RED before implementation, GREEN when value is delivered. The CONDUCTOR wires the factory.
Can the loop measure what it's optimising?
Pain: creating a PRD and defining its North Star. The current metric-definer accepts prose formulas with no named data source, no threshold, and no unit. The resulting metric is uncheckable.
Fix: a metric-definer that enforces a computable formula, with a named data source, threshold, and unit, at creation time.
GREEN: a PRD is blocked until its metric is queryable, so every new PRD enters with a measurable target. Currently: 0/7 Active PRDs have queryable metrics.
Still RED: a prose metric is accepted without a formula, or a metric with no data source passes the gate.
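A minimal sketch of that creation-time gate, assuming the metric arrives as a dict parsed from PRD frontmatter. The type and function names here are illustrative, not the metric-definer's actual API:

```python
from dataclasses import dataclass

REQUIRED_FIELDS = ("formula", "data_source", "threshold", "unit")

@dataclass
class NorthStarMetric:
    formula: str       # computable expression, not prose
    data_source: str   # named source the formula runs against
    threshold: float   # target the loop optimises toward
    unit: str          # e.g. "percent", "ms", "count/week"

def validate_metric(raw: dict) -> NorthStarMetric:
    """Creation-time gate: reject any metric missing a required field,
    so a PRD can't enter the loop with an uncheckable target."""
    missing = [f for f in REQUIRED_FIELDS if not raw.get(f)]
    if missing:
        raise ValueError(f"metric not queryable; missing: {', '.join(missing)}")
    return NorthStarMetric(
        formula=raw["formula"],
        data_source=raw["data_source"],
        threshold=float(raw["threshold"]),
        unit=raw["unit"],
    )
```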
Pain: running measure-north-star for a PRD. The CLI namespace exists but returns nothing, so an agent can't determine whether its iteration improved or regressed the metric.
Fix: a CLI command that returns a scalar metric value for any PRD with a queryable formula.
GREEN: an agent runs measure-north-star --prd=sales-crm and gets a number. Currently: the command returns 'not implemented'.
Still RED: the command returns a number, but it is hardcoded or cached rather than computed from the named data source.
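A sketch of the command's contract, with argparse standing in for the real CLI layer. Both helpers are hypothetical stubs; the hardcoded values in evaluate mark exactly what a real implementation must replace with a query against the named data source:

```python
import argparse

def load_metric(prd_id: str) -> dict:
    # Stand-in for the real frontmatter loader; in the loop this would
    # parse the PRD file and return its validated metric definition.
    return {"formula": "won_deals / all_deals", "data_source": "crm.deals"}

def evaluate(metric: dict) -> float:
    # Stand-in for running the formula against the named data source.
    # Returning constants here is precisely the 'still RED' failure mode:
    # the real command must compute this from metric["data_source"].
    won_deals, all_deals = 12, 80
    return won_deals / all_deals

def main() -> None:
    parser = argparse.ArgumentParser(prog="measure-north-star")
    parser.add_argument("--prd", required=True, help="PRD id, e.g. sales-crm")
    args = parser.parse_args()
    print(evaluate(load_metric(args.prd)))  # exactly one scalar on stdout

if __name__ == "__main__":
    main()
```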
Do validated outcomes compound into the next cycle?
Pain: engineering validates a PRD (RED to GREEN) and proof-to-story writes a meta article, but the actual number never flows back to the frontmatter where score-prds can read it.
Fix: proof-to-story writes actual_metric_value, validation_outcome, and pain_reduction_delta to PRD frontmatter.
GREEN: the next score-prds run reads the validated numbers from frontmatter, so proven demand compounds into priority.
Still RED: the fields are written but score-prds never reads them; the numbers flow but don't influence ranking.
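A sketch of the write-back, assuming PRDs are markdown files with YAML frontmatter and PyYAML is available. Only the three field names come from the story; the file layout and function name are assumptions:

```python
import re
import yaml  # PyYAML

def write_validation(prd_path: str, actual: float, outcome: str, delta: float) -> None:
    """Merge the validated numbers into the PRD's YAML frontmatter so the
    next score-prds run can read them."""
    text = open(prd_path, encoding="utf-8").read()
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.S)
    front = (yaml.safe_load(match.group(1)) or {}) if match else {}
    body = match.group(2) if match else text
    front.update(
        actual_metric_value=actual,
        validation_outcome=outcome,      # "pass" or "fail"
        pain_reduction_delta=delta,
    )
    with open(prd_path, "w", encoding="utf-8") as f:
        f.write("---\n" + yaml.safe_dump(front) + "---\n" + body)
```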
Pain: score-prds runs on a PRD with validation_outcome in its frontmatter, yet validated and unvalidated PRDs score identically. Evidence quality is ignored.
Fix: validated PRDs receive a confidence boost; failed PRDs are flagged for re-evaluation.
GREEN: a PRD with validation_outcome: pass scores higher than an identical PRD without it. Currently: both score the same.
Still RED: the boost is applied but trivial, failed PRDs go unflagged, or the validation status is present but the scoring formula is unchanged.
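One way the boost could look. The 1.25 and 0.5 factors are illustrative placeholders, not tuned values; only the frontmatter field names come from the story:

```python
def score_prd(base_score: float, frontmatter: dict) -> tuple[float, list[str]]:
    """Weight a PRD's base score by evidence quality."""
    flags: list[str] = []
    outcome = frontmatter.get("validation_outcome")
    if outcome == "pass":
        return base_score * 1.25, flags     # proven demand compounds into priority
    if outcome == "fail":
        flags.append("re-evaluate")         # failed proof must surface, not vanish
        return base_score * 0.5, flags
    return base_score, flags                # unvalidated: scored as before
```

The GREEN check falls out directly: two otherwise identical PRDs diverge in score exactly when one carries validation_outcome: pass.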
Can the operator trust what happened overnight?
Pain: an agent session completes overnight and the results are scattered across 20+ Comms messages with no aggregation. I wake to noise, not signal.
Fix: session-experiment-logger reads Comms, aggregates deltas, and writes one structured log per session.
GREEN: the morning report is diff-readable: what changed, what improved, what regressed. One file per session.
Still RED: a log is produced, but it copies Comms messages verbatim with no aggregation and no delta computation.
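A sketch of the aggregation step, the part the 'still RED' line guards against skipping. The message shape ({'metric', 'value', 'ts'}) and the 'higher is better' reading of regression are assumptions:

```python
from collections import defaultdict

def summarise_session(messages: list[dict]) -> dict:
    """Collapse one session's Comms messages into a single structured
    record: first value, last value, and delta per metric."""
    series: dict[str, list[float]] = defaultdict(list)
    for msg in sorted(messages, key=lambda m: m["ts"]):
        series[msg["metric"]].append(msg["value"])
    return {
        name: {
            "start": vals[0],
            "end": vals[-1],
            "delta": vals[-1] - vals[0],
            "regressed": vals[-1] < vals[0],  # assumes higher is better
        }
        for name, vals in series.items()
    }

# One diff-readable file per session, e.g.:
# json.dump(summarise_session(msgs), open(f"logs/{session_id}.json", "w"), indent=2)
```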
Does one trigger run the full cycle?
Pain: the operator triggers 'run loop' on the top uncommissioned PRDs, but the cycle currently requires running 7 separate skills manually, in sequence.
Fix: loop-orchestrator chains scaffold, activate, validate, measure, and story, with a budget cap and a regression halt.
GREEN: one trigger runs the full cycle; the loop stops at budget, halts on regression, and checks the metric before each iteration.
Still RED: the loop runs but skips the metric check, continues past budget, ignores regressions, or produces false greens.
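A sketch of the chaining logic with the skills injected as callables, since their real interfaces aren't specified here. The skill names come from the story; the signatures and the 'higher is better' metric direction are assumptions:

```python
from typing import Callable

def run_loop(
    prd_id: str,
    skills: dict[str, Callable[[str], None]],
    measure: Callable[[str], float],
    budget: int,
) -> list[float]:
    """One trigger, full cycle: scaffold, activate, then validate and
    measure under a hard budget cap, halting on any metric regression."""
    skills["scaffold"](prd_id)
    skills["activate"](prd_id)
    history = [measure(prd_id)]          # metric check before iterating
    for _ in range(budget):              # budget cap: never iterate past it
        skills["validate"](prd_id)
        value = measure(prd_id)
        if value < history[-1]:          # regression halt, no false greens
            raise RuntimeError(f"regression: {value} < {history[-1]}")
        history.append(value)
    skills["story"](prd_id)              # write the proof back into the loop
    return history
```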
Kill Signal
The loop runs for 90 days with zero PRDs completing a full cycle, or metrics are defined but never queried.
Who
- Platform operator — wants overnight sessions to produce structured experiment logs
- PRD author — wants validated outcomes to automatically boost the next scoring cycle
- Agent — wants to measure a PRD's North Star via CLI to know if iteration improved or regressed
Questions
- What compounds faster: building new skills, or wiring existing ones into a loop?
- If every overnight session produced a structured experiment log, what would you learn by morning?
- Which dormant algorithm would benefit most from 100 automated iterations?
- Can LOOP-001 + LOOP-003 deliver value without the full orchestrator (LOOP-005)?