L0 · inner-loop

Automated Commissioning

The feature matrix has 210 features, and states drift within days of manual editing. Feature states must be computed from test results, not hand-edited.

Priority Score: 384

Pain × Demand × Edge × Trend × Conversion

Customer Journey

Why should I care?

Five cards that sell the dream

1. Why

Measure, don't opine.

What would it take to compute every feature state from test output?

The friction: 210 features tracked. Zero computed from tests. Manual L-levels drift within days of editing. CRM claims L3 with no Story Contract.

The desire: Feature state = f(test results). Run the script, get the truth. FAVV Build Contracts now declare test artifacts — the data exists, the bridge doesn't.

The proof: The decision tree is deterministic at every node except L4. Everything below L4 is computable from test results.
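A minimal sketch of what such a deterministic tree could look like. The source does not spell out the criteria for L0 to L3, so the thresholds below are assumptions for illustration only: no mapped tests means L0, any failing mapped test means L1, and a Safety Test violation or a missing Story Contract caps the feature at L2. `FeatureEvidence` and `computeLevel` are hypothetical names.

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

// Hypothetical evidence record for one feature; field names are assumptions.
interface FeatureEvidence {
  featureId: string;
  hasStoryContract: boolean;
  mappedTests: number;            // tests mapped to this feature
  passingTests: number;           // how many of those passed
  safetyViolations: number;       // Safety Test violations found
  commissionerSignedOff: boolean; // human input; only matters for L4
}

function computeLevel(e: FeatureEvidence): Level {
  // Assumed rule: nothing mapped, nothing claimed.
  if (e.mappedTests === 0) return "L0";
  // Assumed rule: any failing mapped test holds the feature at L1.
  if (e.passingTests < e.mappedTests) return "L1";
  // From the source: Safety Test violations cap at L2.
  // Assumed: a missing Story Contract also blocks L3.
  if (e.safetyViolations > 0 || !e.hasStoryContract) return "L2";
  // L4 is the only node that needs a human: commissioner sign-off.
  return e.commissionerSignedOff ? "L4" : "L3";
}
```

Sign-off is checked last, so a commissioner cannot lift a feature past a failing test or a Safety Test violation.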

The L-level decision tree

2. Evidence

Zero false L3s.

Is under-reporting better than over-reporting?

The friction: A feature marked L3 that doesn't work destroys trust in the matrix. No verification exists. Safety Test violations aren't checked.

The desire: Conservative by default. A Safety Test violation caps the feature at L2. Regressions are detected. Unmapped features are flagged, not hidden.

The proof: Failure budget: zero false L3s. Prefer under-reporting to over-reporting. Trust is binary: one false L3 and nobody trusts the matrix.
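One way the "flagged, not hidden" rule could be sketched: compare the previous matrix against the freshly computed states and emit a finding for every regression and every feature the script could not compute. `auditMatrix` and `Finding` are hypothetical names, and the flat id-to-level shape is an assumption, not the actual layout of feature-matrix.json.

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

const RANK: Record<Level, number> = { L0: 0, L1: 1, L2: 2, L3: 3, L4: 4 };

interface Finding {
  featureId: string;
  kind: "regression" | "unmapped";
  detail: string;
}

function auditMatrix(
  previous: Record<string, Level>,
  computed: Record<string, Level>,
): Finding[] {
  const findings: Finding[] = [];
  for (const featureId of Object.keys(previous).sort()) {
    const before = previous[featureId];
    const after = computed[featureId];
    if (after === undefined) {
      // No computed state: surface the gap instead of keeping the old claim.
      findings.push({ featureId, kind: "unmapped", detail: `was ${before}, no computed state` });
    } else if (RANK[after] < RANK[before]) {
      // The computed level dropped below the last recorded level.
      findings.push({ featureId, kind: "regression", detail: `${before} -> ${after}` });
    }
  }
  return findings;
}
```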

The trust stories

3. Platform

Parser exists, bridge doesn't.

What's the minimum new code to close the loop?

The friction: PRD specs have FAVV tables. Vitest runs in CI. feature-matrix.json exists. Nothing connects them. FAVV parser reads Build Contracts but doesn't extract test mapping.

The desire: ~50% composition (extend existing), ~50% new code (index builder, level computer, writer, receipt generator).

The proof: 3 PRDs already have full Story+Build contracts. The mapping data exists in the specs — we just need to parse it.
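A rough sketch of the missing bridge, under a loud assumption: the source does not show the FAVV table layout (and three formats coexist), so this assumes a pipe-delimited table and takes the feature and artifact column names as parameters. `parsePipeTable` and `buildTestIndex` are hypothetical helpers, not part of the existing FAVV parser.

```typescript
// Split a pipe-delimited table into one record per data row, keyed by header.
function parsePipeTable(text: string): Record<string, string>[] {
  const lines = text
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.startsWith("|"));
  if (lines.length < 2) return [];
  const cells = (line: string): string[] =>
    line.replace(/^\||\|$/g, "").split("|").map((c) => c.trim());
  const header = cells(lines[0]);
  const rows: Record<string, string>[] = [];
  for (const line of lines.slice(1)) {
    if (/^\|[\s:|-]+\|?$/.test(line)) continue; // skip the |---|---| separator
    const values = cells(line);
    const row: Record<string, string> = {};
    header.forEach((name, i) => {
      row[name] = values[i] ?? "";
    });
    rows.push(row);
  }
  return rows;
}

// Build feature ID -> test artifacts. A feature with no artifact keeps an
// empty list so it can be flagged as unmapped rather than silently dropped.
function buildTestIndex(
  rows: Record<string, string>[],
  featureCol: string,
  artifactCol: string,
): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const row of rows) {
    const id = row[featureCol];
    if (!id) continue;
    const tests = index.get(id) ?? [];
    const artifact = row[artifactCol];
    if (artifact) tests.push(artifact);
    index.set(id, tests);
  }
  return index;
}
```

Passing the column names in keeps the sketch independent of which FAVV format a given PRD uses.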

The evidence flow

4. Loop

Ten days, five sprints.

Can we validate with Identity & Access features first?

The friction: Manual commissioning doesn't scale past 50 features. We're at 210. Every new PRD adds more features to track manually.

The desire: Sprint 0: parser. Sprint 1: runner + computer. Sprint 2: writer + CLI. Sprint 3: CI integration. Sprint 4: Playwright e2e.

The proof: Sprint 1 acceptance: computed states for AUTH-* features match manual assessment. If they don't, the mapping is wrong.
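The Sprint 1 acceptance check could be sketched as a plain comparison over one feature-ID prefix. `compareWithManual` is a hypothetical helper, and the flat id-to-level records are an assumed shape.

```typescript
interface Mismatch {
  featureId: string;
  manual: string;
  computed: string | null; // null when the script produced no state
}

function compareWithManual(
  manual: Record<string, string>,
  computed: Record<string, string>,
  prefix: string,
): Mismatch[] {
  return Object.keys(manual)
    .filter((id) => id.startsWith(prefix))
    .sort()
    .filter((id) => manual[id] !== computed[id])
    .map((id) => ({
      featureId: id,
      manual: manual[id],
      computed: computed[id] ?? null,
    }));
}
```

An empty result for the prefix `"AUTH-"` would mean acceptance passes. Any entry points at either a wrong mapping or a stale manual state.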

The build order

5. People

Commissioners verify, scripts compute.

Where does automation end and human judgment begin?

The friction: Commissioner opens feature-matrix.json, sees states from last manual edit. No idea if they're still true. 30+ min manual inspection, usually skipped.

The desire: L4 = commissioner sign-off. Everything below = computed. The builder and commissioner are never the same person.

The proof: Run one command, get a report. States reflect test results. Regressions flagged. Commissioner focuses on L3→L4, not L0→L3.

The commissioner stories


Same five positions. Different seat. The commissioner asks "can I trust it?" The builder asks "does it catch my regressions?"

Feature Dev Journey

How do we build this?

Five cards that sell the process

1. Job

50/50 compose and build.

What pieces already exist?

18 build rows across 5 jobs. FAVV parser exists (extend). Vitest in CI (reuse). Playwright in Nx (reuse targets). ~50% composition, ~50% new.

The build order


Situation

210 features are hand-edited in feature-matrix.json. States drift. CRM claims L3 with no Story Contract. Manual tracking hits a ceiling at 50 features. Every PRD review requires manual state checks that take 30+ minutes and are usually skipped.

Intention

Feature states are a function of test results. Run the script, get the truth. feature-matrix.json updated by script within 5 minutes of merge to main. Every state change has an evidence trail.

Obstacle

No bridge connects PRD Build Contract artifacts to test execution. Test files don't declare which feature they verify. Three FAVV formats coexist. Mapping fidelity is the hardest part.

Hardest Thing

A weak mapping (wrong test files mapped to wrong features) produces states that look computed but are just as wrong as manual edits, only now with false confidence. When in doubt, under-report rather than over-report.

Priority (5P)

Factor	Score
Pain	4/5
Demand	3/5
Edge	4/5
Trend	4/5
Convert	2/5
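The headline score of 384 is consistent with multiplying the five factor scores above (4 × 3 × 4 × 4 × 2). A tiny sketch of that arithmetic, with `FivePScores` and `priorityScore` as hypothetical names:

```typescript
interface FivePScores {
  pain: number;
  demand: number;
  edge: number;
  trend: number;
  conversion: number;
}

// Product of the five factors, each scored out of 5 (maximum 5^5 = 3125).
function priorityScore(s: FivePScores): number {
  return s.pain * s.demand * s.edge * s.trend * s.conversion;
}

const score = priorityScore({ pain: 4, demand: 3, edge: 4, trend: 4, conversion: 2 });
console.log(score); // 384
```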

Readiness (5R)

Dimension	Score
Principles	3/5
Performance	2/5
Platform	3/5
Process	2/5
Players	2/5

What Exists

Component	State	Gap
PRD specs with FAVV Build Contracts	Working	3 PRDs have full Story+Build. Not parsed by any commissioning script.
feature-matrix.json	Working	Powers feature-matrix.jsx page. Hand-edited.
Vitest in engineering repo	Working	Runs in CI but not scoped to feature IDs.
FAVV parser in project-from-prd	Working	Reads Build Contracts for engineering tasks. Doesn't extract feature-to-test mapping.
Playwright in Nx monorepo	Working	Nx plugin installed, e2e targets inferred. Not connected to commissioning.
Agent receipt schema	Stub	v1.0 defined. No commissioning script emits receipts.

Kill Signal

Script states contradict manual commissioner judgment on >20% of features after 3 runs.
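The kill signal can be made mechanical. The sketch below assumes one reading of "after 3 runs": the signal fires only when each of the three most recent runs shows contradictions on more than 20% of the compared features. `RunComparison` and `killSignal` are hypothetical names, and other readings (for example, only the third run counts) are equally plausible.

```typescript
interface RunComparison {
  compared: number;       // features with both a manual and a computed state
  contradictions: number; // features where the two disagree
}

function killSignal(
  runs: RunComparison[],
  threshold = 0.2,
  minRuns = 3,
): boolean {
  // Not enough evidence yet: never fire before minRuns runs exist.
  if (runs.length < minRuns) return false;
  const recent = runs.slice(-minRuns);
  // Assumed reading: every recent run must exceed the threshold.
  return recent.every(
    (r) => r.compared > 0 && r.contradictions / r.compared > threshold,
  );
}
```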

Questions

What breaks first when the script disagrees with a human commissioner?

If 30% of features are unmapped, is the script's output trustworthy enough to replace manual edits?
Should the script refuse to write results when unmapped percentage exceeds a threshold?
Is the Safety Test gate (blocking L3) too conservative — or not conservative enough?
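If the answer to the threshold question turns out to be yes, the guard itself is small. A sketch, with the limit left as a parameter because the source does not set a value; `decideWrite` and `WriteDecision` are hypothetical names.

```typescript
interface WriteDecision {
  write: boolean;
  unmappedRatio: number;
  reason: string;
}

function decideWrite(
  totalFeatures: number,
  unmappedFeatures: number,
  maxUnmappedRatio: number, // undecided in the source; the caller must choose
): WriteDecision {
  if (totalFeatures <= 0) {
    return { write: false, unmappedRatio: 0, reason: "no features to report" };
  }
  const unmappedRatio = unmappedFeatures / totalFeatures;
  if (unmappedRatio > maxUnmappedRatio) {
    // Refuse to overwrite the matrix; the report can still be printed.
    return {
      write: false,
      unmappedRatio,
      reason: `unmapped ${(unmappedRatio * 100).toFixed(1)}% exceeds limit ${(maxUnmappedRatio * 100).toFixed(1)}%`,
    };
  }
  return { write: true, unmappedRatio, reason: "within limit" };
}
```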