L0 · inner-loop

Automated Commissioning

When the feature matrix has 210 features and states drift within days of manual editing, feature states must be computed from test results, not hand-edited.

Priority Score: 384 (Pain × Demand × Edge × Trend × Conversion)

Customer Journey

Why should I care?

Five cards that sell the dream

Card 1: Why

Measure, don't opine.

What would it take to compute every feature state from test output?

The friction: 210 features tracked. Zero computed from tests. Manual L-levels drift within days of editing. CRM claims L3 with no Story Contract.

The desire: Feature state = f(test results). Run the script, get the truth. FAVV Build Contracts now declare test artifacts — the data exists, the bridge doesn't.

The proof: The decision tree is deterministic at every node except L4. Everything below L4 is computable from test results.
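Since every node below L4 is deterministic, the state function can be sketched directly. This is a minimal sketch, not the real decision tree: the gate conditions and result fields are assumptions, and only the level names (L0–L3, with L4 left to human judgment) and the Safety Test gate come from this document.

```typescript
// Sketch of a deterministic feature-state function.
// Gate conditions and the TestResults shape are illustrative assumptions.
type TestResults = {
  unitPassed: number;
  unitTotal: number;
  e2ePassed: number;
  e2eTotal: number;
  safetyTestPassed: boolean; // the Safety Test gate that blocks L3
};

function computeLevel(r: TestResults): string {
  if (r.unitTotal === 0) return "L0";          // no tests mapped at all
  if (r.unitPassed < r.unitTotal) return "L1"; // unit suite not green
  if (r.e2eTotal === 0 || r.e2ePassed < r.e2eTotal) return "L2";
  if (!r.safetyTestPassed) return "L2";        // Safety Test gate blocks L3
  return "L3";                                 // L4 stays a human judgment
}
```

The point is not these particular gates but the shape: every branch reads only test output, so two runs on the same results always produce the same state.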


Same five positions. Different seat. The commissioner asks "can I trust it?" The builder asks "does it catch my regressions?"

Feature Dev Journey

How do we build this?

Five cards that sell the process

Card 1: Job

50/50 compose and build.

What pieces already exist?

18 build rows across 5 jobs. FAVV parser exists (extend). Vitest in CI (reuse). Playwright in Nx (reuse targets). ~50% composition, ~50% new.

Situation

210 features hand-edited in feature-matrix.json. States drift. CRM claims L3 with no Story Contract. Manual tracking hits a ceiling at 50 features. Every PRD review requires manual state checks that take 30+ minutes and are usually skipped.

Intention

Feature states are a function of test results. Run the script, get the truth. feature-matrix.json is updated by the script within 5 minutes of a merge to main. Every state change has an evidence trail.
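The update step can be sketched as a small script. The file shape, field names, and evidence format here are assumptions, not the actual feature-matrix.json schema:

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Sketch: apply computed states to feature-matrix.json, recording an
// evidence entry per change. Entry and evidence shapes are assumptions.
type Evidence = { from: string; to: string; at: string; source: string };
type FeatureEntry = { id: string; level: string; evidence?: Evidence[] };

function applyStates(path: string, computed: Map<string, string>): void {
  const matrix: FeatureEntry[] = JSON.parse(readFileSync(path, "utf8"));
  const now = new Date().toISOString();
  for (const f of matrix) {
    const next = computed.get(f.id);
    if (next === undefined || next === f.level) continue; // unmapped or unchanged
    (f.evidence ??= []).push({ from: f.level, to: next, at: now, source: "ci-tests" });
    f.level = next;
  }
  writeFileSync(path, JSON.stringify(matrix, null, 2) + "\n");
}
```

Unmapped features are deliberately left untouched rather than reset, which keeps the evidence trail honest about what the script actually verified.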

Obstacle

No bridge connects PRD Build Contract artifacts to test execution. Test files don't declare which feature they verify. Three FAVV formats coexist. Mapping fidelity is the hardest part of the build.

Hardest Thing

A weak mapping (wrong test files mapped to wrong features) produces states that look computed but are just as wrong as manual edits, except now with false confidence. Prefer under-reporting to over-reporting.

Priority (5P)

  • Pain: 4/5
  • Demand: 3/5
  • Edge: 4/5
  • Trend: 4/5
  • Convert: 2/5
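The 384 priority score is the product of the five 5P ratings, which a one-liner confirms:

```typescript
// 5P priority score: the product of the five ratings (each out of 5).
const ratings = { pain: 4, demand: 3, edge: 4, trend: 4, convert: 2 };
const score = Object.values(ratings).reduce((acc, r) => acc * r, 1);
// 4 * 3 * 4 * 4 * 2 = 384
```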

Readiness (5R)

  • Principles: 3/5
  • Performance: 2/5
  • Platform: 3/5
  • Process: 2/5
  • Players: 2/5

What Exists

Component | State | Gap
PRD specs with FAVV Build Contracts | Working | 3 PRDs have full Story+Build. Not parsed by any commissioning script.
feature-matrix.json | Working | Powers the feature-matrix.jsx page. Hand-edited.
Vitest in engineering repo | Working | Runs in CI but not scoped to feature IDs.
FAVV parser in project-from-prd | Working | Reads Build Contracts for engineering tasks. Doesn't extract feature-to-test mapping.
Playwright in Nx monorepo | Working | Nx plugin installed, e2e targets inferred. Not connected to commissioning.
Agent receipt schema | Stub | v1.0 defined. No commissioning script emits receipts.

Kill Signal

Script states contradict manual commissioner judgment on >20% of features after 3 runs.
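The kill signal is measurable. A sketch, assuming the signal fires when the contradiction rate exceeds 20% on each of 3 consecutive runs (one plausible reading of "after 3 runs"), and treating unmapped features as not comparable:

```typescript
// Sketch: contradiction rate between script states and manual commissioner
// states. Map shapes and the 3-consecutive-runs reading are assumptions.
function contradictionRate(
  script: Map<string, string>,
  manual: Map<string, string>,
): number {
  let compared = 0;
  let contradicted = 0;
  for (const [id, manualLevel] of manual) {
    const scriptLevel = script.get(id);
    if (scriptLevel === undefined) continue; // unmapped: not a contradiction
    compared++;
    if (scriptLevel !== manualLevel) contradicted++;
  }
  return compared === 0 ? 0 : contradicted / compared;
}

// Fires if the rate exceeds 20% on each of the last 3 runs.
const killSignal = (rates: number[]): boolean =>
  rates.length >= 3 && rates.slice(-3).every((r) => r > 0.2);
```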

Questions

  • What breaks first when the script disagrees with a human commissioner?
  • If 30% of features are unmapped, is the script's output trustworthy enough to replace manual edits?
  • Should the script refuse to write results when the unmapped percentage exceeds a threshold?
  • Is the Safety Test gate (blocking L3) too conservative, or not conservative enough?
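The refuse-to-write question above has a simple shape regardless of where the threshold lands. A sketch, with the 30% cutoff taken from the question rather than from any decided policy:

```typescript
// Sketch of the proposed guard: refuse to write results when too many
// features have no mapped tests. The 30% default is illustrative.
function shouldWrite(total: number, unmapped: number, maxUnmapped = 0.3): boolean {
  if (total === 0) return false; // nothing to report on
  return unmapped / total <= maxUnmapped;
}
```

Refusing to write keeps a weak mapping visible as a failure instead of laundering it into confident-looking states, which is the under-report-over-over-report stance in one guard.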