Value Stories
Nine stories across five groups. Each story is a test contract — RED before implementation, GREEN when value is delivered. Each story lists the pain today, its GREEN contract, what the CONTROLLER verifies, and its RED signals.
Does the matrix update itself?
PR merged to main. feature-matrix.json still shows states from last manual edit. No idea if they're still true. Have to open the app, check each feature, type the new level.
GREEN: feature-matrix.json updates automatically within 5 minutes of merge. Only features whose mapped tests were touched have state changes.
Verify: Git blame shows script commit, not manual edit. No feature reaches L3 when its test results are failing.
RED: States change for untouched features. Feature marked L3 with failing tests.
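The "no L3 with failing tests" invariant is easiest to hold if the script recomputes states from the latest results instead of patching the previous matrix. A minimal sketch of that next-state rule (names like `FeatureRun` and `nextLevel` are illustrative, not the script's actual API):

```typescript
// Hypothetical per-feature summary from one commission run.
interface FeatureRun {
  featureId: string;
  testsTouched: boolean;    // did the merge change any mapped test files?
  allTestsPassing: boolean; // latest result of the mapped tests
}

// Untouched features keep their state; a feature with any failing
// mapped test can never land on L3, whatever level was proposed.
function nextLevel(prev: number, run: FeatureRun, proposed: number): number {
  if (!run.testsTouched) return prev; // only touched features change state
  return run.allTestsPassing ? proposed : Math.min(proposed, 2);
}
```

Because the rule is a pure function, both RED conditions (untouched features changing, L3 with failing tests) are unit-testable before the script ever writes feature-matrix.json.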
Commission script run completes. 60 of 210 features have no test file mapping. They silently sit at L0. Report says nothing about them.
GREEN: Report shows unmapped count with specific feature IDs. No unmapped feature silently stays at L0 without a flag.
Verify: Script does not report 100% coverage when mappings are missing. Every unmapped feature visible in report.
RED: Unmapped features hidden. Coverage report lies.
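A report that cannot claim 100% coverage while mappings are missing falls out of counting the unmapped set explicitly. A sketch, with all names assumed:

```typescript
// A feature's mapping entry; testFile is absent when no mapping exists.
interface FeatureMapping {
  featureId: string;
  testFile?: string;
}

interface CoverageReport {
  mappedCount: number;
  unmappedIds: string[]; // specific IDs, so no feature hides at L0 silently
  coveragePct: number;   // share of features with a mapping, not a pass rate
}

function buildCoverageReport(features: FeatureMapping[]): CoverageReport {
  const unmappedIds = features
    .filter((f) => !f.testFile)
    .map((f) => f.featureId);
  const mappedCount = features.length - unmappedIds.length;
  const coveragePct =
    features.length === 0 ? 0 : Math.round((100 * mappedCount) / features.length);
  return { mappedCount, unmappedIds, coveragePct };
}
```

In the 60-of-210 scenario above, this reports 71% mapping coverage with all 60 IDs listed, instead of a silent 100%.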
Can I trust what the matrix says?
Safety Test assertion fails in test run. Feature still reaches L3 because only Success Test is checked.
GREEN: Feature state capped at L2 regardless of Success Test results when Safety Test fails.
Verify: Safety Test failure logged with feature ID. Feature cannot reach L3 with active safety violation.
RED: Feature reaches L3 with active safety violation. Failures silently ignored.
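The safety gate is small enough to state as a pure function: a failed Safety Test caps the proposed level at L2 and records the feature ID. A sketch with hypothetical names:

```typescript
interface SafetyResult {
  featureId: string;
  safetyPassed: boolean;
}

// A failing Safety Test caps the feature at L2 regardless of the
// Success Test, and the violation is logged with the feature ID.
function applySafetyGate(
  proposed: number,
  result: SafetyResult,
  log: string[],
): number {
  if (result.safetyPassed) return proposed;
  log.push(`safety violation: ${result.featureId} capped at L2`);
  return Math.min(proposed, 2);
}
```

The RED condition "failures silently ignored" then reduces to asserting the log is non-empty whenever the cap fired.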
Previously-passing test starts failing after a refactor. Feature state stays at L3 because states only go up, never down.
GREEN: Feature state moves from L3 to L2 (or lower) in next commission run. Demotion logged with evidence.
Verify: States can go down. No one-way ratchet. Demotion logged with previous state, new state, and failing test file.
RED: State stays at L3 after tests break. Demotion applied silently without logging.
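Demotion with evidence can be produced by diffing the previous matrix against the freshly computed one. An illustrative sketch (the record shape is an assumption, not the script's schema):

```typescript
interface DemotionRecord {
  featureId: string;
  previousLevel: number;
  newLevel: number;
  failingTestFile: string;
}

// Emits one record per feature whose level went down, carrying the
// evidence the contract demands: previous state, new state, failing file.
function collectDemotions(
  previous: Record<string, number>,
  next: Record<string, number>,
  failingFileByFeature: Record<string, string>,
): DemotionRecord[] {
  const records: DemotionRecord[] = [];
  for (const [featureId, previousLevel] of Object.entries(previous)) {
    const newLevel = next[featureId] ?? previousLevel;
    if (newLevel < previousLevel) {
      records.push({
        featureId,
        previousLevel,
        newLevel,
        failingTestFile: failingFileByFeature[featureId] ?? "unknown",
      });
    }
  }
  return records;
}
```

Because the diff is computed before the write, "demotion applied silently" becomes impossible by construction: the write and the log come from the same records.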
Unit test with mocked DB passes. Commission script counts it as integration evidence. Feature reaches L3 without real infrastructure tested.
GREEN: Unit tests (mocked DB/server) logged but not used for L-level state changes. Only integration and e2e results feed computation.
Verify: L3 means real infrastructure was tested, not mocks. Unit test exclusion does not affect unit test reporting elsewhere.
RED: Unit test with mocked DB counted as integration evidence. Hook test with stubbed API affects L-level.
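One way to keep unit results visible but inert is to filter evidence by kind at the boundary of the level computation. A sketch (type names assumed):

```typescript
type EvidenceKind = "unit" | "integration" | "e2e";

interface TestEvidence {
  kind: EvidenceKind;
  passed: boolean;
}

// Unit results (mocked DB/server) stay in the full list for reporting;
// only integration and e2e evidence reaches the L-level computation.
function levelEvidence(all: TestEvidence[]): TestEvidence[] {
  return all.filter((e) => e.kind !== "unit");
}
```

Keeping the filter at the boundary leaves unit reporting elsewhere untouched, which is exactly the second verification clause above.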
Does the parser handle all PRD formats?
PRD uses FFO format (6 columns) or FAVV v2.0. Parser only handles v2.1 (9 columns). Older PRDs silently skipped.
GREEN: Parser extracts feature-to-test mappings from FFO, v2.0, and v2.1 formats.
Verify: v2.0 and FFO PRDs produce mappings. Partial results returned when table is malformed — no crash, log warning.
RED: Crash on FFO. Silent skip of v2.0. Only v2.1 produces mappings.
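Format dispatch plus skip-with-warning row handling covers both contract clauses. In the sketch below, the 6- and 9-column widths come from the story; using a declared version marker to spot v2.0 is an assumption, since its column count is not given:

```typescript
type PrdFormat = "ffo" | "v2.0" | "v2.1" | "unknown";

// Dispatch on a declared version marker first, then on column count.
function detectFormat(headerCells: string[], declaredVersion?: string): PrdFormat {
  if (declaredVersion === "2.0") return "v2.0";
  if (headerCells.length === 6) return "ffo";
  if (headerCells.length === 9) return "v2.1";
  return "unknown";
}

// Partial parsing: malformed rows are skipped with a warning, never a crash.
function keepWellFormedRows(
  rows: string[][],
  expectedWidth: number,
  warnings: string[],
): string[][] {
  return rows.filter((row, i) => {
    if (row.length === expectedWidth) return true;
    warnings.push(`row ${i}: ${row.length} cells, expected ${expectedWidth}`);
    return false;
  });
}
```

"Unknown" is returned rather than guessed, so an unrecognized PRD surfaces in the report instead of being silently skipped.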
Does browser verification work?
Feature has e2e Playwright spec. Unit tests pass. But e2e spec fails because the UI is broken. Feature still shows L3.
GREEN: Feature with passing unit tests but failing e2e is capped at L2. Playwright runs spec headlessly with trace + screenshot.
Verify: E2e results feed L-level computation alongside integration results. Traces archived as commission evidence.
RED: Feature reaches L3 from unit tests alone when e2e spec exists but fails. Playwright runs headed in CI.
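The headless + trace + screenshot requirement maps onto options Playwright exposes in its shared `use` block. A minimal config sketch, with project-specific settings omitted:

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    headless: true,   // the RED condition above: headed runs in CI are a bug
    trace: "on",      // record a trace per test, archived as commission evidence
    screenshot: "on", // capture screenshots alongside the trace
  },
});
```

`trace: "retain-on-failure"` is the cheaper variant if archiving every trace proves too heavy; the contract above only demands that failing e2e evidence survives the run.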
E2e spec uses page.route() to intercept real API calls with canned responses. Test passes but proves nothing about real behavior.
GREEN: AST scan flags page.route() and page.fulfill() usage. Report lists counterfeit specs with file path and line number.
Verify: Mock-route specs not counted as passing evidence. Scanner does not modify spec files.
RED: Mock-route specs pass the scanner. Scanner false-positives on legitimate test setup.
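The reporting shape can be prototyped as a line scan before the AST pass exists. Note the regexes below can false-positive on comments and string literals, which is precisely why the RED clause pushes the shipped scanner toward matching AST call expressions. Names are illustrative:

```typescript
interface CounterfeitFlag {
  file: string;
  line: number;   // 1-based, as the report demands
  pattern: string;
}

// Patterns named in the story; a real scanner should match these as
// AST call expressions to avoid flagging comments or string literals.
const MOCK_ROUTE_PATTERNS = [/\bpage\.route\s*\(/, /\bpage\.fulfill\s*\(/];

// Read-only scan: flags counterfeit evidence, never edits the spec file.
function scanSpec(file: string, source: string): CounterfeitFlag[] {
  const flags: CounterfeitFlag[] = [];
  source.split("\n").forEach((text, i) => {
    for (const pattern of MOCK_ROUTE_PATTERNS) {
      if (pattern.test(text)) {
        flags.push({ file, line: i + 1, pattern: pattern.source });
      }
    }
  });
  return flags;
}
```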
Does the dashboard show commissioning health?
Commission script completes. Project dashboard in Plans UI shows no commissioning data. Have to read feature-matrix.json manually.
GREEN: commissioning_results Convex table updated. listProjectsWithStats returns commissioningLevel per project.
Verify: Project with no commission run returns commissioningLevel: null — not 0. Dashboard reads from Convex, not filesystem.
RED: Dashboard shows commissioningLevel: 0 when no commission run occurred. Script writes to Convex in dry-run mode.
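The null-versus-0 distinction is worth pinning down in one place: null means the script has never commissioned the project, 0 means it ran and nothing passed. A sketch of that aggregation (treating the project level as the weakest feature level is an assumption, not the documented rule):

```typescript
// featureLevels is empty when no commission run has written results yet.
function commissioningLevel(featureLevels: number[]): number | null {
  if (featureLevels.length === 0) return null; // never commissioned, not L0
  return Math.min(...featureLevels);           // assumed: weakest feature wins
}
```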
Kill Signal
Script states contradict manual commissioner judgment on >20% of features after 3 runs.
Who
- Commissioner — verifies feature states after engineering ships. 30+ min manual inspection, usually skipped.
- Engineering agent — needs matrix updated automatically after merge. Forgets to update JSON manually.
- PRD author — needs unmapped report to fix missing Artifact paths in Build Contract.
Questions
- What breaks first when the script disagrees with a human commissioner?
- If 30% of features are unmapped, is the script's output trustworthy enough to replace manual edits?
- Should the script refuse to write results when unmapped percentage exceeds a threshold?
- Is the Safety Test gate (blocking L3) too conservative — or not conservative enough?