Flow Diagrams
Four maps. The CONTROLLER verifies what the factory shipped. Feature state = f(test results).
What does success look like?
Five binary outcomes. Each is a pass/fail test.
OUTCOME MAP — Automated Commissioning
═══════════════════════════════════════════════════════════════
Outcome                        Measure                             Current
────────────────────────────── ─────────────────────────────────── ───────
Feature state computed         Git blame shows script, not human   FAIL
Same input → same output       diff of two runs = empty            FAIL
Missing mappings visible       Unmapped features in report         FAIL
Safety violations block L3     safety_fail → min(state, L2)        FAIL
Stale states detected          State moves DOWN on regression      FAIL
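Two of these outcomes can be checked mechanically once each run emits JSON. A minimal sketch, assuming `feature-matrix.json` reduces to a hypothetical map from feature ID to numeric L-level: `sameOutput` compares two runs after sorting object keys, so key order alone never shows up as a diff, and `downgrades` lists the features whose level fell between runs. All names here are illustrative.

```typescript
// Hypothetical reduced shape of feature-matrix.json: feature ID → L-level (0..4).
type Matrix = Record<string, number>;

// Serialize with object keys sorted at every depth so that key order cannot
// make two otherwise-identical runs differ. Inputs are assumed to be parsed
// JSON (no undefined values, functions, or cycles).
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalJson(obj[k]))
      .join(",");
    return "{" + body + "}";
  }
  return JSON.stringify(value);
}

// Outcome "Same input → same output": the diff of two runs is empty.
function sameOutput(runA: unknown, runB: unknown): boolean {
  return canonicalJson(runA) === canonicalJson(runB);
}

// Outcome "Stale states detected": features whose level moved DOWN.
function downgrades(prev: Matrix, next: Matrix): string[] {
  return Object.keys(next)
    .filter((id) => id in prev && next[id] < prev[id])
    .sort();
}
```

A downgrade list that is always empty on real regressions is itself a signal that the writer is refusing to lower states.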
Outcomes name what we're measuring. The evidence flow shows how data moves.
How does evidence flow?
PRD specs in, computed states out.
EVIDENCE FLOW — From PRD Spec to Feature Matrix
═══════════════════════════════════════════════════════════════
PRD spec/index.md            → FAVV parser extracts feature_id → test_file[]
    ↓
Engineering repo tests       → Vitest runs scoped to mapped files
    ↓
                             → Playwright runs e2e specs (UI features)
    ↓
JSON reporter output         → L-level computer maps results to feature IDs
    ↓
                             → Mock detector flags counterfeit specs
    ↓
feature-matrix.json          ← Writer updates state + updated columns
    ↓
.ai/receipts/                ← Receipt generator logs evidence trail
    ↓
Convex commissioning_results ← Sync to project dashboard
The evidence flow shows data movement. The decision tree shows the logic.
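The step where the L-level computer maps reporter output to feature IDs is where missing mappings and missing evidence should surface. A minimal sketch, assuming (1) the FAVV parser yields a hypothetical `feature_id → test_file[]` index with repo-relative paths, and (2) the Vitest JSON report exposes a Jest-style `testResults` array with a per-file `name` and `status`; only that assumed subset is typed, and `mapResults` is an illustrative name.

```typescript
// Assumed subset of Vitest's Jest-compatible JSON reporter output
// (vitest run --reporter=json): one entry per test file.
interface VitestJsonReport {
  testResults: { name: string; status: string }[];
}

// Hypothetical FAVV parser output: feature_id → test_file[] (repo-relative).
type FeatureTestIndex = Record<string, string[]>;

interface FeatureResult {
  featureId: string;
  allPassed: boolean;
  missingFiles: string[]; // mapped files with no entry in the report
}

interface MappingOutcome {
  results: FeatureResult[];
  unmapped: string[]; // features with no mapped test files
}

// Reporter paths are often absolute; match on a path-segment boundary.
function matchesFile(reportPath: string, mapped: string): boolean {
  return reportPath === mapped || reportPath.endsWith("/" + mapped);
}

function mapResults(
  index: FeatureTestIndex,
  report: VitestJsonReport,
): MappingOutcome {
  const results: FeatureResult[] = [];
  const unmapped: string[] = [];
  for (const [featureId, files] of Object.entries(index)) {
    if (files.length === 0) {
      // Surface the gap instead of silently skipping the feature.
      unmapped.push(featureId);
      continue;
    }
    const missingFiles: string[] = [];
    let allPassed = true;
    for (const file of files) {
      const entry = report.testResults.find((r) => matchesFile(r.name, file));
      if (entry === undefined) {
        missingFiles.push(file); // absent evidence counts as failure
        allPassed = false;
      } else if (entry.status !== "passed") {
        allPassed = false;
      }
    }
    results.push({ featureId, allPassed, missingFiles });
  }
  return { results, unmapped };
}
```

Treating a mapped-but-unreported file as a failure keeps a deleted or renamed test from quietly leaving a feature at its old level.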
How are L-levels computed?
Deterministic at every node except L4. L4 requires a human commissioner.
L-LEVEL DECISION TREE
═══════════════════════════════════════════════════════════════
Has PRD with FAVV Build Contract?
  NO  → L0 (Spec only)
  YES → Has schema/type in engineering repo?
      NO  → L0
      YES → Has UI page/component using schema?
          NO  → L1 (Schema)
          YES → All mapped tests pass?
              NO  → L2 (UI exists, tests fail)
              YES → Any Safety Test violations?
                  YES → L2 (safety blocks L3)
                  NO  → Any mock routes in e2e specs?
                      YES → L2 (counterfeit evidence)
                      NO  → Unit tests excluded from evidence?
                          NO  → L2 (only mocks, no real infra)
                          YES → Independent commissioner sign-off?
                              NO  → L3 (Tested)
                              YES → L4 (Commissioned)
The decision tree shows the logic. The build order shows the sequence.
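Apart from the final sign-off, the tree is a pure function of the evidence. A minimal sketch, assuming a hypothetical `FeatureEvidence` record whose fields answer the tree's questions in order; none of these names are taken from the engineering repo.

```typescript
type Level = 0 | 1 | 2 | 3 | 4;

// Hypothetical per-feature evidence; each field answers one question in the
// decision tree, in the same order.
interface FeatureEvidence {
  hasFavvContract: boolean;
  hasSchema: boolean;
  hasUi: boolean;
  allMappedTestsPass: boolean;
  safetyViolations: number;
  mockRoutesInE2e: boolean;
  unitTestsExcluded: boolean;
  commissionerSignOff: boolean; // the one human input (L4)
}

function computeLevel(e: FeatureEvidence): Level {
  if (!e.hasFavvContract) return 0; // spec only
  if (!e.hasSchema) return 0;
  if (!e.hasUi) return 1; // schema
  if (!e.allMappedTestsPass) return 2; // UI exists, tests fail
  if (e.safetyViolations > 0) return 2; // safety blocks L3
  if (e.mockRoutesInE2e) return 2; // counterfeit evidence
  if (!e.unitTestsExcluded) return 2; // only mocks, no real infra
  return e.commissionerSignOff ? 4 : 3; // commissioned : tested
}
```

Because the safety, mock, and unit-exclusion checks come after the test gate, each one caps an otherwise passing feature at L2; only the last line depends on a human.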
What's the build order?
Five sprints. Each unblocks the next.
BUILD ORDER — Five Sprints
═══════════════════════════════════════════════════════════════
SPRINT 0: PARSE (3 days)
┌──────────────────┐     ┌──────────────────┐
│ FAVV parser      │────→│ Index builder    │
│ #1,#2: v2.1/v2.0 │     │ #3,#4: map+gaps  │
└──────────────────┘     └────────┬─────────┘
                                  │
SPRINT 1: COMPUTE (4 days)        ▼
┌──────────────────┐     ┌──────────────────┐
│ Test runner      │────→│ L-level computer │
│ #5: scoped vitest│     │ #6,#7: states    │
└──────────────────┘     └────────┬─────────┘
                                  │
SPRINT 2: WRITE (2 days)          ▼
┌──────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ Matrix writer    │────→│ Receipt gen      │────→│ CLI         │
│ #8: update JSON  │     │ #9: evidence     │     │ #10: dry-run│
└──────────────────┘     └──────────────────┘     └──────┬──────┘
                                                         │
SPRINT 3: CI (1 day)                                     ▼
┌──────────────────────────────────────────────────────────┐
│ Wire into merge pipeline — automatic on every merge      │
└──────────────────────────────────────────────────────────┘
SPRINT 4: E2E (3 days)
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ E2e discovery│→ │ Playwright   │→ │ L-level merge│→ │ Evidence     │
│ #11          │  │ runner #12   │  │ #13          │  │ archive #14  │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘
+ Mock detector #15    + Unit exclusion #16
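The mock detector (#15) needs a concrete definition of "mock route" before the decision tree's counterfeit-evidence node can be deterministic. A heuristic sketch, assuming a mock route means Playwright network interception that answers requests itself: `page.route()`/`context.route()` paired with `route.fulfill()`, or `routeFromHAR()`, which replays recorded responses. `findCounterfeitSpecs` and `SpecSource` are hypothetical names.

```typescript
// Assumption: these Playwright calls mark a spec as serving fake responses.
const ROUTE_CALL = /\.route\s*\(/;
const FULFILL_CALL = /\.fulfill\s*\(/;
const HAR_CALL = /\.routeFromHAR\s*\(/;

interface SpecSource {
  path: string; // e2e spec file path
  source: string; // file contents, read by the caller
}

// Returns the paths of specs whose results should not count as evidence.
// A text scan misses mocks hidden behind helper functions and may flag
// commented-out code, so it is a first-pass filter, not proof.
function findCounterfeitSpecs(specs: SpecSource[]): string[] {
  return specs
    .filter(
      (s) =>
        (ROUTE_CALL.test(s.source) && FULFILL_CALL.test(s.source)) ||
        HAR_CALL.test(s.source),
    )
    .map((s) => s.path);
}
```

A `route()` that only calls `route.continue()` still reaches real infrastructure, which is why interception alone is not flagged.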
First target: Identity & Access features (AUTH-*)
Questions
What happens when a feature's tests pass but the feature doesn't work?
- If the Success Test is weak (passes with empty arrays), is L3 a lie?
- Should the Safety Test column be the primary gate, not the Success Test?
- At what scale does running all mapped tests on every merge become too slow?