Flow Diagrams

Four maps. The CONTROLLER verifies what the factory shipped. Feature state = f(test results).

What does success look like?

Five binary outcomes. Each is a pass/fail test.

OUTCOME MAP — Automated Commissioning
═══════════════════════════════════════════════════════════════

Outcome                         Measure                              Current
──────────────────────────────  ───────────────────────────────────  ───────
Feature state computed          Git blame shows script, not human    FAIL
Same input → same output        diff of two runs = empty             FAIL
Missing mappings visible        Unmapped features in report          FAIL
Safety violations block L3      safety_fail → min(state, L2)         FAIL
Stale states detected           State moves DOWN on regression       FAIL
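
Three of these outcomes reduce to small pure functions. The sketch below reads the safety row as a cap at L2, treats a downward move as a reportable regression, and serializes states in sorted order so two runs can be diffed. The names applySafetyCap, classifyChange and stableSerialize are hypothetical, as is the numeric encoding of levels.

```typescript
// Sketch only: levels 0..4 stand for L0..L4; all names here are illustrative.
type Level = 0 | 1 | 2 | 3 | 4;

// A safety failure caps the state at L2. It never raises a lower state.
function applySafetyCap(state: Level, safetyFail: boolean): Level {
  return safetyFail ? (Math.min(state, 2) as Level) : state;
}

// Computed state may move down as well as up; name the direction so
// regressions show up in the report instead of being silently kept.
function classifyChange(
  previous: Level,
  computed: Level
): "regressed" | "advanced" | "unchanged" {
  if (computed < previous) return "regressed";
  if (computed > previous) return "advanced";
  return "unchanged";
}

// Same input, same output: emit [featureId, level] pairs sorted by id,
// so the serialized form does not depend on object key insertion order.
function stableSerialize(states: Record<string, Level>): string {
  const pairs = Object.keys(states)
    .sort()
    .map((id) => [id, states[id]]);
  return JSON.stringify(pairs, null, 2);
}
```

With this shape, the "diff of two runs = empty" measure becomes a string comparison of two stableSerialize outputs.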

Outcomes name what we're measuring. The evidence flow shows how data moves.

How does evidence flow?

PRD specs in, computed states out.

EVIDENCE FLOW — From PRD Spec to Feature Matrix
═══════════════════════════════════════════════════════════════

PRD spec/index.md          →  FAVV parser extracts feature_id → test_file[]
                               ↓
Engineering repo tests     →  Vitest runs scoped to mapped files
                               ↓
                           →  Playwright runs e2e specs (UI features)
                               ↓
JSON reporter output       →  L-level computer maps results to feature IDs
                               ↓
                           →  Mock detector flags counterfeit specs
                               ↓
feature-matrix.json        ←  Writer updates the "state" and "updated" columns
                               ↓
.ai/receipts/              ←  Receipt generator logs evidence trail
                               ↓
Convex commissioning_results ← Sync to project dashboard
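
The middle of this flow, where test results are joined back to feature IDs, might look like the sketch below. Both data shapes are assumptions: FeatureIndex stands in for whatever the FAVV parser emits, and FileResult is a simplified per-file summary rather than the actual reporter schema.

```typescript
// Hypothetical shapes, not the real FAVV index or reporter output.
type FeatureIndex = Record<string, string[]>; // feature_id -> test_file[]

interface FileResult {
  file: string;
  passed: boolean;
}

interface FeatureEvidence {
  featureId: string;
  mapped: boolean;    // false when the index lists no test files
  allPassed: boolean; // true only if every mapped file ran and passed
  missing: string[];  // mapped files with no result in this run
}

function collectEvidence(
  index: FeatureIndex,
  results: FileResult[]
): FeatureEvidence[] {
  const byFile: Record<string, boolean> = {};
  for (const r of results) byFile[r.file] = r.passed;
  const seen = (f: string) => Object.prototype.hasOwnProperty.call(byFile, f);

  // Sort feature IDs so the report order is stable between runs.
  return Object.keys(index)
    .sort()
    .map((featureId) => {
      const files = index[featureId];
      const missing = files.filter((f) => !seen(f));
      const allPassed =
        files.length > 0 &&
        missing.length === 0 &&
        files.every((f) => byFile[f] === true);
      return { featureId, mapped: files.length > 0, allPassed, missing };
    });
}
```

A feature with an empty file list comes back as mapped: false rather than being dropped, which keeps missing mappings visible in the report.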

The evidence flow shows data movement. The decision tree shows the logic.

How are L-levels computed?

Every node is deterministic except the last. L4 requires sign-off from an independent human commissioner.

L-LEVEL DECISION TREE
═══════════════════════════════════════════════════════════════

Has PRD with FAVV Build Contract?
  NO  → L0 (Spec only)
  YES → Has schema/type in engineering repo?
    NO  → L0
    YES → Has UI page/component using schema?
      NO  → L1 (Schema)
      YES → All mapped tests pass?
        NO  → L2 (UI exists, tests fail)
        YES → Any Safety Test violations?
          YES → L2 (safety blocks L3)
          NO  → Any mock routes in e2e specs?
            YES → L2 (counterfeit evidence)
            NO  → Unit tests excluded from evidence?
              NO  → L2 (evidence relies on mocked unit tests, no real infra)
              YES → Independent commissioner sign-off?
                NO  → L3 (Tested)
                YES → L4 (Commissioned)
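
The tree translates directly into a chain of early returns. This is a minimal sketch, assuming each node has already been reduced to a boolean; the FeatureFacts record and its field names are hypothetical.

```typescript
type LLevel = "L0" | "L1" | "L2" | "L3" | "L4";

// Hypothetical input: one boolean per node of the decision tree.
interface FeatureFacts {
  hasBuildContract: boolean;    // PRD with FAVV Build Contract
  hasSchema: boolean;           // schema/type in engineering repo
  hasUi: boolean;               // UI page/component using the schema
  allMappedTestsPass: boolean;
  safetyViolations: boolean;
  mockRoutesInE2e: boolean;
  unitTestsExcluded: boolean;   // unit tests excluded from evidence
  commissionerSignOff: boolean; // the only human input
}

interface Verdict {
  level: LLevel;
  reason: string;
}

// Each check mirrors one node of the tree, in the same order.
function computeLevel(f: FeatureFacts): Verdict {
  if (!f.hasBuildContract) return { level: "L0", reason: "spec only" };
  if (!f.hasSchema) return { level: "L0", reason: "no schema" };
  if (!f.hasUi) return { level: "L1", reason: "schema" };
  if (!f.allMappedTestsPass) return { level: "L2", reason: "UI exists, tests fail" };
  if (f.safetyViolations) return { level: "L2", reason: "safety blocks L3" };
  if (f.mockRoutesInE2e) return { level: "L2", reason: "counterfeit evidence" };
  if (!f.unitTestsExcluded) return { level: "L2", reason: "only mocks, no real infra" };
  if (!f.commissionerSignOff) return { level: "L3", reason: "tested" };
  return { level: "L4", reason: "commissioned" };
}
```

Returning a reason alongside the level lets a receipt record which node stopped the feature, not only where it landed.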

The decision tree shows the logic. The build order shows the sequence.

What's the build order?

Five sprints. Each unblocks the next.

BUILD ORDER — Five Sprints
═══════════════════════════════════════════════════════════════

SPRINT 0: PARSE (3 days)
┌──────────────────┐     ┌──────────────────┐
│ FAVV parser      │────→│ Index builder    │
│ #1,#2: v2.1/v2.0 │     │ #3,#4: map+gaps  │
└──────────────────┘     └────────┬─────────┘
                                  │
SPRINT 1: COMPUTE (4 days)        ▼
┌──────────────────┐     ┌──────────────────┐
│ Test runner      │────→│ L-level computer │
│ #5: scoped vitest│     │ #6,#7: states    │
└──────────────────┘     └────────┬─────────┘
                                  │
SPRINT 2: WRITE (2 days)          ▼
┌──────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ Matrix writer    │────→│ Receipt gen      │────→│ CLI         │
│ #8: update JSON  │     │ #9: evidence     │     │ #10: dry-run│
└──────────────────┘     └──────────────────┘     └──────┬──────┘
                                                         │
SPRINT 3: CI (1 day)                                     ▼
┌──────────────────────────────────────────────────────────┐
│ Wire into merge pipeline — automatic on every merge      │
└──────────────────────────────────────────────────────────┘

SPRINT 4: E2E (3 days)
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ E2e discovery│→ │ Playwright   │→ │ L-level merge│→ │ Evidence     │
│ #11          │  │ runner #12   │  │ #13          │  │ archive #14  │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘
       + Mock detector #15 + Unit exclusion #16
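
The mock detector (#15) could start as a plain source scan. The sketch below assumes "mock route" means a Playwright network interception call such as page.route or page.routeFromHAR appearing in the spec text. The pattern list and the findMockRoutes name are illustrative, and a regex scan will miss interception hidden behind helper functions.

```typescript
// Assumption: these two call shapes are what counts as a mock route.
const MOCK_PATTERNS: RegExp[] = [
  /\.route\s*\(/,        // page.route(...) or context.route(...)
  /\.routeFromHAR\s*\(/, // responses replayed from a HAR file
];

interface MockFinding {
  file: string;
  line: number; // 1-based line number in the spec
  text: string; // trimmed source line
}

function findMockRoutes(file: string, source: string): MockFinding[] {
  const findings: MockFinding[] = [];
  source.split(/\r?\n/).forEach((text, i) => {
    const trimmed = text.trim();
    if (trimmed.indexOf("//") === 0) return; // skip commented-out lines
    if (MOCK_PATTERNS.some((p) => p.test(trimmed))) {
      findings.push({ file, line: i + 1, text: trimmed });
    }
  });
  return findings;
}
```

Any non-empty result for a feature's e2e specs would feed the "counterfeit evidence" branch of the decision tree.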

First target: Identity & Access features (AUTH-*)

Questions

What happens when a feature's tests pass but the feature doesn't work?

If the Success Test is weak (passes with empty arrays), is L3 a lie?
Should the Safety Test column be the primary gate, not the Success Test?
At what scale does running all mapped tests on every merge become too slow?