Automated Commissioning — Pictures
One map: instrument classification. The outcome map captures what the spec doesn't already say.
- Outcome Map — Binary success measures for each capability
Outcome Map
What does success look like? Every row is a binary test.
Instrument Outcome
| Outcome | Measure | Pass Condition | Current |
|---|---|---|---|
| Feature state is computed, not typed | feature-matrix.json updated by script, not human | Git blame shows script commit, not manual edit | FAIL — all states hand-edited |
| Same input produces same output | Two runs on same test results produce identical JSON | diff of two runs = empty | FAIL — no script exists |
| Missing mappings are visible | Features with no test-file mapping are reported as unmapped | Count of unmapped features in report | FAIL — unmapped features silently stay at L0 |
| Safety violations block L3 | Feature with failing Safety Test cannot reach L3 | Script enforces: safety_fail → state capped at L2 (min(state, L2)) | FAIL — no safety check exists |
| Stale states detected | Feature whose tests now fail gets demoted | State moves DOWN when tests regress | FAIL — states only go up manually |
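Two of the rows above (the safety cap and the stale-state demotion) reduce to small, testable rules. A minimal sketch, assuming an ordered level scale; `applySafetyGate` and `reconcile` are illustrative names, not the real script's API:

```typescript
// Level scale in ascending order; rank() gives a comparable index.
const LEVELS = ["L0", "L1", "L2", "L3", "L4"] as const;
type Level = (typeof LEVELS)[number];

const rank = (l: Level): number => LEVELS.indexOf(l);

// Safety gate: a failing Safety Test caps the state at L2, never raises it.
function applySafetyGate(state: Level, safetyFail: boolean): Level {
  return safetyFail && rank(state) > rank("L2") ? "L2" : state;
}

// Stale-state rule: the stored state always follows the computed one,
// including downward, so features whose tests regress are demoted.
function reconcile(_stored: Level, computed: Level): Level {
  return computed; // no ratchet: states can move down
}
```

The point of `reconcile` being trivially "take the computed value" is exactly the last table row: there is no manual-only upward ratchet.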
Evidence Flow
```
PRD spec/index.md      → FAVV parser extracts feature_id → test_file[]
        ↓
Engineering repo tests → Vitest runs scoped to mapped files
        ↓
JSON reporter output   → L-level computer maps results to feature IDs
        ↓
feature-matrix.json    ← Writer updates state + updated columns
        ↓
.ai/receipts/          ← Report generator logs evidence trail
```
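The middle step of the flow, mapping reporter output back to feature IDs, can be sketched as one pure function. Everything here (`TestResult`, `resultsByFeature`, the mapping shape) is an assumption for illustration, not the real pipeline's types:

```typescript
// Shape assumed for one entry of the test runner's JSON output.
interface TestResult { file: string; passed: boolean }

// Join feature_id -> test_file[] mapping with per-file results.
// Features with an empty mapping surface as mapped: false instead of
// silently passing, matching the "missing mappings are visible" outcome.
function resultsByFeature(
  mapping: Record<string, string[]>,
  results: TestResult[],
): Record<string, { mapped: boolean; allPass: boolean }> {
  const byFile = new Map(results.map(r => [r.file, r.passed] as [string, boolean]));
  const out: Record<string, { mapped: boolean; allPass: boolean }> = {};
  for (const [feature, files] of Object.entries(mapping)) {
    const mapped = files.length > 0;
    out[feature] = {
      mapped,
      // A file missing from the results counts as not passing.
      allPass: mapped && files.every(f => byFile.get(f) === true),
    };
  }
  return out;
}
```

Because the function is pure over its two inputs, two runs on the same test results produce identical output, which is the determinism row of the outcome map.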
L-Level Decision Tree
```
Has PRD with FAVV Build Contract?
  NO  → L0 (Spec only)
  YES → Has schema/type in engineering repo?
    NO  → L0
    YES → Has UI page/component using schema?
      NO  → L1 (Schema)
      YES → All mapped tests pass?
        NO  → L2 (UI exists, tests fail)
        YES → Any Safety Test violations?
          YES → L2 (safety blocks L3)
          NO  → Independent commissioner sign-off?
            NO  → L3 (Tested)
            YES → L4 (Commissioned)
```
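The tree above translates directly into a chain of guards. A sketch, assuming the evidence facts are already collected into one record; the `Evidence` field names are hypothetical:

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

// Assumed inputs, one boolean per node of the decision tree.
interface Evidence {
  hasPrdContract: boolean;      // PRD with FAVV Build Contract exists
  hasSchema: boolean;           // schema/type in engineering repo
  hasUi: boolean;               // UI page/component using the schema
  allTestsPass: boolean;        // all mapped tests pass
  safetyViolation: boolean;     // any Safety Test violation
  commissionerSignOff: boolean; // independent human sign-off
}

function computeLevel(e: Evidence): Level {
  if (!e.hasPrdContract) return "L0"; // spec only
  if (!e.hasSchema) return "L0";
  if (!e.hasUi) return "L1";          // schema
  if (!e.allTestsPass) return "L2";   // UI exists, tests fail
  if (e.safetyViolation) return "L2"; // safety blocks L3
  if (!e.commissionerSignOff) return "L3"; // tested
  return "L4";                        // commissioned: the only human gate
}
```

Note that `commissionerSignOff` is the single input a script cannot produce, which is the key finding below: everything up to L3 is computable from test results.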
Key Finding
The decision tree is deterministic at every node except L4. L4 requires a human commissioner — the builder and commissioner are never the same person. Everything below L4 is computable from test results.
Context
- PRD Index — Automated Commissioning
- Prompt Deck — 5-card pitch
- Spec — Engineering depth
Questions
- What happens when a feature's tests pass but the feature doesn't work?
- If the Success Test is weak (passes with empty arrays), is L3 a lie?
- Should the Safety Test column be the primary gate, not the Success Test?
- At what scale does running all mapped tests on every merge become too slow?