
Automated Commissioning — Pictures

One map. It instruments the classification: the outcome map reveals what the spec doesn't already say.

  • Outcome Map — Binary success measures for each capability

Outcome Map

What does success look like? Every row is a binary test.

Instrument Outcome

| Outcome | Measure | Pass Condition | Current |
| --- | --- | --- | --- |
| Feature state is computed, not typed | feature-matrix.json updated by script, not human | Git blame shows script commit, not manual edit | FAIL: all states hand-edited |
| Same input produces same output | Two runs on same test results produce identical JSON | diff of two runs = empty | FAIL: no script exists |
| Missing mappings are visible | Features with no test file mapping show unmapped | Count of unmapped features in report | FAIL: unmapped features silently stay at L0 |
| Safety violations block L3 | Feature with failing Safety Test cannot reach L3 | Script enforces: safety_fail → min(state, L2) | FAIL: no safety check exists |
| Stale states detected | Feature whose tests now fail gets demoted | State moves DOWN when tests regress | FAIL: states only go up manually |
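Two of the pass conditions above (deterministic output, visible unmapped features) can be satisfied by a small report builder. This is a hypothetical sketch, since per the Current column no such script exists yet; the names `Mapping` and `buildReport` are illustrative, not the real tool:

```typescript
// Hypothetical report builder covering two outcome-map rows:
// - "Same input produces same output": sorted, stable JSON so a diff of
//   two runs on the same input is empty.
// - "Missing mappings are visible": unmapped features are counted and listed.
interface Mapping {
  featureId: string;
  testFiles: string[]; // feature_id -> test_file[] from the PRD
}

function buildReport(mappings: Mapping[]): string {
  const unmapped = mappings
    .filter((m) => m.testFiles.length === 0)
    .map((m) => m.featureId)
    .sort(); // stable ordering => identical JSON across runs

  const report = {
    unmappedCount: unmapped.length, // pass condition: count appears in report
    unmapped,                       // never silently left at L0
  };
  return JSON.stringify(report, null, 2);
}
```

Determinism here is a property of the code, not a promise: the output depends only on the input mappings, so two runs on the same input diff clean.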

Evidence Flow

PRD spec/index.md       →  FAVV parser extracts feature_id → test_file[]
Engineering repo tests  →  Vitest runs scoped to mapped files
JSON reporter output    →  L-level computer maps results to feature IDs
feature-matrix.json     ←  Writer updates state + updated columns
.ai/receipts/           ←  Report generator logs evidence trail
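The middle step of the flow, mapping per-file test results back onto feature IDs, could look like the sketch below. The `FileResult` shape is an assumption (a simplified, Jest-style per-file summary); verify it against the actual Vitest JSON reporter output before relying on it:

```typescript
// Hypothetical "L-level computer" step: given the PRD mapping and per-file
// test results, decide which features have all mapped tests passing.
interface FileResult {
  name: string;                  // test file path
  status: 'passed' | 'failed';   // simplified per-file summary (assumed shape)
}

function resultsByFeature(
  mapping: Record<string, string[]>, // feature_id -> test_file[]
  files: FileResult[],
): Record<string, boolean> {
  const passed = new Map<string, boolean>();
  for (const f of files) passed.set(f.name, f.status === 'passed');

  const out: Record<string, boolean> = {};
  for (const [featureId, testFiles] of Object.entries(mapping)) {
    // A feature passes only if it is mapped at all AND every mapped file
    // is present in the results and passing. Unmapped features never pass.
    out[featureId] =
      testFiles.length > 0 &&
      testFiles.every((t) => passed.get(t) === true);
  }
  return out;
}
```

Treating a missing result file as a failure (rather than skipping it) is what makes stale states demotable: if a mapped test disappears or regresses, the feature's pass flag flips off on the next run.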

L-Level Decision Tree

Has PRD with FAVV Build Contract?
  NO  → L0 (Spec only)
  YES → Has schema/type in engineering repo?
    NO  → L0
    YES → Has UI page/component using schema?
      NO  → L1 (Schema)
      YES → All mapped tests pass?
        NO  → L2 (UI exists, tests fail)
        YES → Any Safety Test violations?
          YES → L2 (safety blocks L3)
          NO  → Independent commissioner sign-off?
            NO  → L3 (Tested)
            YES → L4 (Commissioned)
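Because every node except the last is mechanical, the tree transcribes directly into a pure function. A sketch, with hypothetical field names; only `commissionerSignOff` comes from a human:

```typescript
// The decision tree above as a pure function. Every input except
// commissionerSignOff is computable from the repo and test results.
type Level = 'L0' | 'L1' | 'L2' | 'L3' | 'L4';

interface FeatureEvidence {
  hasFavvContract: boolean;    // PRD with FAVV Build Contract
  hasSchema: boolean;          // schema/type in engineering repo
  hasUi: boolean;              // UI page/component using the schema
  mappedTestsPass: boolean;    // all mapped tests pass
  safetyViolation: boolean;    // any Safety Test violation
  commissionerSignOff: boolean; // independent human sign-off
}

function decideLevel(f: FeatureEvidence): Level {
  if (!f.hasFavvContract) return 'L0';  // spec only
  if (!f.hasSchema) return 'L0';
  if (!f.hasUi) return 'L1';            // schema
  if (!f.mappedTestsPass) return 'L2';  // UI exists, tests fail
  if (f.safetyViolation) return 'L2';   // safety blocks L3
  if (!f.commissionerSignOff) return 'L3'; // tested
  return 'L4';                          // commissioned
}
```

Running this on every merge also gives demotion for free: the level is recomputed from current evidence, so a feature whose tests regress drops out of L3 without anyone editing the matrix.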

Key Finding

The decision tree is deterministic at every node except L4. L4 requires a human commissioner — the builder and commissioner are never the same person. Everything below L4 is computable from test results.

Questions

What happens when a feature's tests pass but the feature doesn't work?

  • If the Success Test is weak (passes with empty arrays), is L3 a lie?
  • Should the Safety Test column be the primary gate, not the Success Test?
  • At what scale does running all mapped tests on every merge become too slow?