Automated Commissioning

What if the feature matrix updated itself from test results?

Scorecard

| Dimension | Score | Evidence |
| --- | --- | --- |
| Pain | 4/5 | 210 features hand-edited. States drift. CRM claims L3 with no Story Contract. |
| Demand | 3/5 | Internal only. Every PRD review requires manual state checks. |
| Edge | 4/5 | FAVV Build Contract with Artifact + Safety Test columns: unique infrastructure for computed commissioning. |
| Trend | 4/5 | Agent-driven development needs deterministic commissioning. Manual breaks at scale. |
| Conversion | 2/5 | Internal tooling. No revenue. Operational value only. |
| Composite | 384 | 4 × 3 × 4 × 4 × 2 |
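
The composite is the product of the five dimension scores. A minimal sketch of that arithmetic follows; the `Dimension` type and `compositeScore` function are illustrative names, not part of any existing tooling.

```typescript
// Hypothetical names for illustration; scores are the 1-5 values from the scorecard.
type Dimension = "Pain" | "Demand" | "Edge" | "Trend" | "Conversion";

const scores: Record<Dimension, number> = {
  Pain: 4,
  Demand: 3,
  Edge: 4,
  Trend: 4,
  Conversion: 2,
};

// Multiply all dimension scores together.
function compositeScore(s: Record<Dimension, number>): number {
  return Object.values(s).reduce((product, score) => product * score, 1);
}

console.log(compositeScore(scores)); // 384
```

Because the scores multiply, one weak dimension pulls the composite down sharply: the maximum is 5^5 = 3125, and the Conversion score of 2 alone removes 60% of whatever the other four dimensions contribute.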

Kill signal: Script-computed states contradict the manual commissioner's judgment on more than 20% of features after 3 runs.
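
One way the kill signal could be checked is sketched below. It assumes "after 3 runs" means that at least three runs exist and the most recent one exceeds the threshold; that reading, along with the `FeatureState` shape and the function names, is an assumption for illustration.

```typescript
// Hypothetical record pairing the script's state with the commissioner's state.
interface FeatureState {
  featureId: string;
  computed: string; // e.g. "L2", as produced by the script
  manual: string;   // e.g. "L3", as judged by the human commissioner
}

// Fraction of features where the computed and manual states differ.
function disagreementRate(states: FeatureState[]): number {
  if (states.length === 0) return 0;
  const mismatches = states.filter((s) => s.computed !== s.manual).length;
  return mismatches / states.length;
}

// Assumed interpretation: needs at least `minRuns` runs, then checks the latest run.
function killSignalTriggered(
  runs: FeatureState[][],
  threshold = 0.2,
  minRuns = 3,
): boolean {
  if (runs.length < minRuns) return false;
  const latest = runs[runs.length - 1];
  return disagreementRate(latest) > threshold;
}
```

With 210 features, the signal under this reading fires once 43 or more features disagree in the third or any later run, since 42 / 210 is exactly 20% and the comparison is strict.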

Execution Substrate

Two test runners feed one L-level computation.

| Layer | Runner | Verifies | Artifacts |
| --- | --- | --- | --- |
| Logic | Vitest (scoped) | Unit tests, integration tests, data contracts | JSON results |
| Browser | Playwright (via Nx e2e target) | UI features, user flows, screen contracts | Traces, screenshots, video |

Playwright specs are deterministic executable knowledge: rerunnable, diffable, CI-friendly. Agent browsers are for exploration; commissioning demands repeatability. A feature with passing unit tests but a failing e2e spec is capped at L2.
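
The capping rule can be sketched as a small pure function. The `FeatureEvidence` shape, the numeric level encoding, and the `candidateLevel` field are assumptions for illustration. Only the rule stated above is implemented; how to treat features with no e2e spec, or with failing unit tests, is deliberately left out.

```typescript
// Hypothetical per-feature summary merged from both runners.
type RunStatus = "passed" | "failed" | "missing";

interface FeatureEvidence {
  featureId: string;
  candidateLevel: number; // level the feature would otherwise earn, e.g. 3 for L3
  unit: RunStatus;        // outcome from the Vitest layer
  e2e: RunStatus;         // outcome from the Playwright layer
}

const E2E_FAILURE_CAP = 2; // "capped at L2"

// Applies the single stated rule: passing unit tests + failing e2e spec => at most L2.
function computedLevel(evidence: FeatureEvidence): number {
  if (evidence.unit === "passed" && evidence.e2e === "failed") {
    return Math.min(evidence.candidateLevel, E2E_FAILURE_CAP);
  }
  // All other combinations are outside this sketch and pass through unchanged.
  return evidence.candidateLevel;
}

const crm: FeatureEvidence = {
  featureId: "crm",
  candidateLevel: 3,
  unit: "passed",
  e2e: "failed",
};
console.log(`L${computedLevel(crm)}`); // L2
```

Keeping the rule as a pure function over a merged evidence record means the same inputs always yield the same level, which is the repeatability that commissioning requires.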

Context

Questions

What breaks first when the script disagrees with a human commissioner?

  • If the mapping is wrong, are computed states worse than manual states?
  • Should unmapped features show a distinct state instead of staying at L0?
  • When does this merge with the backburner Commissioning State Machine PRD?
  • Should features without e2e specs be capped at L2, or can unit-only verification reach L3 for non-UI features?
  • At what feature count does Playwright CI time force test sharding or parallel projects?