Automated Commissioning
What if the feature matrix updated itself from test results?
Scorecard
| Dimension | Score | Evidence |
|---|---|---|
| Pain | 4/5 | 210 features hand-edited. States drift. CRM claims L3 with no Story Contract. |
| Demand | 3/5 | Internal only. Every PRD review requires manual state checks. |
| Edge | 4/5 | FAVV Build Contract with Artifact + Safety Test columns — unique infrastructure for computed commissioning. |
| Trend | 4/5 | Agent-driven development needs deterministic commissioning. Manual breaks at scale. |
| Conversion | 2/5 | Internal tooling. No revenue. Operational value only. |
| Composite | 384 | 4 × 3 × 4 × 4 × 2 |
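
As a quick check on the Composite row, the sketch below multiplies the five dimension scores from the table. The `scorecard` array and the `composite` function are hypothetical names introduced only for this illustration; the scores themselves are copied from the rows above.

```typescript
// Dimension scores copied from the scorecard table (each scored out of 5).
// `scorecard` and `composite` are illustrative names, not existing tooling.
const scorecard: Array<[string, number]> = [
  ["Pain", 4],
  ["Demand", 3],
  ["Edge", 4],
  ["Trend", 4],
  ["Conversion", 2],
];

// Composite is the product of all dimension scores.
function composite(rows: Array<[string, number]>): number {
  return rows.reduce((product, [, score]) => product * score, 1);
}

console.log(composite(scorecard)); // 384
```

Because the composite is a product, one low dimension (here Conversion at 2/5) halves or worse the overall result, and the ceiling for five dimensions is 5^5 = 3125.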
Kill signal: Script-computed states contradict the manual commissioner's judgment on more than 20% of features after 3 runs.
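
One way the kill signal could be evaluated is sketched below. It rests on two assumptions: the check applies once at least 3 runs have completed, and the rate is measured on the latest run's comparison. The `Comparison` shape, the feature IDs, and the function names are hypothetical; only the 20% threshold, the 3-run condition, and the L0-L4 levels come from this document.

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

// Hypothetical record pairing the script's state with the commissioner's.
interface Comparison {
  featureId: string;
  computed: Level; // state produced by the script
  manual: Level;   // state judged by the human commissioner
}

// Fraction of features where the two states differ (0 for an empty list).
function disagreementRate(rows: Comparison[]): number {
  if (rows.length === 0) return 0;
  const mismatches = rows.filter((r) => r.computed !== r.manual).length;
  return mismatches / rows.length;
}

// Assumption: the signal trips only after 3 runs and strictly above 20%.
function killSignalTripped(rows: Comparison[], runsCompleted: number): boolean {
  return runsCompleted >= 3 && disagreementRate(rows) > 0.2;
}

const sample: Comparison[] = [
  { featureId: "F-001", computed: "L2", manual: "L3" },
  { featureId: "F-002", computed: "L1", manual: "L1" },
  { featureId: "F-003", computed: "L0", manual: "L2" },
  { featureId: "F-004", computed: "L3", manual: "L3" },
  { featureId: "F-005", computed: "L2", manual: "L2" },
];

console.log(killSignalTripped(sample, 3)); // true: 2 of 5 features disagree
```

At the full catalog size of 210 features, a strict "more than 20%" threshold means 43 or more disagreements.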
Execution Substrate
Two test runners feed one L-level computation.
| Layer | Runner | Verifies | Artifacts |
|---|---|---|---|
| Logic | Vitest (scoped) | Unit tests, integration tests, data contracts | JSON results |
| Browser | Playwright (via Nx e2e target) | UI features, user flows, screen contracts | Traces, screenshots, video |
Playwright specs are deterministic, executable knowledge: rerunnable, diffable, and CI-friendly. Agent browsers are for exploration; commissioning demands repeatability. A feature with passing unit tests but a failing e2e spec is capped at L2.
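
The L2 cap can be sketched as a small post-processing step. This assumes a candidate level (0 to 4) has already been derived from the logic layer by whatever rules the Commissioning Protocol defines. `E2eStatus` and `capLevel` are hypothetical names, and the handling of features with no e2e spec is deliberately left uncapped here because the document treats it as an open question.

```typescript
// Hypothetical e2e outcome for a feature, summarized from Playwright results.
type E2eStatus = "passed" | "failed" | "missing";

// L2 ceiling for a failing e2e spec, per the rule stated above.
const E2E_FAILURE_CAP = 2;

// candidateLevel: numeric L-level (0-4) derived from the logic-layer results.
function capLevel(candidateLevel: number, e2e: E2eStatus): number {
  if (e2e === "failed") {
    // Never raise a level; only lower it to the cap when it exceeds L2.
    return Math.min(candidateLevel, E2E_FAILURE_CAP);
  }
  // Assumption: "missing" is left uncapped until the open question is settled.
  return candidateLevel;
}

console.log(capLevel(3, "failed")); // 2
console.log(capLevel(1, "failed")); // 1
console.log(capLevel(3, "passed")); // 3
```

Using `Math.min` keeps the cap one-directional: a failing e2e spec can hold a feature at L2, but it never promotes a feature that only reached L0 or L1.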
Context
- Feature Matrix — The output this computes
- Commissioning Protocol — L0-L4 definitions
- Commissioning State Machine — Peer PRD for data table commissioning
- Prioritization Algorithm — Scoring that feeds build order
- RaaS Catalog — The 210 feature IDs
Questions
What breaks first when the script disagrees with a human commissioner?
- If the mapping is wrong, are computed states worse than manual states?
- Should unmapped features show a distinct state instead of staying at L0?
- When does this merge with the backburner Commissioning State Machine PRD?
- Should features without e2e specs be capped at L2, or can unit-only verification reach L3 for non-UI features?
- At what feature count does Playwright CI time force test sharding or parallel projects?