Automated Commissioning Spec
How do we make the feature matrix a measurement instrument instead of an opinion ledger?
Intent Contract
| Dimension | Statement |
|---|---|
| Objective | Feature matrix states must be computed from test results, not hand-edited — because manual L-levels drift, lie, and can't scale past 50 features without full-time curation |
| Outcomes | (1) feature-matrix.json states updated by script within 5 minutes of merge to main. (2) Every state change has an evidence trail in .ai/receipts/. (3) Unmapped features are visible, not hidden at L0. |
| Health Metrics | Existing feature states that are correct must not regress. Script must not mark features as L3 when tests are actually failing. |
| Constraints | Hard: script reads PRD specs from dream repo, runs tests in engineering repo — never writes to dream repo from engineering CI. Steering: prefer convention (file naming) over configuration (mapping files) for feature-to-test linking. |
| Autonomy | Allowed: test runner selection, output format, caching strategy. Escalate: changing L-level definitions, adding new states beyond L0-L4. Never: deleting test files, modifying PRD specs, changing feature IDs. |
| Stop Rules | Complete when: script runs in CI on merge to main, updates feature-matrix.json, produces evidence receipt. Halt when: script produces states that contradict manual commissioner judgment on >20% of features. |
| Counter-metrics | CI pipeline time must not increase by more than 3 minutes. False positives (features marked L3 that don't work) must be zero — better to under-report than over-report. |
| Blast Radius | feature-matrix.json (dream repo), .ai/receipts/ (dream repo), CI pipeline (engineering repo). No user-facing pages change behavior — the feature matrix page reads from the same JSON. |
| Rollback | Revert the JSON update commit. Previous states are in git history. Receipt files are append-only. |
Story Contract
Stories are test contracts. Each row advances to ≥1 test. Tests must be RED before implementation starts. GREEN = value delivered.
S1 — Auto-update on merge
Trigger: PR merged to main in engineering repo
Checklist:
- feature-matrix.json diff shows state changes within 5 minutes of merge
- Only features whose mapped tests were touched have state changes
- No feature reaches L3 when its test results are failing
Forbidden: States change for untouched features. Feature marked L3 with failing tests.
Evidence: integration — libs/commissioning/__tests__/
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
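One way to read the "only features whose mapped tests were touched" rule is as a lookup from the merge's changed files to feature IDs. The sketch below assumes the index has the Map<FeatureID, TestFile[]> shape from Build Contract #3 and that "touched" means the test file itself appears in the changed-file list. The function name affectedFeatures and its inputs are hypothetical, not part of the spec.

```typescript
type FeatureID = string;
type TestFile = string;

// Returns the feature IDs whose mapped test files appear in the changed-file list.
// Features outside this set keep their previous state (S1 Forbidden rule).
function affectedFeatures(
  index: Map<FeatureID, TestFile[]>,
  changedFiles: string[],
): FeatureID[] {
  const changed = new Set(changedFiles);
  const hit: FeatureID[] = [];
  index.forEach((files, featureId) => {
    if (files.some((f) => changed.has(f))) hit.push(featureId);
  });
  return hit.sort(); // sorted so the same input always yields the same output
}
```

If "touched" should also cover source files exercised by a test, the changed-file list would need to be expanded before this lookup; the spec does not say which reading is intended.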
S2 — Unmapped features visible
Trigger: Commission script run completes
Checklist:
- Report shows unmapped count with specific feature IDs
- No unmapped feature silently stays at L0 without a flag
- Script does not report 100% coverage when mappings are missing
Forbidden: Unmapped features hidden. Coverage report lies.
Evidence: unit — libs/commissioning/__tests__/index-builder.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
S3 — Safety Test blocks L3
Trigger: Safety Test assertion fails in test run
Checklist:
- Feature state capped at L2 regardless of Success Test results
- Safety Test failure logged with the feature ID
- Escalate to commissioner if >5 features blocked simultaneously
Forbidden: Feature reaches L3 with active safety violation. Failures silently ignored.
Evidence: integration — libs/commissioning/__tests__/level-computer.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
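The S3 checklist can be sketched as a gate applied after a candidate level has been computed. This is a minimal sketch, assuming the candidate level and the Safety Test outcome arrive as plain inputs; the names SafetyOutcome, GateResult, and applySafetyGate are hypothetical.

```typescript
type Level = 0 | 1 | 2 | 3 | 4;

interface SafetyOutcome {
  featureId: string;
  candidate: Level;      // level computed before the safety gate (assumed input)
  safetyFailed: boolean; // true when any Safety Test assertion failed
}

interface GateResult {
  levels: Record<string, Level>;
  blocked: string[]; // feature IDs capped by a Safety Test failure
  escalate: boolean; // true when more than 5 features are blocked at once
}

function applySafetyGate(outcomes: SafetyOutcome[]): GateResult {
  const levels: Record<string, Level> = {};
  const blocked: string[] = [];
  for (const o of outcomes) {
    if (o.safetyFailed) {
      // Cap at L2 regardless of Success Test results; a lower level is left as-is.
      levels[o.featureId] = o.candidate > 2 ? 2 : o.candidate;
      blocked.push(o.featureId);
      console.warn(`Safety Test failed for ${o.featureId}: capped at L2`);
    } else {
      levels[o.featureId] = o.candidate;
    }
  }
  return { levels, blocked, escalate: blocked.length > 5 };
}
```

The gate only ever lowers a level, which keeps it consistent with the under-report-over-over-report constraint.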
S4 — Regression triggers demotion
Trigger: Previously-passing test starts failing
Checklist:
- Feature state moves from L3 to L2 (or lower) in next commission run
- Demotion logged with previous state, new state, and failing test file
- No one-way ratchet — states can go down
Forbidden: State stays at L3 after tests break. Demotion applied silently without logging.
Evidence: integration — libs/commissioning/__tests__/level-computer.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
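Demotion detection reduces to a comparison of stored and freshly computed states. The sketch below assumes levels are plain numbers keyed by feature ID and that the failing test files per feature are already known from the test run; Demotion and detectDemotions are hypothetical names.

```typescript
interface Demotion {
  featureId: string;
  previous: number;
  next: number;
  failingTests: string[];
}

// Compares stored states with newly computed ones and returns every downward move,
// carrying the fields the S4 checklist requires in the log.
function detectDemotions(
  previous: Record<string, number>,
  computed: Record<string, number>,
  failingByFeature: Record<string, string[]>,
): Demotion[] {
  const demotions: Demotion[] = [];
  for (const featureId of Object.keys(computed).sort()) {
    const before = previous[featureId];
    const after = computed[featureId];
    if (before !== undefined && after < before) {
      demotions.push({
        featureId,
        previous: before,
        next: after,
        failingTests: failingByFeature[featureId] ?? [],
      });
    }
  }
  return demotions;
}
```

Returning the demotions as data, rather than logging inside the function, lets the same list feed both the log and the receipt.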
S5 — All three FAVV formats parsed
Trigger: PRD uses FFO, FAVV v2.0, or FAVV v2.1 format
Checklist:
- Parser extracts feature-to-test mappings from FFO (6 cols), v2.0, and v2.1 (9 cols)
- v2.0 and FFO PRDs are not silently skipped
- Partial results returned when table is malformed — no crash, log warning
Forbidden: Crash on FFO. Silent skip of v2.0. Only v2.1 produces mappings.
Evidence: unit — libs/commissioning/__tests__/favv-parser.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
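Because the three formats differ in column count, one tolerant approach is to locate columns by header name instead of position, and to skip malformed rows with a warning instead of aborting. The sketch below assumes every format has columns titled #, Function, and Artifact, which the spec implies but does not state; parseBuildContract and its result shape are hypothetical.

```typescript
interface ParsedRow { id: string; fn: string; artifact: string }
interface ParseResult { rows: ParsedRow[]; warnings: string[] }

// Splits one markdown table line into trimmed cell values.
function splitCells(line: string): string[] {
  return line.trim().replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
}

function parseBuildContract(markdown: string): ParseResult {
  const lines = markdown.split("\n").filter((l) => l.trim().charAt(0) === "|");
  const result: ParseResult = { rows: [], warnings: [] };
  if (lines.length < 2) {
    result.warnings.push("no table found");
    return result;
  }
  const header = splitCells(lines[0]);
  const idCol = header.indexOf("#");
  const fnCol = header.indexOf("Function");
  const artifactCol = header.indexOf("Artifact");
  if (idCol < 0 || fnCol < 0 || artifactCol < 0) {
    result.warnings.push("header is missing #, Function, or Artifact");
    return result;
  }
  // lines[1] is the |---| separator row, so data starts at lines[2].
  lines.slice(2).forEach((line, i) => {
    const cells = splitCells(line);
    if (cells.length !== header.length) {
      result.warnings.push(`row ${i + 1}: expected ${header.length} cells, got ${cells.length}`);
      return; // skip the malformed row, keep the rest
    }
    result.rows.push({ id: cells[idCol], fn: cells[fnCol], artifact: cells[artifactCol] });
  });
  return result;
}
```

A cell that contains an unescaped pipe, such as number | null, changes the cell count and is reported as a malformed row by this sketch, so real PRD tables may need escaping or a stricter tokenizer.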
S6 — UI features verified via Playwright
Trigger: Commission script encounters feature with e2e spec
Checklist:
- Playwright runs spec headlessly and captures trace + screenshot
- Result feeds L-level computation alongside unit results
- Feature with passing unit tests but failing e2e is capped at L2
- Escalate to commissioner if Playwright CI adds >2 minutes per feature
Forbidden: Feature reaches L3 from unit tests alone when e2e spec exists but fails. Playwright runs headed in CI. Agent browser used instead of spec.
Evidence: e2e — libs/commissioning/__tests__/e2e-runner.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
S7 — Mock routes flagged as counterfeit
Trigger: Commission script analyzes e2e spec files before running
Checklist:
- AST scan flags page.route() and route.fulfill() usage
- Report lists counterfeit specs with file path and line number
- Scanner does not modify spec files
- No false positives on legitimate test setup (auth state, seed data)
Forbidden: Mock-route specs counted as passing evidence. Fixture data accepted as integration proof.
Evidence: static-analysis — libs/commissioning/__tests__/mock-detector.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
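The reporting shape of the scanner can be illustrated without a parser. The sketch below swaps the AST scan for a line-based regular expression, which is a deliberate simplification: it can miss aliased calls and does not understand block comments or strings. In Playwright, page.route() registers the interception and the response is supplied by route.fulfill() on the Route object passed to the handler, so those are the two calls matched. Finding and scanForMockRoutes are hypothetical names.

```typescript
interface Finding { file: string; line: number; call: string }

// Matches a call to page.route( or route.fulfill( on a single line.
const MOCK_CALL = /\b(page\.route|route\.fulfill)\s*\(/;

// Read-only scan: takes the spec source as a string and never writes to disk.
function scanForMockRoutes(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    const code = text.replace(/\/\/.*$/, ""); // drop trailing line comments
    const match = MOCK_CALL.exec(code);
    if (match) findings.push({ file, line: i + 1, call: match[1] });
  });
  return findings;
}
```

Distinguishing counterfeit interception from legitimate setup (auth state, seed data) needs more context than a per-line match provides, which is one reason the spec calls for an AST scan.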
S8 — Unit tests excluded from L3 evidence
Trigger: Commission script categorizes test results
Checklist:
- Unit tests (mocked DB/server) logged but not used for L-level state changes
- Only integration (real DB) and e2e (real server) results feed computation
- Unit test exclusion does not affect unit test reporting elsewhere
Forbidden: Unit test with mocked DB counted as integration evidence. Hook test with stubbed API affects L-level.
Evidence: unit — libs/commissioning/__tests__/level-computer.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
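The exclusion rule can be sketched as a filter in front of the level computation. This assumes each result already carries a kind label from an upstream categorizer, which the spec describes but does not define; TestKind, Evidence, and collectEvidence are hypothetical names, and eligibleForL3 is a necessary condition under these assumptions, not the full decision tree.

```typescript
type TestKind = "unit" | "integration" | "e2e";

interface TestResult {
  file: string;
  kind: TestKind; // assumed to be assigned by an upstream categorizer
  passed: boolean;
}

interface Evidence {
  counted: TestResult[];    // integration and e2e results: feed L-level computation
  loggedOnly: TestResult[]; // unit results: reported, never counted
  eligibleForL3: boolean;   // at least one counted result and none failing
}

function collectEvidence(results: TestResult[]): Evidence {
  const counted = results.filter((r) => r.kind !== "unit");
  const loggedOnly = results.filter((r) => r.kind === "unit");
  const eligibleForL3 = counted.length > 0 && counted.every((r) => r.passed);
  return { counted, loggedOnly, eligibleForL3 };
}
```

Under this sketch a feature with only unit tests has no counted evidence and cannot be eligible for L3, and a failing e2e result blocks eligibility even when every unit test passes.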
S9 — Commissioning level appears in ProjectWithStats
Trigger: Commission script completes a run against a project's features
Checklist:
- commissioning_results Convex table updated with { featureId, prdRef, level, updatedAt } per feature
- listProjectsWithStats returns commissioningLevel: number | null for each project (highest L-level across its features)
- Project with no commissioning run returns commissioningLevel: null, not 0
- Commission script does not write to Convex when --dry-run flag is active
Forbidden: Dashboard shows commissioningLevel: 0 when no commission run has occurred. Script writes to Convex in dry-run mode.
Evidence: integration — libs/commissioning/__tests__/convex-sync.spec.ts
Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
Build Contract
Job 1: Parse PRD Specs into Feature-Test Index
| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
|---|---|---|---|---|---|---|---|
| 1 | Parse FAVV v2.1 Build Contract tables from PRD specs | libs/commissioning/parsers/favv-parser.ts | Given prd-identity-access/spec/index.md, extracts 8+ FAVV rows with feature IDs and artifact paths | Parser invents mappings for features not in the Build Contract. Parser crashes on malformed tables instead of logging warning. | Existing PRD parsing in project-from-prd must not break | Feature-to-test index exists | Gap |
| 2 | Parse FAVV v2.0 and FFO format Build Contracts | libs/commissioning/parsers/favv-parser.ts | Given a v2.0 table (6 columns) or FFO table, extracts feature-function-artifact mappings | v2.0/FFO PRDs silently skipped. Parser returns empty array instead of partial results. | — | All PRD formats produce mappings | Gap |
| 3 | Build feature-to-test-file index from parsed artifacts | libs/commissioning/index-builder.ts | Given parsed FAVV rows, produces Map<FeatureID, TestFile[]> with deduplication | Index includes test files that don't exist on disk. Index silently drops features with no artifact column. | — | Single source of truth for what tests verify what features | Gap |
| 4 | Detect and report unmapped features | libs/commissioning/index-builder.ts | Given feature-matrix.json (210 IDs) and parsed index, reports features with zero test file mappings | Unmapped features hidden. Report says "100% mapped" when mappings are missing. | — | Visibility into verification gaps | Gap |
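Rows 3 and 4 can be sketched as two small pure functions: one that builds the deduplicated index and one that reports which matrix IDs have no mapping. The sketch assumes the parser has already produced featureId and testFile pairs; Mapping, buildIndex, and unmappedReport are hypothetical names, and the on-disk existence check from the Safety Test column is left out to keep the example free of filesystem access.

```typescript
type FeatureID = string;
type TestFile = string;
interface Mapping { featureId: FeatureID; testFile: TestFile }

// Builds Map<FeatureID, TestFile[]> with duplicate test files removed per feature.
function buildIndex(mappings: Mapping[]): Map<FeatureID, TestFile[]> {
  const index = new Map<FeatureID, TestFile[]>();
  for (const { featureId, testFile } of mappings) {
    const files = index.get(featureId) ?? [];
    if (files.indexOf(testFile) < 0) files.push(testFile);
    index.set(featureId, files);
  }
  return index;
}

// Lists every matrix ID with zero mapped test files; fullyMapped is true only when none remain.
function unmappedReport(allIds: FeatureID[], index: Map<FeatureID, TestFile[]>) {
  const unmapped = allIds.filter((id) => (index.get(id) ?? []).length === 0);
  return {
    total: allIds.length,
    mapped: allIds.length - unmapped.length,
    unmapped,
    fullyMapped: unmapped.length === 0,
  };
}
```

Deriving fullyMapped from the unmapped list, and never from the index size alone, is what prevents a "100% mapped" claim when matrix IDs are missing from the index.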
Job 2: Run Scoped Tests and Compute L-Levels
| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
|---|---|---|---|---|---|---|---|
| 5 | Run vitest scoped to indexed test files only | libs/commissioning/test-runner.ts | Given Map<FeatureID, TestFile[]>, runs vitest with JSON reporter on only those files. Exit code + results captured. | Script runs ALL tests instead of scoped set. Script modifies test files. Script runs with production credentials. | CI pipeline time increase ≤3 minutes over baseline | Only relevant tests run — fast feedback | Gap |
| 6 | Compute L-level per feature from test results | libs/commissioning/level-computer.ts | Given test results JSON and schema/UI existence checks, computes L0-L4 per feature using decision tree | Feature reaches L3 with failing Safety Test. Feature at L3 when no tests mapped (should be L0). L-level goes UP when evidence goes DOWN. | — | States are computed, not opined | Gap |
| 7 | Detect regressions — features that should demote | libs/commissioning/level-computer.ts | Given previous feature-matrix.json states and new computed states, flags demotions with evidence | Demotions silently applied without logging. Script refuses to demote and keeps stale state. | — | Trust in the matrix — states reflect reality | Gap |
Job 3: Write Results and Produce Evidence
| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
|---|---|---|---|---|---|---|---|
| 8 | Update feature-matrix.json with computed states | libs/commissioning/matrix-writer.ts | Given computed L-levels, updates state and updated fields in JSON. Diff is minimal (only changed rows). | Writer corrupts JSON structure. Writer changes fields other than state and updated. Writer removes features from the array. | feature-matrix.jsx renders correctly after update — no broken categories, no NaN counts | Feature matrix is a computed output | Gap |
| 9 | Generate commission receipt with evidence trail | libs/commissioning/receipt-generator.ts | Produces .ai/receipts/YYYY-MM-DD-commission-run.json with per-feature evidence (previous state, new state, test files, pass/fail) | Receipt omits failures. Receipt generated even when script errors. Receipt overwrites previous receipt instead of appending. | — | Audit trail for every state change | Gap |
| 10 | CLI interface with dry-run, single-feature, and full-matrix modes | tools/scripts/commission/commission-features.ts | --dry-run shows changes without writing. --feature=AUTH-001 commissions single feature. --all runs full matrix. | --dry-run actually writes changes. --feature runs all tests instead of scoped set. Script with no flags runs full matrix without confirmation. | — | Ergonomic for both CI and manual use | Gap |
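The writer's guarantees in row 8 (only state and updated change, no rows removed, minimal diff) can be sketched as a pure transformation that the CLI either persists or, under --dry-run, only prints. The row shape below is an assumption: the spec names the state and updated fields but not the rest of the JSON, so FeatureRow, applyStates, and the string-valued state are hypothetical.

```typescript
interface FeatureRow {
  id: string;
  state: string;   // assumed representation, e.g. "L2"
  updated: string; // assumed to be a date string
  [key: string]: unknown; // every other field passes through untouched
}

// Returns a new array of the same length and order. Rows whose state is unchanged
// are returned as the same object, so serializing the result yields a minimal diff.
function applyStates(
  rows: FeatureRow[],
  computed: Record<string, string>,
  today: string, // passed in so the function stays deterministic
): { next: FeatureRow[]; changed: string[] } {
  const changed: string[] = [];
  const next: FeatureRow[] = rows.map((row) => {
    const state = computed[row.id];
    if (state === undefined || state === row.state) return row;
    changed.push(row.id);
    return { ...row, state, updated: today };
  });
  return { next, changed };
}
```

In dry-run mode the caller would report the changed list and stop; only a non-dry run would serialize next back to feature-matrix.json.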
Job 4: E2E Verification via Playwright
| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
|---|---|---|---|---|---|---|---|
| 11 | Discover Playwright e2e specs from feature-test index | libs/commissioning/e2e-discovery.ts | Given Map<FeatureID, TestFile[]>, partitions into vitest unit specs and Playwright e2e specs by file path pattern (*.spec.ts in e2e project vs __tests__/) | E2e specs routed to vitest runner. Unit specs routed to Playwright. Discovery misses specs outside conventional paths. | Existing vitest scoping (#5) unaffected — e2e discovery is additive | Features with browser behavior get browser verification | Gap |
| 12 | Run Playwright specs scoped to feature via Nx target | libs/commissioning/e2e-runner.ts | Given feature's e2e spec paths, runs nx e2e <project> --grep <spec> headlessly. Captures exit code, trace, screenshot. JSON results extracted. | Playwright runs in headed mode in CI. Runner executes specs outside the scoped set. Runner uses playwright test directly instead of Nx target (bypasses project config). | Existing Nx e2e pipeline unaffected — commission runner uses same target, different filter | UI features verified by real browser, not just unit assertions | Gap |
| 13 | Merge e2e results into L-level computation | libs/commissioning/level-computer.ts | L-level computer treats e2e pass/fail as additional evidence alongside vitest results. Feature with passing unit tests but failing e2e is capped at L2. | Feature reaches L3 when e2e spec exists but wasn't run. E2e failure silently ignored when unit tests pass. Unit-only features penalized for missing e2e specs. | Existing L-level computation for unit-only features unchanged | L-levels reflect real user-facing behavior, not just logic correctness | Gap |
| 14 | Archive Playwright traces and screenshots as commission evidence | libs/commissioning/e2e-evidence.ts | Playwright traces saved to dist/.playwright/traces/commission/<feature-id>/. Screenshots saved alongside. Receipt (#9) includes e2e artifact paths. | Traces saved with PII from test fixtures. Traces accumulate without rotation. Evidence paths in receipt point to non-existent files. | — | Visual proof of feature state — commissioner can replay the trace | Gap |
| 15 | Detect mock routes in e2e specs (counterfeit test scanner) | libs/commissioning/mock-detector.ts | Given e2e spec files, AST-scan for page.route(), route.fulfill(), and fixture injection patterns. Flag specs that intercept real API calls with canned responses. Report lists counterfeit specs with line numbers. | Mock-route specs pass the scanner. Scanner false-positives on legitimate test setup (auth state, seed data). Scanner modifies spec files. | — | Counterfeit tests made visible before they produce false L3 claims | Gap |
| 16 | Exclude unit tests from L-level computation | libs/commissioning/level-computer.ts | Test categorizer identifies unit tests (mocked dependencies, no real DB/server) and excludes them from L-level input. Only integration (real DB) and e2e (real server) results feed state computation. | Unit test with mocked DB counted as integration evidence. Hook test with stubbed API affects L-level. | Existing unit test reporting unaffected — tests still run, results just don't feed commissioning | L3 means real infrastructure was tested, not mocks | Gap |
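The partition step in row 11 can be sketched as a path classifier. The patterns here are assumptions drawn from paths that appear in this spec: e2e projects are taken to live in a directory whose name ends in -e2e (as in apps/web-e2e), and vitest specs are taken to be *.spec.ts files under __tests__/. Anything else is returned as unrouted so it can be reported, not silently dropped. Runner, routeSpec, and partitionSpecs are hypothetical names.

```typescript
type Runner = "vitest" | "playwright" | "unrouted";

function routeSpec(path: string): Runner {
  // A path segment ending in "-e2e" marks a Playwright project (assumed convention).
  if (/(^|\/)[^/]+-e2e\//.test(path)) return "playwright";
  if (path.indexOf("/__tests__/") >= 0 && /\.spec\.ts$/.test(path)) return "vitest";
  return "unrouted";
}

// Groups a feature's test files by the runner that should execute them.
function partitionSpecs(files: string[]): Record<Runner, string[]> {
  const out: Record<Runner, string[]> = { vitest: [], playwright: [], unrouted: [] };
  for (const file of files) out[routeSpec(file)].push(file);
  return out;
}
```

Surfacing an unrouted bucket addresses the "discovery misses specs outside conventional paths" Safety Test by making such files visible in the report.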
Job 5: Sync Results to Project Dashboard (N5 Bridge)
| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
|---|---|---|---|---|---|---|---|
| 17 | Call Convex mutation to sync L-levels after commission run | libs/commissioning/convex-sync.ts | After --all run, commissioning_results table in Convex has one row per feature with featureId, prdRef, level, updatedAt | Sync fires in --dry-run mode. Sync overwrites results from a later run with earlier data. | listProjectsWithStats query performance unchanged (<200ms) | Dashboard reads from one source (Convex) — not dream repo filesystem | Gap |
| 18 | Extend listProjectsWithStats to include commissioningLevel | convex/projects.ts | Query returns commissioningLevel: number | null — null when no commission run, highest L-level when run exists | Projects with no commission run return commissioningLevel: 0 instead of null. Query crashes when commissioning_results table is empty. | Existing plan/task stats in ProjectWithStats unchanged | Project dashboard shows commissioning health alongside plan progress | Gap |
Convex schema addition:
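// commissioning_results table (registered inside defineSchema in convex/schema.ts)
import { defineTable } from "convex/server";
import { v } from "convex/values";

defineTable({
  featureId: v.string(), // e.g. "AUTH-001"
  prdRef: v.string(), // e.g. "prd-identity-access"
  level: v.number(), // 0-4
  updatedAt: v.number(), // Unix timestamp (ms); use Date.now()
})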
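The null-versus-0 rule for commissioningLevel can be shown with rows of the shape above. This is plain TypeScript, not a Convex query: it assumes the caller has already selected the rows that belong to one project, and projectCommissioningLevel is a hypothetical helper name.

```typescript
interface CommissioningResult {
  featureId: string;
  prdRef: string;
  level: number; // 0-4
  updatedAt: number;
}

// Highest L-level across a project's features, or null when no commission run
// has produced any rows. An empty table must yield null, never 0.
function projectCommissioningLevel(rows: CommissioningResult[]): number | null {
  if (rows.length === 0) return null;
  return rows.reduce((max, r) => Math.max(max, r.level), 0);
}
```

A result of 0 therefore always means "commissioned, and every feature is at L0", which the dashboard can distinguish from "never commissioned".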
Principles
The Job
| Element | Detail |
|---|---|
| Situation | After engineering merges code, someone must manually check which features advanced and edit feature-matrix.json by hand. With 210 features, this takes 30+ minutes and produces unreliable states. |
| Intention | Feature states are a function of test results. The matrix is a measurement instrument. Run the script, get the truth. |
| Obstacle | No bridge connects PRD Build Contract artifacts to test execution. Test files don't declare which feature they verify. Three FAVV formats coexist. |
| Hardest Thing | The mapping fidelity. A weak mapping (wrong test files → wrong features) produces states that look computed but are just as wrong as manual edits — except now with false confidence. |
Why Now
- 210 features, 0 at L4. Manual tracking has hit its ceiling.
- Agent-driven development means agents need deterministic commissioning — they can't make judgment calls about L-levels.
- The FAVV v2.1 format now includes Artifact + Success Test + Safety Test columns — the mapping data finally exists in the PRD specs.
- Every new PRD adds more features to track manually.
Design Constraints
| Constraint | Rationale |
|---|---|
| Script lives in engineering repo | It runs tests — tests live in engineering. Dream repo is read-only from engineering CI. |
| Pure function: same input → same output | Commissioning must be trustworthy. Non-determinism means nobody trusts the matrix. |
| Under-report over over-report | A feature falsely at L3 is worse than a feature stuck at L0. Conservative state computation. |
| Convention over configuration | @feature AUTH-001 docblock tag or auth-001.spec.ts naming beats a separate mapping file that drifts. |
| Playwright for e2e, vitest for unit | Two test runners, one L-level computation. Playwright provides deterministic, reproducible browser verification with traces, screenshots, and CI-friendly artifacts. Vitest handles logic. Both feed the same level-computer. |
| Specs over agent browsers | Playwright specs are durable executable knowledge — rerunnable, diffable, CI-split. Agent browsers are for exploration, not commissioning. Commissioning demands repeatability. |
| Reuse Nx e2e targets | Nx Playwright plugin infers e2e tasks from playwright.config.ts. Commission script invokes existing nx e2e targets, not a parallel browser workflow. |
Performance
Priority Score
| Dimension | Score (1-5) | Evidence |
|---|---|---|
| Pain | 4 | 210 features manually tracked. States drift. CRM claims L3 with no Story Contract to verify. Every state update is a judgment call. |
| Demand | 3 | Internal demand. Every PRD review requires manual state checks. Backburner commissioning PRD exists because pain was identified. No external demand. |
| Edge | 4 | FAVV Build Contract with Artifact + Safety Test columns is unique infrastructure. Feature-ID-tagged test contracts feeding a state machine — no competitor has this. |
| Trend | 4 | Agent-driven development requires deterministic commissioning. Manual tracking breaks at scale. More PRDs = more features = more manual work. |
| Conversion | 2 | Internal tooling. No revenue path. Operational value: trustworthy feature matrix, faster commissioning. |
| Composite | 384 | 4 × 3 × 4 × 4 × 2 |
Quality Targets
| Target | Threshold |
|---|---|
| State computation accuracy | Zero false L3 (feature marked tested when tests fail) |
| Unmapped visibility | 100% of unmapped features flagged in report |
| CI time overhead | ≤3 minutes added to merge pipeline |
| Demotion detection | 100% of regressions caught in next run |
Failure Budget
| Failure Type | Budget | Response |
|---|---|---|
| False L3 (tests fail but state says L3) | 0 | Kill the script, revert to manual until fixed |
| Missed demotion (tests regress, state stays) | 0 | Same — trust is binary |
| Wrong mapping (test assigned to wrong feature) | ≤5% | Log warning, mark feature as uncertain in report |
Kill signal: Script produces states that contradict manual commissioner judgment on >20% of features after 3 runs. If the algorithm disagrees with reality that often, the mapping is wrong — fix the mapping before running again.
Platform
Current State
| Component | Built | Wired | Working |
|---|---|---|---|
| PRD specs with FAVV Build Contracts | Yes (3 PRDs have full Story+Build) | No — not parsed by any commissioning script | Partial — data exists but isn't consumed |
| feature-matrix.json | Yes | Yes — powers feature-matrix.jsx page | Yes — but hand-edited |
| Vitest in engineering repo | Yes | Yes — runs in CI | Yes — but not scoped to feature IDs |
| Agent receipt schema | Yes (v1.0 defined) | No — no commissioning script emits receipts | No |
| FAVV parser in project-from-prd | Yes | Yes — reads Build Contracts for engineering tasks | Partial — doesn't extract feature-to-test mapping |
| Playwright in Nx monorepo | Yes — Nx plugin installed, playwright.config.ts present | Yes — nx e2e targets inferred, CI runs e2e specs | Yes — but not scoped to feature IDs or connected to commissioning |
Build Ratio
~50% composition (extend existing FAVV parser, use existing vitest, reuse Nx Playwright targets, write to existing JSON), ~50% new code (index builder, level computer, matrix writer, receipt generator, e2e discovery/runner/evidence).
Protocols
Build Order
| Sprint | Features | What | Effort | Acceptance |
|---|---|---|---|---|
| 0 | #1, #2, #3, #4 | FAVV parser + index builder + unmapped report | 3 days | Parser extracts mappings from 3 PRDs. Unmapped count matches manual count. |
| 1 | #5, #6, #7 | Scoped test runner + L-level computer + regression detection | 4 days | Script computes L-levels for Identity & Access features. Results match manual assessment. |
| 2 | #8, #9, #10 | Matrix writer + receipt generator + CLI | 2 days | --dry-run shows correct diff. --all updates JSON. Receipt saved. |
| 3 | CI integration | Wire into merge pipeline | 1 day | Merge to main triggers commission run. JSON updated automatically. |
| 4 | #11, #12, #13, #14 | Playwright e2e discovery + scoped runner + L-level merge + evidence archival | 3 days | Features with e2e specs get browser verification. Traces archived. L-levels reflect e2e results. |
Commissioning
| # | Feature | Install | Test | Operational | Optimize |
|---|---|---|---|---|---|
| 1 | FAVV parser | — | — | — | — |
| 2 | Format compatibility (FFO/v2.0/v2.1) | — | — | — | — |
| 3 | Index builder | — | — | — | — |
| 4 | Unmapped feature detection | — | — | — | — |
| 5 | Scoped test runner | — | — | — | — |
| 6 | L-level computer | — | — | — | — |
| 7 | Regression detection | — | — | — | — |
| 8 | Matrix writer | — | — | — | — |
| 9 | Receipt generator | — | — | — | — |
| 10 | CLI interface | — | — | — | — |
| 11 | E2e spec discovery | — | — | — | — |
| 12 | Playwright scoped runner | — | — | — | — |
| 13 | E2e + unit L-level merge | — | — | — | — |
| 14 | Trace/screenshot archival | — | — | — | — |
| 15 | Mock route detection (counterfeit test scanner) | — | — | — | — |
| 16 | Unit test exclusion from L3 evidence | — | — | — | — |
Agent-Facing Spec
Commands:
# From engineering repo root — unit + integration commissioning
npx tsx tools/scripts/commission/commission-features.ts --dry-run # preview
npx tsx tools/scripts/commission/commission-features.ts --feature=AUTH-001 # single
npx tsx tools/scripts/commission/commission-features.ts --all # full matrix
npx tsx tools/scripts/commission/commission-features.ts --report # unmapped report only
# E2e commissioning via Playwright (scoped to feature)
npx tsx tools/scripts/commission/commission-features.ts --feature=AUTH-001 --e2e # includes Playwright
pnpm nx e2e web-e2e --grep="AUTH-001" # direct Nx target
pnpm exec playwright test --project=chromium --grep="AUTH-001" # direct Playwright CLI
# Debugging and investigation
pnpm exec playwright codegen http://localhost:3000 # generate selectors
pnpm exec playwright test apps/web-e2e/src/auth.spec.ts --debug # step-through debug
pnpm nx e2e web-e2e --ui # interactive UI mode
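The flags used in the commission-features.ts commands above can be modeled as a small parser plus two guards that encode the CLI Safety Tests. This is a sketch under assumptions: the real script's argument handling is not specified, so CliOptions, parseFlags, mayWrite, and needsConfirmation are hypothetical, and treating --dry-run with no mode as a safe preview is one reading of the first command.

```typescript
interface CliOptions {
  mode: "feature" | "all" | "report" | "none";
  featureId?: string;
  dryRun: boolean;
  e2e: boolean;
}

function parseFlags(argv: string[]): CliOptions {
  const opts: CliOptions = { mode: "none", dryRun: false, e2e: false };
  for (const arg of argv) {
    if (arg === "--dry-run") opts.dryRun = true;
    else if (arg === "--e2e") opts.e2e = true;
    else if (arg === "--all") opts.mode = "all";
    else if (arg === "--report") opts.mode = "report";
    else if (arg.indexOf("--feature=") === 0) {
      opts.mode = "feature";
      opts.featureId = arg.slice("--feature=".length);
    } else {
      throw new Error(`unknown flag: ${arg}`);
    }
  }
  return opts;
}

// Writes happen only for an explicit --all or --feature run that is not a dry run.
function mayWrite(opts: CliOptions): boolean {
  return !opts.dryRun && (opts.mode === "all" || opts.mode === "feature");
}

// A bare invocation must not fall through to a full-matrix run without confirmation.
function needsConfirmation(opts: CliOptions): boolean {
  return opts.mode === "none" && !opts.dryRun;
}
```

Keeping the write decision in one predicate gives the --dry-run Safety Test a single place to assert against.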
Boundaries:
- Always: read PRD specs, run mapped tests, write JSON, emit receipt
- Ask first: demoting a feature that was manually set to L3+
- Never: modify test files, change PRD specs, delete features from JSON
Test Contract:
| # | Feature | Test File | Assertion |
|---|---|---|---|
| 1 | FAVV parser | libs/commissioning/__tests__/favv-parser.spec.ts | Extracts correct rows from v2.1, v2.0, FFO formats |
| 2 | Index builder | libs/commissioning/__tests__/index-builder.spec.ts | Produces correct Map, flags unmapped features |
| 3 | L-level computer | libs/commissioning/__tests__/level-computer.spec.ts | Decision tree matches expected L-levels for test fixtures |
| 4 | Matrix writer | libs/commissioning/__tests__/matrix-writer.spec.ts | JSON structure preserved, only state+updated change |
| 5 | CLI | libs/commissioning/__tests__/cli.spec.ts | --dry-run doesn't write, --feature scopes correctly |
| 6 | E2e discovery | libs/commissioning/__tests__/e2e-discovery.spec.ts | Correctly partitions vitest vs Playwright specs by path convention |
| 7 | E2e runner | libs/commissioning/__tests__/e2e-runner.spec.ts | Invokes Nx target with correct grep, captures trace artifacts |
| 8 | E2e + unit merge | libs/commissioning/__tests__/level-computer-e2e.spec.ts | Feature with passing unit but failing e2e capped at L2 |
Players
Demand-Side Jobs
Job 1: Commissioner Verifies Feature States
Situation: After engineering ships a sprint, the commissioner needs to know which features actually advanced.
| Element | Detail |
|---|---|
| Struggling moment | Opens feature-matrix.json, sees states from last manual edit. No idea if they're still true. Has to open the app, check each feature, type the new level. |
| Current workaround | Manual inspection + hand-editing JSON. Takes 30+ minutes for a full pass. Usually skipped — states go stale. |
| What progress looks like | Run one command, get a report. States reflect test results. Regressions flagged. |
| Hidden objection | "What if the script is wrong and I trusted it?" — the false L3 fear. |
| Switch trigger | Feature count exceeds 200 (already happened). Manual tracking visibly wrong on a commissioned feature. |
Features that serve this job: #5, #6, #7, #8, #9, #10
Job 2: Engineering Agent Updates Matrix After Merge
Situation: CI pipeline merges code. Feature states should reflect the new reality.
| Element | Detail |
|---|---|
| Struggling moment | Code merged, tests pass, but feature matrix still shows old states. Someone has to remember to update it. |
| Current workaround | Post-merge manual edit, often forgotten. States lag reality by days or weeks. |
| What progress looks like | CI runs commission script automatically. Matrix updated within 5 minutes of merge. |
| Hidden objection | "What if CI takes too long?" — the pipeline slowdown fear. |
| Switch trigger | Feature state was wrong during a demo or review. |
Features that serve this job: #5, #8, #10
Role Definitions
| Role | Access | Permissions |
|---|---|---|
| Commission script (CI) | Read: PRD specs, test files, feature-matrix.json. Write: feature-matrix.json, receipts. | Run tests, compute states, write results. |
| Commissioner (human) | Read: commission report, receipts. Write: L4 sign-off. | Override script states with evidence. Approve L3→L4 transitions. |
| PRD author | Read: unmapped report. Write: PRD spec Artifact columns. | Fix missing mappings by adding Artifact paths to Build Contract. |
Relationship to Other PRDs
| PRD | Relationship | Data Flow |
|---|---|---|
| Project Management System | Downstream — N3 project dashboard consumes commissioning results | Script writes L-levels to commissioning_results Convex table. listProjectsWithStats joins to return commissioningLevel per project. N5 is the bridge build. |
| Commissioning State Machine | Peer — they commission data tables (195), we commission RaaS features (210) | Shared L-level vocabulary. Different subjects, same progression model. Could merge later. |
| Agent Platform | Upstream — provides FAVV Build Contract with test artifacts | Script parses Agent Platform's spec to build feature-to-test index |
| Identity & Access | Upstream — provides Story Contract + Build Contract | Script parses Identity's spec as test case for parser |
| Sales CRM & RFP | Upstream — BLOCKED: no Story Contract, FFO format only | Script must handle FFO format. CRM features will show as partially mapped until Story Contract added. |
| CLI Platform | Peer — commission script could become a drmg commission subcommand | Script initially standalone. Absorb into CLI Platform when that PRD ships. |
Context
- PRD Index — Automated Commissioning
- Prompt Deck — 5-card pitch
- Pictures — Pre-flight maps
Questions
What breaks first when the script disagrees with a human commissioner?
- If 30% of features are unmapped, is the script's output trustworthy enough to replace manual edits?
- Should the script refuse to write results when unmapped percentage exceeds a threshold?
- When CRM gets a Story Contract, how many features jump from L0 to L3 in one run — and does that shock look like a bug?
- Is the Safety Test gate (blocking L3) too conservative — or not conservative enough?