Automated Commissioning Spec

How do we make the feature matrix a measurement instrument instead of an opinion ledger?

Intent Contract

Dimension	Statement
Objective	Feature matrix states must be computed from test results, not hand-edited — because manual L-levels drift, lie, and can't scale past 50 features without full-time curation
Outcomes	(1) `feature-matrix.json` states updated by script within 5 minutes of merge to main. (2) Every state change has an evidence trail in `.ai/receipts/`. (3) Unmapped features are visible, not hidden at L0.
Health Metrics	Existing feature states that are correct must not regress. Script must not mark features as L3 when tests are actually failing.
Constraints	Hard: script reads PRD specs from dream repo, runs tests in engineering repo — never writes to dream repo from engineering CI. Steering: prefer convention (file naming) over configuration (mapping files) for feature-to-test linking.
Autonomy	Allowed: test runner selection, output format, caching strategy. Escalate: changing L-level definitions, adding new states beyond L0-L4. Never: deleting test files, modifying PRD specs, changing feature IDs.
Stop Rules	Complete when: script runs in CI on merge to main, updates `feature-matrix.json`, produces evidence receipt. Halt when: script produces states that contradict manual commissioner judgment on >20% of features.
Counter-metrics	CI pipeline time must not increase by more than 3 minutes. False positives (features marked L3 that don't work) must be zero — better to under-report than over-report.
Blast Radius	`feature-matrix.json` (dream repo), `.ai/receipts/` (dream repo), CI pipeline (engineering repo). No user-facing pages change behavior — the feature matrix page reads from the same JSON.
Rollback	Revert the JSON update commit. Previous states are in git history. Receipt files are append-only.

Story Contract

Stories are test contracts. Each row maps to at least one test. Tests must be RED before implementation starts. GREEN = value delivered.

S1 — Auto-update on merge

Trigger: PR merged to main in engineering repo

Checklist:

feature-matrix.json diff shows state changes within 5 minutes of merge
Only features whose mapped tests were touched have state changes
No feature reaches L3 when its test results are failing

Forbidden: States change for untouched features. Feature marked L3 with failing tests.

Evidence: integration — libs/commissioning/__tests__/

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
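
The second checklist item implies a selection step: only features whose mapped test files were touched by the merge are candidates for a state change. A minimal sketch of that step, assuming the index has the `Map<FeatureID, TestFile[]>` shape from the Build Contract and that the changed paths come from something like `git diff --name-only`. The helper name `affectedFeatures` and the example paths are hypothetical.

```typescript
type FeatureID = string;
type TestFile = string;

// Hypothetical helper: returns the IDs of features that have at least one
// mapped test file in the list of files changed by the merge.
function affectedFeatures(
  index: Map<FeatureID, TestFile[]>,
  changedFiles: string[],
): FeatureID[] {
  const changed = new Set(changedFiles);
  const hits: FeatureID[] = [];
  index.forEach((testFiles, featureId) => {
    if (testFiles.some((file) => changed.has(file))) hits.push(featureId);
  });
  return hits.sort();
}

// Example with made-up paths.
const index = new Map<FeatureID, TestFile[]>([
  ["AUTH-001", ["libs/auth/__tests__/login.spec.ts"]],
  ["AUTH-002", ["libs/auth/__tests__/logout.spec.ts"]],
]);
console.log(affectedFeatures(index, ["libs/auth/__tests__/login.spec.ts"]));
```

Features outside the returned list keep their previous state, which is what the Forbidden line for this story requires.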

S2 — Unmapped features visible

Trigger: Commission script run completes

Checklist:

Report shows unmapped count with specific feature IDs
No unmapped feature silently stays at L0 without a flag
Script does not report 100% coverage when mappings are missing

Forbidden: Unmapped features hidden. Coverage report lies.

Evidence: unit — libs/commissioning/__tests__/index-builder.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)

S3 — Safety Test blocks L3

Trigger: Safety Test assertion fails in test run

Checklist:

Feature state capped at L2 regardless of Success Test results
Safety Test failure logged with the feature ID
Escalate to commissioner if >5 features blocked simultaneously

Forbidden: Feature reaches L3 with active safety violation. Failures silently ignored.

Evidence: integration — libs/commissioning/__tests__/level-computer.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)

S4 — Regression triggers demotion

Trigger: Previously-passing test starts failing

Checklist:

Feature state moves from L3 to L2 (or lower) in next commission run
Demotion logged with previous state, new state, and failing test file
No one-way ratchet — states can go down

Forbidden: State stays at L3 after tests break. Demotion applied silently without logging.

Evidence: integration — libs/commissioning/__tests__/level-computer.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
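
Demotion detection reduces to comparing the previous states with the newly computed ones and logging every drop. The sketch below assumes levels are stored as numbers 0 to 4 and that the failing test files per feature are already known. `detectDemotions` and the `Demotion` shape are illustrative names, not part of the spec.

```typescript
type Level = 0 | 1 | 2 | 3 | 4;

interface Demotion {
  featureId: string;
  previous: Level;
  next: Level;
  failingTests: string[];
}

// Hypothetical helper: one log entry per feature whose level went down.
function detectDemotions(
  previous: Map<string, Level>,
  computed: Map<string, Level>,
  failingByFeature: Map<string, string[]>,
): Demotion[] {
  const demotions: Demotion[] = [];
  computed.forEach((next, featureId) => {
    const prev = previous.get(featureId);
    if (prev !== undefined && next < prev) {
      demotions.push({
        featureId,
        previous: prev,
        next,
        failingTests: failingByFeature.get(featureId) ?? [],
      });
    }
  });
  return demotions;
}

// Example: AUTH-001 regresses, AUTH-002 advances (paths are made up).
const previousStates = new Map<string, Level>([["AUTH-001", 3], ["AUTH-002", 2]]);
const computedStates = new Map<string, Level>([["AUTH-001", 2], ["AUTH-002", 3]]);
const failing = new Map([["AUTH-001", ["libs/auth/__tests__/login.spec.ts"]]]);
console.log(detectDemotions(previousStates, computedStates, failing));
```

Each returned entry carries the three fields the checklist asks for: previous state, new state, and failing test file.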

S5 — All three FAVV formats parsed

Trigger: PRD uses FFO, FAVV v2.0, or FAVV v2.1 format

Checklist:

Parser extracts feature-to-test mappings from FFO (6 cols), v2.0, and v2.1 (9 cols)
v2.0 and FFO PRDs are not silently skipped
Partial results returned when table is malformed — no crash, log warning

Forbidden: Crash on FFO. Silent skip of v2.0. Only v2.1 produces mappings.

Evidence: unit — libs/commissioning/__tests__/favv-parser.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
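
One way to satisfy all three checklist items at once is a header-driven parser: it reads column names from the header row instead of assuming a fixed column count, and it records a warning for each malformed row while keeping the rest. The sketch assumes Build Contracts are markdown pipe tables, one table per input, with an `Artifact` column holding backtick-quoted paths. `parseBuildContract`, `splitRow`, and the result shapes are hypothetical names.

```typescript
interface ParsedRow {
  cells: Record<string, string>; // header name -> cell text
  artifacts: string[];           // backtick-quoted paths from the Artifact cell
}

interface ParseResult {
  rows: ParsedRow[];
  warnings: string[];
}

// Split a table line on unescaped pipes; "\|" is kept as a literal pipe.
function splitRow(line: string): string[] {
  const text = line.trim();
  const cells: string[] = [];
  let current = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === "\\" && text.charAt(i + 1) === "|") {
      current += "|";
      i++;
    } else if (ch === "|") {
      cells.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  cells.push(current.trim());
  // Drop the empty cells produced by the leading and trailing pipes.
  if (cells[0] === "") cells.shift();
  if (cells[cells.length - 1] === "") cells.pop();
  return cells;
}

function parseBuildContract(markdown: string): ParseResult {
  const lines = markdown.split("\n").filter((l) => l.trim().startsWith("|"));
  const result: ParseResult = { rows: [], warnings: [] };
  if (lines.length < 2) {
    result.warnings.push("no table found");
    return result;
  }
  const headers = splitRow(lines[0] ?? "");
  // lines[1] is the |---|---| separator row; data rows start at index 2.
  lines.slice(2).forEach((line, i) => {
    const cells = splitRow(line);
    if (cells.length !== headers.length) {
      result.warnings.push(`row ${i + 1}: expected ${headers.length} cells, got ${cells.length}`);
      return; // skip the malformed row, keep the others
    }
    const byHeader: Record<string, string> = {};
    headers.forEach((h, j) => { byHeader[h] = cells[j] ?? ""; });
    const artifacts: string[] = [];
    const pathPattern = /`([^`]+)`/g;
    let m: RegExpExecArray | null;
    while ((m = pathPattern.exec(byHeader["Artifact"] ?? "")) !== null) artifacts.push(m[1] ?? "");
    result.rows.push({ cells: byHeader, artifacts });
  });
  return result;
}

const sample = [
  "| # | Function | Artifact | State |",
  "|---|---|---|---|",
  "| 1 | Parse tables | `libs/commissioning/parsers/favv-parser.ts` | Gap |",
  "| 2 | Broken row | Gap |",
].join("\n");
const parsed = parseBuildContract(sample);
console.log(parsed.rows.length, parsed.warnings);
```

Because nothing depends on the column count, the same function would accept any of the three formats as long as the header names the artifact column consistently.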

S6 — UI features verified via Playwright

Trigger: Commission script encounters feature with e2e spec

Checklist:

Playwright runs spec headlessly and captures trace + screenshot
Result feeds L-level computation alongside unit results
Feature with passing unit tests but failing e2e is capped at L2
Escalate to commissioner if Playwright CI adds >2 minutes per feature

Forbidden: Feature reaches L3 from unit tests alone when e2e spec exists but fails. Playwright runs headed in CI. Agent browser used instead of spec.

Evidence: e2e — libs/commissioning/__tests__/e2e-runner.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)

S7 — Mock routes flagged as counterfeit

Trigger: Commission script analyzes e2e spec files before running

Checklist:

AST scan flags page.route() and route.fulfill() usage
Report lists counterfeit specs with file path and line number
Scanner does not modify spec files
No false positives on legitimate test setup (auth state, seed data)

Forbidden: Mock-route specs counted as passing evidence. Fixture data accepted as integration proof.

Evidence: static-analysis — libs/commissioning/__tests__/mock-detector.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
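
To show the shape of the report, here is a simplified line-based regex scan standing in for the AST scan named above. It is read-only and reports file path plus 1-based line number. In Playwright, `fulfill` is a method on the `Route` object passed to the `page.route()` handler, so the second pattern matches `route.fulfill(`. The pattern list, `scanForMockRoutes`, and the assumption that the handler parameter is named `route` are all illustrative; a real AST scan would resolve the receiver instead of matching on names.

```typescript
interface Finding {
  file: string;
  line: number; // 1-based
  pattern: string;
}

// Illustrative patterns only; they match on identifier names, not types.
const MOCK_PATTERNS: { label: string; regex: RegExp }[] = [
  { label: "page.route()", regex: /\bpage\.route\s*\(/ },
  { label: "route.fulfill()", regex: /\broute\.fulfill\s*\(/ },
];

// Scans source text without touching the file on disk.
function scanForMockRoutes(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    if (text.trim().startsWith("//")) return; // skip commented-out lines
    for (const { label, regex } of MOCK_PATTERNS) {
      if (regex.test(text)) findings.push({ file, line: i + 1, pattern: label });
    }
  });
  return findings;
}

const specSource = [
  'test("login", async ({ page }) => {',
  '  await page.route("**/api/user", (route) =>',
  '    route.fulfill({ status: 200, body: "{}" }));',
  '  await page.goto("/login");',
  "});",
].join("\n");
console.log(scanForMockRoutes("apps/web-e2e/src/auth.spec.ts", specSource));
```

Auth-state and seed-data setup that does not call these APIs produces no findings, which is the false-positive guard in the checklist.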

S8 — Unit tests excluded from L3 evidence

Trigger: Commission script categorizes test results

Checklist:

Unit tests (mocked DB/server) logged but not used for L-level state changes
Only integration (real DB) and e2e (real server) results feed computation
Unit test exclusion does not affect unit test reporting elsewhere

Forbidden: Unit test with mocked DB counted as integration evidence. Hook test with stubbed API affects L-level.

Evidence: unit — libs/commissioning/__tests__/level-computer.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)

S9 — Commissioning level appears in ProjectWithStats

Trigger: Commission script completes a run against a project's features

Checklist:

commissioning_results Convex table updated with { featureId, prdRef, level, updatedAt } per feature
listProjectsWithStats returns commissioningLevel: number | null for each project (highest L-level across its features)
Project with no commissioning run returns commissioningLevel: null — not 0
Commission script does not write to Convex when --dry-run flag is active

Forbidden: Dashboard shows commissioningLevel: 0 when no commission run has occurred. Script writes to Convex in dry-run mode.

Evidence: integration — libs/commissioning/__tests__/convex-sync.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
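
The `null` versus `0` distinction is easy to get wrong in an aggregate, so a small sketch may help. It assumes the caller has already selected the `commissioning_results` rows that belong to one project; how rows are linked to a project is not specified here. The function name is hypothetical.

```typescript
interface CommissioningResult {
  featureId: string;
  prdRef: string;
  level: number; // 0-4
  updatedAt: number;
}

// Highest level across a project's rows, or null when the project has
// never been commissioned (no rows at all).
function commissioningLevel(results: CommissioningResult[]): number | null {
  if (results.length === 0) return null;
  return results.reduce((max, r) => Math.max(max, r.level), 0);
}

console.log(commissioningLevel([])); // never commissioned
console.log(
  commissioningLevel([
    { featureId: "AUTH-001", prdRef: "prd-identity-access", level: 2, updatedAt: 0 },
    { featureId: "AUTH-002", prdRef: "prd-identity-access", level: 3, updatedAt: 0 },
  ]),
);
```

A project whose features were all commissioned at L0 returns `0`, which stays distinguishable from a project with no run.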

Build Contract

Job 1: Parse PRD Specs into Feature-Test Index

#	Function	Artifact	Success Test	Safety Test	Regression Test	Value	State
1	Parse FAVV v2.1 Build Contract tables from PRD specs	`libs/commissioning/parsers/favv-parser.ts`	Given `prd-identity-access/spec/index.md`, extracts 8+ FAVV rows with feature IDs and artifact paths	Parser invents mappings for features not in the Build Contract. Parser crashes on malformed tables instead of logging warning.	Existing PRD parsing in `project-from-prd` must not break	Feature-to-test index exists	Gap
2	Parse FAVV v2.0 and FFO format Build Contracts	`libs/commissioning/parsers/favv-parser.ts`	Given a v2.0 table (6 columns) or FFO table, extracts feature-function-artifact mappings	v2.0/FFO PRDs silently skipped. Parser returns empty array instead of partial results.	—	All PRD formats produce mappings	Gap
3	Build feature-to-test-file index from parsed artifacts	`libs/commissioning/index-builder.ts`	Given parsed FAVV rows, produces `Map<FeatureID, TestFile[]>` with deduplication	Index includes test files that don't exist on disk. Index silently drops features with no artifact column.	—	Single source of truth for what tests verify what features	Gap
4	Detect and report unmapped features	`libs/commissioning/index-builder.ts`	Given feature-matrix.json (210 IDs) and parsed index, reports features with zero test file mappings	Unmapped features hidden. Report says "100% mapped" when mappings are missing.	—	Visibility into verification gaps	Gap
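
Rows #3 and #4 can be sketched together: build a deduplicated index, drop test files that do not exist, and list every feature that ends up with zero mapped files. The `Mapping` input shape and both function names are assumptions. The existence check is injected so the sketch runs without touching the filesystem; in the real script it would presumably be something like `fs.existsSync`.

```typescript
type FeatureID = string;
type TestFile = string;

interface Mapping {
  featureId: FeatureID;
  testFiles: TestFile[];
}

// Builds the feature-to-test index. Features whose files are all missing
// stay in the index with an empty list, so they are reported, not dropped.
function buildIndex(
  mappings: Mapping[],
  exists: (path: string) => boolean,
): Map<FeatureID, TestFile[]> {
  const sets = new Map<FeatureID, Set<TestFile>>();
  for (const { featureId, testFiles } of mappings) {
    const set = sets.get(featureId) ?? new Set<TestFile>();
    testFiles.filter(exists).forEach((f) => set.add(f));
    sets.set(featureId, set);
  }
  const index = new Map<FeatureID, TestFile[]>();
  sets.forEach((set, id) => index.set(id, Array.from(set).sort()));
  return index;
}

// Every feature ID in the matrix with zero mapped test files.
function findUnmapped(
  allFeatureIds: FeatureID[],
  index: Map<FeatureID, TestFile[]>,
): FeatureID[] {
  return allFeatureIds.filter((id) => (index.get(id)?.length ?? 0) === 0);
}

const onDisk = new Set(["a.spec.ts"]);
const featureIndex = buildIndex(
  [
    { featureId: "AUTH-001", testFiles: ["a.spec.ts", "a.spec.ts", "missing.spec.ts"] },
    { featureId: "AUTH-002", testFiles: [] },
  ],
  (p) => onDisk.has(p),
);
console.log(featureIndex.get("AUTH-001"));
console.log(findUnmapped(["AUTH-001", "AUTH-002", "AUTH-003"], featureIndex));
```

The unmapped list names specific feature IDs, so a coverage figure derived from it cannot claim 100% while mappings are missing.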

Job 2: Run Scoped Tests and Compute L-Levels

#	Function	Artifact	Success Test	Safety Test	Regression Test	Value	State
5	Run vitest scoped to indexed test files only	`libs/commissioning/test-runner.ts`	Given `Map<FeatureID, TestFile[]>`, runs vitest with JSON reporter on only those files. Exit code + results captured.	Script runs ALL tests instead of scoped set. Script modifies test files. Script runs with production credentials.	CI pipeline time increase ≤3 minutes over baseline	Only relevant tests run — fast feedback	Gap
6	Compute L-level per feature from test results	`libs/commissioning/level-computer.ts`	Given test results JSON and schema/UI existence checks, computes L0-L4 per feature using decision tree	Feature reaches L3 with failing Safety Test. Feature at L3 when no tests mapped (should be L0). L-level goes UP when evidence goes DOWN.	—	States are computed, not opined	Gap
7	Detect regressions — features that should demote	`libs/commissioning/level-computer.ts`	Given previous `feature-matrix.json` states and new computed states, flags demotions with evidence	Demotions silently applied without logging. Script refuses to demote and keeps stale state.	—	Trust in the matrix — states reflect reality	Gap
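
The full L0-L4 decision tree is not spelled out in this spec, so the sketch below covers only the safety caps that are: no mapped evidence means L0, any failing evidence caps at L2, an e2e spec that exists but was not run caps at L2, and the script never goes above L3 because L4 needs commissioner sign-off. It takes a `candidate` level from whatever the decision tree produces and follows the Story Contract rule that unit results do not count as evidence. All type and function names are hypothetical.

```typescript
type Level = 0 | 1 | 2 | 3 | 4;
type Category = "unit" | "integration" | "e2e";

interface TestResult {
  file: string;
  category: Category;
  passed: boolean; // covers both Success Test and Safety Test assertions
}

interface FeatureEvidence {
  candidate: Level;    // level proposed by the (unspecified) decision tree
  results: TestResult[];
  hasE2eSpec: boolean; // true when an e2e spec is mapped to the feature
}

function applyCaps(e: FeatureEvidence): Level {
  // Unit results are reported elsewhere but never count as evidence.
  const evidence = e.results.filter((r) => r.category !== "unit");
  if (evidence.length === 0) return 0; // nothing verifiable: L0
  const anyFailed = evidence.some((r) => !r.passed);
  const e2eRan = evidence.some((r) => r.category === "e2e");
  const capped = anyFailed || (e.hasE2eSpec && !e2eRan);
  const ceiling: Level = capped ? 2 : 3; // the script never awards L4
  return Math.min(e.candidate, ceiling) as Level;
}

const integrationPass: TestResult = { file: "a.spec.ts", category: "integration", passed: true };
const e2eFail: TestResult = { file: "b.spec.ts", category: "e2e", passed: false };
console.log(applyCaps({ candidate: 3, results: [integrationPass, e2eFail], hasE2eSpec: true }));
```

Because every rule only lowers the ceiling, less evidence can never produce a higher level, which matches the under-report bias stated elsewhere in this spec.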

Job 3: Write Results and Produce Evidence

#	Function	Artifact	Success Test	Safety Test	Regression Test	Value	State
8	Update feature-matrix.json with computed states	`libs/commissioning/matrix-writer.ts`	Given computed L-levels, updates `state` and `updated` fields in JSON. Diff is minimal (only changed rows).	Writer corrupts JSON structure. Writer changes fields other than `state` and `updated`. Writer removes features from the array.	`feature-matrix.jsx` renders correctly after update — no broken categories, no NaN counts	Feature matrix is a computed output	Gap
9	Generate commission receipt with evidence trail	`libs/commissioning/receipt-generator.ts`	Produces `.ai/receipts/YYYY-MM-DD-commission-run.json` with per-feature evidence (previous state, new state, test files, pass/fail)	Receipt omits failures. Receipt generated even when script errors. Receipt overwrites previous receipt instead of appending.	—	Audit trail for every state change	Gap
10	CLI interface with dry-run, single-feature, and full-matrix modes	`tools/scripts/commission/commission-features.ts`	`--dry-run` shows changes without writing. `--feature=AUTH-001` commissions single feature. `--all` runs full matrix.	`--dry-run` actually writes changes. `--feature` runs all tests instead of scoped set. Script with no flags runs full matrix without confirmation.	—	Ergonomic for both CI and manual use	Gap
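
The matrix writer's Safety Test column amounts to three invariants: same number of rows, no field touched except `state` and `updated`, and unchanged rows left exactly as they were. A minimal sketch of that update as a pure function follows. The spec confirms `state` and `updated`; the `id` field name, the `"L2"`-style state strings, and the extra `category` field in the example are assumptions about the JSON shape.

```typescript
interface FeatureRow {
  id: string;      // assumed field name for the feature ID
  state: string;   // assumed format, e.g. "L2"
  updated: string; // date string
  [key: string]: unknown; // every other field passes through untouched
}

interface WriteResult {
  rows: FeatureRow[];
  changed: string[];
}

// Returns a new array; rows without a state change are the same objects.
function applyStates(
  rows: FeatureRow[],
  computed: Map<string, string>,
  today: string,
): WriteResult {
  const changed: string[] = [];
  const next = rows.map((row): FeatureRow => {
    const state = computed.get(row.id);
    if (state === undefined || state === row.state) return row;
    changed.push(row.id);
    return { ...row, state, updated: today };
  });
  return { rows: next, changed };
}

const matrix: FeatureRow[] = [
  { id: "AUTH-001", state: "L2", updated: "2025-01-01", category: "Identity" },
  { id: "AUTH-002", state: "L1", updated: "2025-01-01", category: "Identity" },
];
const written = applyStates(matrix, new Map([["AUTH-001", "L3"], ["AUTH-002", "L1"]]), "2025-02-01");
console.log(written.changed);
```

Keeping the function pure leaves the decision to write with the caller, so a `--dry-run` path can print `changed` and skip the file write entirely.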

Job 4: E2E Verification via Playwright

#	Function	Artifact	Success Test	Safety Test	Regression Test	Value	State
11	Discover Playwright e2e specs from feature-test index	`libs/commissioning/e2e-discovery.ts`	Given `Map<FeatureID, TestFile[]>`, partitions into vitest unit specs and Playwright e2e specs by file path pattern (`*.spec.ts` in e2e project vs `__tests__/`)	E2e specs routed to vitest runner. Unit specs routed to Playwright. Discovery misses specs outside conventional paths.	Existing vitest scoping (#5) unaffected — e2e discovery is additive	Features with browser behavior get browser verification	Gap
12	Run Playwright specs scoped to feature via Nx target	`libs/commissioning/e2e-runner.ts`	Given feature's e2e spec paths, runs `nx e2e <project> --grep <spec>` headlessly. Captures exit code, trace, screenshot. JSON results extracted.	Playwright runs in headed mode in CI. Runner executes specs outside the scoped set. Runner uses `playwright test` directly instead of Nx target (bypasses project config).	Existing Nx e2e pipeline unaffected — commission runner uses same target, different filter	UI features verified by real browser, not just unit assertions	Gap
13	Merge e2e results into L-level computation	`libs/commissioning/level-computer.ts`	L-level computer treats e2e pass/fail as additional evidence alongside vitest results. Feature with passing unit tests but failing e2e is capped at L2.	Feature reaches L3 when e2e spec exists but wasn't run. E2e failure silently ignored when unit tests pass. Unit-only features penalized for missing e2e specs.	Existing L-level computation for unit-only features unchanged	L-levels reflect real user-facing behavior, not just logic correctness	Gap
14	Archive Playwright traces and screenshots as commission evidence	`libs/commissioning/e2e-evidence.ts`	Playwright traces saved to `dist/.playwright/traces/commission/<feature-id>/`. Screenshots saved alongside. Receipt (#9) includes e2e artifact paths.	Traces saved with PII from test fixtures. Traces accumulate without rotation. Evidence paths in receipt point to non-existent files.	—	Visual proof of feature state — commissioner can replay the trace	Gap
15	Detect mock routes in e2e specs (counterfeit test scanner)	`libs/commissioning/mock-detector.ts`	Given e2e spec files, AST-scan for `page.route()`, `route.fulfill()`, and fixture injection patterns. Flag specs that intercept real API calls with canned responses. Report lists counterfeit specs with line numbers.	Mock-route specs pass the scanner. Scanner false-positives on legitimate test setup (auth state, seed data). Scanner modifies spec files.	—	Counterfeit tests made visible before they produce false L3 claims	Gap
16	Exclude unit tests from L-level computation	`libs/commissioning/level-computer.ts`	Test categorizer identifies unit tests (mocked dependencies, no real DB/server) and excludes them from L-level input. Only integration (real DB) and e2e (real server) results feed state computation.	Unit test with mocked DB counted as integration evidence. Hook test with stubbed API affects L-level.	Existing unit test reporting unaffected — tests still run, results just don't feed commissioning	L3 means real infrastructure was tested, not mocks	Gap
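
The path-based partition in row #11 could look like the sketch below. The conventions are assumptions drawn from paths used elsewhere in this spec: Playwright specs are `*.spec.ts` files inside a project directory ending in `-e2e` (as in `apps/web-e2e/`), and vitest specs sit under a `__tests__/` directory. Files matching neither convention go to an `unknown` bucket instead of being silently routed. The function name is hypothetical.

```typescript
interface Partition {
  vitest: string[];
  playwright: string[];
  unknown: string[]; // surfaced in the report rather than guessed at
}

function partitionSpecs(files: string[]): Partition {
  const out: Partition = { vitest: [], playwright: [], unknown: [] };
  for (const file of files) {
    const segments = file.split("/");
    const inE2eProject = segments.some((s) => s.endsWith("-e2e"));
    if (inE2eProject && file.endsWith(".spec.ts")) {
      out.playwright.push(file);
    } else if (segments.indexOf("__tests__") >= 0) {
      out.vitest.push(file);
    } else {
      out.unknown.push(file);
    }
  }
  return out;
}

const partition = partitionSpecs([
  "apps/web-e2e/src/auth.spec.ts",
  "libs/commissioning/__tests__/level-computer.spec.ts",
  "scripts/misc/check.spec.ts",
]);
console.log(partition);
```

The `unknown` bucket addresses the Safety Test about specs outside conventional paths: they are reported instead of being missed.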

Job 5: Sync Results to Project Dashboard (N5 Bridge)

#	Function	Artifact	Success Test	Safety Test	Regression Test	Value	State
17	Call Convex mutation to sync L-levels after commission run	`libs/commissioning/convex-sync.ts`	After `--all` run, `commissioning_results` table in Convex has one row per feature with `featureId`, `prdRef`, `level`, `updatedAt`	Sync fires in `--dry-run` mode. Sync overwrites results from a later run with earlier data.	`listProjectsWithStats` query performance unchanged (<200ms)	Dashboard reads from one source (Convex) — not dream repo filesystem	Gap
18	Extend `listProjectsWithStats` to include commissioningLevel	`convex/projects.ts`	Query returns `commissioningLevel: number \| null` — `null` when no commission run, highest L-level when run exists	Projects with no commission run return `commissioningLevel: 0` instead of `null`. Query crashes when `commissioning_results` table is empty.	Existing plan/task stats in `ProjectWithStats` unchanged	Project dashboard shows commissioning health alongside plan progress	Gap

Convex schema addition:

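// commissioning_results table (entry in the existing Convex schema)
import { defineTable } from "convex/server";
import { v } from "convex/values";

defineTable({
  featureId: v.string(),    // e.g. "AUTH-001"
  prdRef: v.string(),       // e.g. "prd-identity-access"
  level: v.number(),        // 0-4
  updatedAt: v.number(),    // Unix timestamp (ms), from Date.now()
})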

Principles

The Job

Element	Detail
Situation	After engineering merges code, someone must manually check which features advanced and edit `feature-matrix.json` by hand. With 210 features, this takes 30+ minutes and produces unreliable states.
Intention	Feature states are a function of test results. The matrix is a measurement instrument. Run the script, get the truth.
Obstacle	No bridge connects PRD Build Contract artifacts to test execution. Test files don't declare which feature they verify. Three FAVV formats coexist.
Hardest Thing	The mapping fidelity. A weak mapping (wrong test files → wrong features) produces states that look computed but are just as wrong as manual edits — except now with false confidence.

Why Now

210 features, 0 at L4. Manual tracking has hit its ceiling.
Agent-driven development means agents need deterministic commissioning — they can't make judgment calls about L-levels.
The FAVV v2.1 format now includes Artifact + Success Test + Safety Test columns — the mapping data finally exists in the PRD specs.
Every new PRD adds more features to track manually.

Design Constraints

Constraint	Rationale
Script lives in engineering repo	It runs tests — tests live in engineering. Dream repo is read-only from engineering CI.
Pure function: same input → same output	Commissioning must be trustworthy. Non-determinism means nobody trusts the matrix.
Under-report rather than over-report	A feature falsely at L3 is worse than a feature stuck at L0, so state computation is conservative.
Convention over configuration	`@feature AUTH-001` docblock tag or `auth-001.spec.ts` naming beats a separate mapping file that drifts.
Playwright for e2e, vitest for unit	Two test runners, one L-level computation. Playwright provides deterministic, reproducible browser verification with traces, screenshots, and CI-friendly artifacts. Vitest handles logic. Both feed the same level-computer.
Specs over agent browsers	Playwright specs are durable executable knowledge — rerunnable, diffable, CI-split. Agent browsers are for exploration, not commissioning. Commissioning demands repeatability.
Reuse Nx e2e targets	Nx Playwright plugin infers `e2e` tasks from `playwright.config.ts`. Commission script invokes existing `nx e2e` targets, not a parallel browser workflow.

Performance

Priority Score

Dimension	Score (1-5)	Evidence
Pain	4	210 features manually tracked. States drift. CRM claims L3 with no Story Contract to verify. Every state update is a judgment call.
Demand	3	Internal demand. Every PRD review requires manual state checks. Backburner commissioning PRD exists because pain was identified. No external demand.
Edge	4	FAVV Build Contract with Artifact + Safety Test columns is unique infrastructure. Feature-ID-tagged test contracts feeding a state machine — no competitor has this.
Trend	4	Agent-driven development requires deterministic commissioning. Manual tracking breaks at scale. More PRDs = more features = more manual work.
Conversion	2	Internal tooling. No revenue path. Operational value: trustworthy feature matrix, faster commissioning.
Composite	384	4 × 3 × 4 × 4 × 2

Quality Targets

Target	Threshold
State computation accuracy	Zero false L3 (feature marked tested when tests fail)
Unmapped visibility	100% of unmapped features flagged in report
CI time overhead	≤3 minutes added to merge pipeline
Demotion detection	100% of regressions caught in next run

Failure Budget

Failure Type	Budget	Response
False L3 (tests fail but state says L3)	0	Kill the script, revert to manual until fixed
Missed demotion (tests regress, state stays)	0	Same — trust is binary
Wrong mapping (test assigned to wrong feature)	≤5%	Log warning, mark feature as `uncertain` in report

Kill signal: Script produces states that contradict manual commissioner judgment on >20% of features after 3 runs. If the algorithm disagrees with reality that often, the mapping is wrong — fix the mapping before running again.
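
The kill signal can be checked mechanically once a commissioner has recorded manual judgments for a sample of features. A minimal sketch, assuming both sides are maps from feature ID to numeric level and that only features present in both maps are compared. The function names are hypothetical, and the "after 3 runs" condition is left to the caller.

```typescript
const KILL_THRESHOLD = 0.2; // ">20% of features" from the kill signal

// Fraction of jointly assessed features where the script and the
// commissioner disagree. Returns 0 when nothing can be compared.
function disagreementRate(
  manual: Map<string, number>,
  computed: Map<string, number>,
): number {
  let compared = 0;
  let disagreements = 0;
  manual.forEach((level, id) => {
    const scriptLevel = computed.get(id);
    if (scriptLevel === undefined) return; // not assessed by the script
    compared++;
    if (scriptLevel !== level) disagreements++;
  });
  return compared === 0 ? 0 : disagreements / compared;
}

function shouldHalt(rate: number): boolean {
  return rate > KILL_THRESHOLD;
}

const manualLevels = new Map([["A", 3], ["B", 2], ["C", 1], ["D", 0], ["E", 3]]);
const scriptLevels = new Map([["A", 3], ["B", 1], ["C", 1], ["D", 2], ["E", 3]]);
const rate = disagreementRate(manualLevels, scriptLevels);
console.log(rate, shouldHalt(rate));
```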

Platform

Current State

Component	Built	Wired	Working
PRD specs with FAVV Build Contracts	Yes (3 PRDs have full Story+Build)	No — not parsed by any commissioning script	Partial — data exists but isn't consumed
feature-matrix.json	Yes	Yes — powers feature-matrix.jsx page	Yes — but hand-edited
Vitest in engineering repo	Yes	Yes — runs in CI	Yes — but not scoped to feature IDs
Agent receipt schema	Yes (v1.0 defined)	No — no commissioning script emits receipts	No
FAVV parser in project-from-prd	Yes	Yes — reads Build Contracts for engineering tasks	Partial — doesn't extract feature-to-test mapping
Playwright in Nx monorepo	Yes — Nx plugin installed, `playwright.config.ts` present	Yes — `nx e2e` targets inferred, CI runs e2e specs	Yes — but not scoped to feature IDs or connected to commissioning

Build Ratio

~50% composition (extend existing FAVV parser, use existing vitest, reuse Nx Playwright targets, write to existing JSON), ~50% new code (index builder, level computer, matrix writer, receipt generator, e2e discovery/runner/evidence).

Protocols

Build Order

Sprint	Features	What	Effort	Acceptance
0	#1, #2, #3, #4	FAVV parser + index builder + unmapped report	3 days	Parser extracts mappings from 3 PRDs. Unmapped count matches manual count.
1	#5, #6, #7	Scoped test runner + L-level computer + regression detection	4 days	Script computes L-levels for Identity & Access features. Results match manual assessment.
2	#8, #9, #10	Matrix writer + receipt generator + CLI	2 days	`--dry-run` shows correct diff. `--all` updates JSON. Receipt saved.
3	CI integration	Wire into merge pipeline	1 day	Merge to main triggers commission run. JSON updated automatically.
4	#11, #12, #13, #14	Playwright e2e discovery + scoped runner + L-level merge + evidence archival	3 days	Features with e2e specs get browser verification. Traces archived. L-levels reflect e2e results.

Commissioning

#	Feature	Install	Test	Operational	Optimize
1	FAVV parser	—	—	—	—
2	Format compatibility (FFO/v2.0/v2.1)	—	—	—	—
3	Index builder	—	—	—	—
4	Unmapped feature detection	—	—	—	—
5	Scoped test runner	—	—	—	—
6	L-level computer	—	—	—	—
7	Regression detection	—	—	—	—
8	Matrix writer	—	—	—	—
9	Receipt generator	—	—	—	—
10	CLI interface	—	—	—	—
11	E2e spec discovery	—	—	—	—
12	Playwright scoped runner	—	—	—	—
13	E2e + unit L-level merge	—	—	—	—
14	Trace/screenshot archival	—	—	—	—
15	Mock route detection (counterfeit test scanner)	—	—	—	—
16	Unit test exclusion from L3 evidence	—	—	—	—

Agent-Facing Spec

Commands:

# From engineering repo root — unit + integration commissioning
npx tsx tools/scripts/commission/commission-features.ts --dry-run    # preview
npx tsx tools/scripts/commission/commission-features.ts --feature=AUTH-001  # single
npx tsx tools/scripts/commission/commission-features.ts --all        # full matrix
npx tsx tools/scripts/commission/commission-features.ts --report     # unmapped report only

# E2e commissioning via Playwright (scoped to feature)
npx tsx tools/scripts/commission/commission-features.ts --feature=AUTH-001 --e2e  # includes Playwright
pnpm nx e2e web-e2e --grep="AUTH-001"                               # direct Nx target
pnpm exec playwright test --project=chromium --grep="AUTH-001"       # direct Playwright CLI

# Debugging and investigation
pnpm exec playwright codegen http://localhost:3000                   # generate selectors
pnpm exec playwright test apps/web-e2e/src/auth.spec.ts --debug     # step-through debug
pnpm nx e2e web-e2e --ui                                            # interactive UI mode
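
The flag handling behind the commission commands could be sketched with Node's built-in `parseArgs` from `node:util`. The flag names come from the commands above; the option shape and the mode-resolution order are assumptions. One deliberate choice follows the CLI Safety Test in the Build Contract: a bare invocation with no flags is refused instead of running the full matrix, and `--dry-run` on its own is treated as a full-matrix preview.

```typescript
import { parseArgs } from "node:util";

interface CommissionOptions {
  mode: "all" | "feature" | "report";
  featureId?: string;
  dryRun: boolean;
  e2e: boolean;
}

// Hypothetical parser for the commission script's flags.
function parseCommissionArgs(argv: string[]): CommissionOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      "dry-run": { type: "boolean" },
      feature: { type: "string" },
      all: { type: "boolean" },
      report: { type: "boolean" },
      e2e: { type: "boolean" },
    },
  });
  const dryRun = values["dry-run"] === true;
  const e2e = values.e2e === true;
  if (values.report === true) return { mode: "report", dryRun, e2e };
  if (typeof values.feature === "string") {
    return { mode: "feature", featureId: values.feature, dryRun, e2e };
  }
  if (values.all === true || dryRun) return { mode: "all", dryRun, e2e };
  throw new Error("Refusing to run: pass --all, --feature=<ID>, --report, or --dry-run");
}

console.log(parseCommissionArgs(["--feature=AUTH-001", "--e2e"]));
```

`parseArgs` rejects unknown flags by default, so a mistyped option fails fast instead of falling through to a wider run.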

Boundaries:

Always: read PRD specs, run mapped tests, write JSON, emit receipt
Ask first: demoting a feature that was manually set to L3+
Never: modify test files, change PRD specs, delete features from JSON

Test Contract:

#	Feature	Test File	Assertion
1	FAVV parser	`libs/commissioning/__tests__/favv-parser.spec.ts`	Extracts correct rows from v2.1, v2.0, FFO formats
2	Index builder	`libs/commissioning/__tests__/index-builder.spec.ts`	Produces correct Map, flags unmapped features
3	L-level computer	`libs/commissioning/__tests__/level-computer.spec.ts`	Decision tree matches expected L-levels for test fixtures
4	Matrix writer	`libs/commissioning/__tests__/matrix-writer.spec.ts`	JSON structure preserved, only state+updated change
5	CLI	`libs/commissioning/__tests__/cli.spec.ts`	--dry-run doesn't write, --feature scopes correctly
6	E2e discovery	`libs/commissioning/__tests__/e2e-discovery.spec.ts`	Correctly partitions vitest vs Playwright specs by path convention
7	E2e runner	`libs/commissioning/__tests__/e2e-runner.spec.ts`	Invokes Nx target with correct grep, captures trace artifacts
8	E2e + unit merge	`libs/commissioning/__tests__/level-computer-e2e.spec.ts`	Feature with passing unit but failing e2e capped at L2

Players

Demand-Side Jobs

Job 1: Commissioner Verifies Feature States

Situation: After engineering ships a sprint, the commissioner needs to know which features actually advanced.

Element	Detail
Struggling moment	Opens feature-matrix.json, sees states from last manual edit. No idea if they're still true. Has to open the app, check each feature, type the new level.
Current workaround	Manual inspection + hand-editing JSON. Takes 30+ minutes for a full pass. Usually skipped — states go stale.
What progress looks like	Run one command, get a report. States reflect test results. Regressions flagged.
Hidden objection	"What if the script is wrong and I trusted it?" — the false L3 fear.
Switch trigger	Feature count exceeds 200 (already happened). Manual tracking visibly wrong on a commissioned feature.

Features that serve this job: #5, #6, #7, #8, #9, #10

Job 2: Engineering Agent Updates Matrix After Merge

Situation: CI pipeline merges code. Feature states should reflect the new reality.

Element	Detail
Struggling moment	Code merged, tests pass, but feature matrix still shows old states. Someone has to remember to update it.
Current workaround	Post-merge manual edit, often forgotten. States lag reality by days or weeks.
What progress looks like	CI runs commission script automatically. Matrix updated within 5 minutes of merge.
Hidden objection	"What if CI takes too long?" — the pipeline slowdown fear.
Switch trigger	Feature state was wrong during a demo or review.

Features that serve this job: #5, #8, #10

Role Definitions

Role	Access	Permissions
Commission script (CI)	Read: PRD specs, test files, feature-matrix.json. Write: feature-matrix.json, receipts.	Run tests, compute states, write results.
Commissioner (human)	Read: commission report, receipts. Write: L4 sign-off.	Override script states with evidence. Approve L3→L4 transitions.
PRD author	Read: unmapped report. Write: PRD spec Artifact columns.	Fix missing mappings by adding Artifact paths to Build Contract.

Relationship to Other PRDs

PRD	Relationship	Data Flow
Project Management System	Downstream — N3 project dashboard consumes commissioning results	Script writes L-levels to `commissioning_results` Convex table. `listProjectsWithStats` joins to return `commissioningLevel` per project. N5 is the bridge build.
Commissioning State Machine	Peer — they commission data tables (195), we commission RaaS features (210)	Shared L-level vocabulary. Different subjects, same progression model. Could merge later.
Agent Platform	Upstream — provides FAVV Build Contract with test artifacts	Script parses Agent Platform's spec to build feature-to-test index
Identity & Access	Upstream — provides Story Contract + Build Contract	Script parses Identity's spec as test case for parser
Sales CRM & RFP	Upstream — BLOCKED: no Story Contract, FFO format only	Script must handle FFO format. CRM features will show as partially mapped until Story Contract added.
CLI Platform	Peer — commission script could become a `drmg commission` subcommand	Script initially standalone. Absorb into CLI Platform when that PRD ships.

Context

PRD Index — Automated Commissioning
Prompt Deck — 5-card pitch
Pictures — Pre-flight maps

Questions

What breaks first when the script disagrees with a human commissioner?

If 30% of features are unmapped, is the script's output trustworthy enough to replace manual edits?
Should the script refuse to write results when unmapped percentage exceeds a threshold?
When CRM gets a Story Contract, how many features jump from L0 to L3 in one run — and does that shock look like a bug?
Is the Safety Test gate (blocking L3) too conservative — or not conservative enough?
