Automated Commissioning Spec

How do we make the feature matrix a measurement instrument instead of an opinion ledger?

Intent Contract

| Dimension | Statement |
| --- | --- |
| Objective | Feature matrix states must be computed from test results, not hand-edited, because manual L-levels drift, lie, and can't scale past 50 features without full-time curation. |
| Outcomes | (1) feature-matrix.json states updated by script within 5 minutes of merge to main. (2) Every state change has an evidence trail in .ai/receipts/. (3) Unmapped features are visible, not hidden at L0. |
| Health Metrics | Existing feature states that are correct must not regress. Script must not mark features as L3 when tests are actually failing. |
| Constraints | Hard: script reads PRD specs from dream repo and runs tests in engineering repo; it never writes to dream repo from engineering CI. Steering: prefer convention (file naming) over configuration (mapping files) for feature-to-test linking. |
| Autonomy | Allowed: test runner selection, output format, caching strategy. Escalate: changing L-level definitions, adding new states beyond L0-L4. Never: deleting test files, modifying PRD specs, changing feature IDs. |
| Stop Rules | Complete when: script runs in CI on merge to main, updates feature-matrix.json, produces evidence receipt. Halt when: script produces states that contradict manual commissioner judgment on >20% of features. |
| Counter-metrics | CI pipeline time must not increase by more than 3 minutes. False positives (features marked L3 that don't work) must be zero: better to under-report than over-report. |
| Blast Radius | feature-matrix.json (dream repo), .ai/receipts/ (dream repo), CI pipeline (engineering repo). No user-facing pages change behavior: the feature matrix page reads from the same JSON. |
| Rollback | Revert the JSON update commit. Previous states are in git history. Receipt files are append-only. |

Story Contract

Stories are test contracts. Each row advances to ≥1 test. Tests must be RED before implementation starts. GREEN = value delivered.


S1 — Auto-update on merge

Trigger: PR merged to main in engineering repo

Checklist:

  • feature-matrix.json diff shows state changes within 5 minutes of merge
  • Only features whose mapped tests were touched have state changes
  • No feature reaches L3 when its test results are failing

Forbidden: States change for untouched features. Feature marked L3 with failing tests.

Evidence: integration — libs/commissioning/__tests__/

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)


S2 — Unmapped features visible

Trigger: Commission script run completes

Checklist:

  • Report shows unmapped count with specific feature IDs
  • No unmapped feature silently stays at L0 without a flag
  • Script does not report 100% coverage when mappings are missing

Forbidden: Unmapped features hidden. Coverage report lies.

Evidence: unit — libs/commissioning/__tests__/index-builder.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)


S3 — Safety Test blocks L3

Trigger: Safety Test assertion fails in test run

Checklist:

  • Feature state capped at L2 regardless of Success Test results
  • Safety Test failure logged with the feature ID
  • Escalate to commissioner if >5 features blocked simultaneously

Forbidden: Feature reaches L3 with active safety violation. Failures silently ignored.

Evidence: integration — libs/commissioning/__tests__/level-computer.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)


S4 — Regression triggers demotion

Trigger: Previously-passing test starts failing

Checklist:

  • Feature state moves from L3 to L2 (or lower) in next commission run
  • Demotion logged with previous state, new state, and failing test file
  • No one-way ratchet — states can go down

Forbidden: State stays at L3 after tests break. Demotion applied silently without logging.

Evidence: integration — libs/commissioning/__tests__/level-computer.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
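
The demotion rule in S4 can be sketched as a pure comparison between the previous and newly computed states. This is only an illustration: the `Level` and `Demotion` types and the `detectDemotions` name are hypothetical, and the spec does not define how previous states are loaded.

```typescript
// Hypothetical shapes; the spec does not define them.
type Level = 0 | 1 | 2 | 3 | 4;

interface Demotion {
  featureId: string;
  previous: Level;
  next: Level;
  failingTests: string[];
}

// Returns one log entry per feature whose computed level is lower than its
// previous level. Nothing is applied silently: the caller writes these entries
// to the receipt before updating the matrix.
function detectDemotions(
  previous: Map<string, Level>,
  computed: Map<string, Level>,
  failingTestsByFeature: Map<string, string[]>,
): Demotion[] {
  const demotions: Demotion[] = [];
  for (const [featureId, next] of computed) {
    const prev = previous.get(featureId);
    if (prev !== undefined && next < prev) {
      demotions.push({
        featureId,
        previous: prev,
        next,
        failingTests: failingTestsByFeature.get(featureId) ?? [],
      });
    }
  }
  return demotions;
}
```

Each entry carries the previous state, the new state, and the failing test files, which are the three fields the checklist asks the demotion log to record.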


S5 — All three FAVV formats parsed

Trigger: PRD uses FFO, FAVV v2.0, or FAVV v2.1 format

Checklist:

  • Parser extracts feature-to-test mappings from FFO (6 cols), v2.0, and v2.1 (9 cols)
  • v2.0 and FFO PRDs are not silently skipped
  • Partial results returned when table is malformed — no crash, log warning

Forbidden: Crash on FFO. Silent skip of v2.0. Only v2.1 produces mappings.

Evidence: unit — libs/commissioning/__tests__/favv-parser.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
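
One way to meet the "partial results, no crash" requirement in S5 is to look columns up by header name instead of by position, so tables with different column counts go through the same code path. The sketch below assumes the Build Contract is a Markdown pipe table with `#`, `Function`, and `Artifact` headers; the real FFO and v2.0 header names are not given in this spec, and `parseBuildContract` is a hypothetical name.

```typescript
interface FavvRow { id: string; fn: string; artifact: string }
interface ParseResult { rows: FavvRow[]; warnings: string[] }

// Splits "| a | b |" into ["a", "b"]. Escaped pipes inside cells are not handled.
function splitCells(line: string): string[] {
  return line.trim().replace(/^\||\|$/g, "").split("|").map((c) => c.trim());
}

function parseBuildContract(markdown: string): ParseResult {
  const result: ParseResult = { rows: [], warnings: [] };
  const [headerLine, ...body] = markdown
    .split("\n")
    .filter((l) => l.trim().startsWith("|"));
  if (headerLine === undefined) {
    result.warnings.push("no table found");
    return result;
  }
  const header = splitCells(headerLine).map((h) => h.toLowerCase());
  const idCol = header.indexOf("#");
  const fnCol = header.indexOf("function");
  const artifactCol = header.indexOf("artifact");
  if (idCol < 0 || fnCol < 0 || artifactCol < 0) {
    result.warnings.push("header is missing #, Function, or Artifact");
    return result;
  }
  for (const line of body) {
    const cells = splitCells(line);
    if (cells.every((c) => /^:?-+:?$/.test(c))) continue; // separator row
    if (cells.length !== header.length) {
      // Malformed row: warn and keep going so the good rows still come back.
      result.warnings.push(`skipped malformed row: ${line.trim()}`);
      continue;
    }
    result.rows.push({
      id: cells[idCol] ?? "",
      fn: cells[fnCol] ?? "",
      artifact: cells[artifactCol] ?? "",
    });
  }
  return result;
}
```

A malformed row produces a warning and is skipped, while well-formed rows before and after it are still returned.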


S6 — UI features verified via Playwright

Trigger: Commission script encounters feature with e2e spec

Checklist:

  • Playwright runs spec headlessly and captures trace + screenshot
  • Result feeds L-level computation alongside unit results
  • Feature with passing unit tests but failing e2e is capped at L2
  • Escalate to commissioner if Playwright CI adds >2 minutes per feature

Forbidden: Feature reaches L3 from unit tests alone when e2e spec exists but fails. Playwright runs headed in CI. Agent browser used instead of spec.

Evidence: e2e — libs/commissioning/__tests__/e2e-runner.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)


S7 — Mock routes flagged as counterfeit

Trigger: Commission script analyzes e2e spec files before running

Checklist:

  • AST scan flags page.route() and route.fulfill() usage
  • Report lists counterfeit specs with file path and line number
  • Scanner does not modify spec files
  • No false positives on legitimate test setup (auth state, seed data)

Forbidden: Mock-route specs counted as passing evidence. Fixture data accepted as integration proof.

Evidence: static-analysis — libs/commissioning/__tests__/mock-detector.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
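
A read-only AST scan along the lines of S7 could use the TypeScript compiler API, as sketched below. It assumes the `typescript` package is installed, and it matches on method name alone, so any `.route(...)` or `.fulfill(...)` call is flagged regardless of its receiver. In Playwright the fulfilling call lives on the route object (`route.fulfill()`). The `findMockRoutes` name and the `MockFinding` shape are hypothetical.

```typescript
import * as ts from "typescript";

interface MockFinding { file: string; line: number; call: string }

// Assumption: these two method names are enough to identify interception.
const INTERCEPT_METHODS = new Set(["route", "fulfill"]);

// Parses the spec source and reports each matching call with a 1-based line
// number. The file is never written to, and comments are ignored because the
// scan walks syntax nodes instead of raw text.
function findMockRoutes(file: string, source: string): MockFinding[] {
  const sf = ts.createSourceFile(file, source, ts.ScriptTarget.Latest, true);
  const findings: MockFinding[] = [];
  const visit = (node: ts.Node): void => {
    if (ts.isCallExpression(node) && ts.isPropertyAccessExpression(node.expression)) {
      const method = node.expression.name.text;
      if (INTERCEPT_METHODS.has(method)) {
        const { line } = sf.getLineAndCharacterOfPosition(node.getStart(sf));
        findings.push({ file, line: line + 1, call: node.expression.getText(sf) });
      }
    }
    ts.forEachChild(node, visit);
  };
  visit(sf);
  return findings;
}
```

Name-only matching would not by itself satisfy the "no false positives on legitimate test setup" item; a fuller scanner would need an allowlist or receiver-type checks, which this sketch leaves out.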


S8 — Unit tests excluded from L3 evidence

Trigger: Commission script categorizes test results

Checklist:

  • Unit tests (mocked DB/server) logged but not used for L-level state changes
  • Only integration (real DB) and e2e (real server) results feed computation
  • Unit test exclusion does not affect unit test reporting elsewhere

Forbidden: Unit test with mocked DB counted as integration evidence. Hook test with stubbed API affects L-level.

Evidence: unit — libs/commissioning/__tests__/level-computer.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
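
Taken together, S3, S6, and S8 describe a small decision tree. The sketch below is one conservative reading of it, not the spec's algorithm: the criteria for L1 and for levels below the L2 cap are not defined here, so the function only ever returns 0, 2, or 3, and it treats a feature with no integration or e2e evidence as L0 (an assumption that follows "under-report over over-report"). All type and function names are hypothetical.

```typescript
type Category = "unit" | "integration" | "e2e";
type Status = "passed" | "failed" | "not-run";

interface TestResult {
  file: string;
  category: Category;
  status: Status;
  isSafetyTest: boolean;
}

interface LevelDecision { level: 0 | 2 | 3; reasons: string[] }

function computeLevel(results: TestResult[]): LevelDecision {
  if (results.length === 0) return { level: 0, reasons: ["no tests mapped"] };

  // S8: unit tests are logged elsewhere but never count as evidence.
  const evidence = results.filter((r) => r.category !== "unit");
  if (evidence.length === 0) {
    return { level: 0, reasons: ["no integration or e2e evidence"] };
  }

  // S3 and S6: any failing or unrun piece of evidence caps the feature at L2.
  const reasons: string[] = [];
  for (const r of evidence) {
    if (r.status === "passed") continue;
    const kind = r.isSafetyTest ? "safety" : r.category;
    reasons.push(`${kind} test ${r.status}: ${r.file}`);
  }
  if (reasons.length > 0) return { level: 2, reasons };

  return { level: 3, reasons: ["all integration and e2e evidence passed"] };
}
```

L4 is deliberately absent: the role definitions reserve the L3→L4 transition for the human commissioner.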


S9 — Commissioning level appears in ProjectWithStats

Trigger: Commission script completes a run against a project's features

Checklist:

  • commissioning_results Convex table updated with { featureId, prdRef, level, updatedAt } per feature
  • listProjectsWithStats returns commissioningLevel: number | null for each project (highest L-level across its features)
  • Project with no commissioning run returns commissioningLevel: null — not 0
  • Commission script does not write to Convex when --dry-run flag is active

Forbidden: Dashboard shows commissioningLevel: 0 when no commission run has occurred. Script writes to Convex in dry-run mode.

Evidence: integration — libs/commissioning/__tests__/convex-sync.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
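
The null-versus-zero distinction in S9 is easy to get wrong, so a small sketch may help. These are plain helper functions, not Convex query code; `projectCommissioningLevel` and `shouldOverwrite` are hypothetical names, and the row shape mirrors the fields listed in the checklist.

```typescript
interface CommissioningResult {
  featureId: string;
  prdRef: string;
  level: number;
  updatedAt: number; // Unix timestamp in ms
}

// Highest L-level across a project's features, or null when no commission run
// has produced any rows. A real L0 result returns 0, which is different from null.
function projectCommissioningLevel(results: CommissioningResult[]): number | null {
  if (results.length === 0) return null;
  return Math.max(...results.map((r) => r.level));
}

// Guards against an earlier run overwriting data from a later one.
function shouldOverwrite(existing: CommissioningResult, incoming: CommissioningResult): boolean {
  return incoming.updatedAt >= existing.updatedAt;
}
```

A project whose only feature sits at L0 reports 0, while a project that has never been commissioned reports null, so the dashboard can tell "measured and empty" apart from "not measured".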


Build Contract

Job 1: Parse PRD Specs into Feature-Test Index

| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Parse FAVV v2.1 Build Contract tables from PRD specs | libs/commissioning/parsers/favv-parser.ts | Given prd-identity-access/spec/index.md, extracts 8+ FAVV rows with feature IDs and artifact paths | Parser invents mappings for features not in the Build Contract. Parser crashes on malformed tables instead of logging warning. | Existing PRD parsing in project-from-prd must not break | Feature-to-test index exists | Gap |
| 2 | Parse FAVV v2.0 and FFO format Build Contracts | libs/commissioning/parsers/favv-parser.ts | Given a v2.0 table (6 columns) or FFO table, extracts feature-function-artifact mappings | v2.0/FFO PRDs silently skipped. Parser returns empty array instead of partial results. | | All PRD formats produce mappings | Gap |
| 3 | Build feature-to-test-file index from parsed artifacts | libs/commissioning/index-builder.ts | Given parsed FAVV rows, produces Map<FeatureID, TestFile[]> with deduplication | Index includes test files that don't exist on disk. Index silently drops features with no artifact column. | | Single source of truth for what tests verify what features | Gap |
| 4 | Detect and report unmapped features | libs/commissioning/index-builder.ts | Given feature-matrix.json (210 IDs) and parsed index, reports features with zero test file mappings | Unmapped features hidden. Report says "100% mapped" when mappings are missing. | | Visibility into verification gaps | Gap |
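
Functions #3 and #4 can be sketched together, because the unmapped report falls out of the index. The row shape and function names below are hypothetical, and the `exists` check is injected so the sketch stays pure (in Node it could be `fs.existsSync`).

```typescript
interface ParsedRow { featureId: string; testFiles: string[] }

// Builds Map<FeatureID, TestFile[]> with deduplication. Files that are not on
// disk are dropped, but the feature itself is kept with an empty list so that
// it shows up as unmapped instead of disappearing.
function buildIndex(
  rows: ParsedRow[],
  exists: (path: string) => boolean,
): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const row of rows) {
    const files = index.get(row.featureId) ?? [];
    for (const file of row.testFiles) {
      if (exists(file) && !files.includes(file)) files.push(file);
    }
    index.set(row.featureId, files);
  }
  return index;
}

// Every feature ID in the matrix with zero mapped test files.
function findUnmapped(allFeatureIds: string[], index: Map<string, string[]>): string[] {
  return allFeatureIds.filter((id) => (index.get(id) ?? []).length === 0);
}
```

Because `findUnmapped` starts from the matrix IDs and not from the index keys, a feature that no PRD mentions is reported just like one whose artifact path is wrong.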

Job 2: Run Scoped Tests and Compute L-Levels

| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | Run vitest scoped to indexed test files only | libs/commissioning/test-runner.ts | Given Map<FeatureID, TestFile[]>, runs vitest with JSON reporter on only those files. Exit code + results captured. | Script runs ALL tests instead of scoped set. Script modifies test files. Script runs with production credentials. | CI pipeline time increase ≤3 minutes over baseline | Only relevant tests run: fast feedback | Gap |
| 6 | Compute L-level per feature from test results | libs/commissioning/level-computer.ts | Given test results JSON and schema/UI existence checks, computes L0-L4 per feature using decision tree | Feature reaches L3 with failing Safety Test. Feature at L3 when no tests mapped (should be L0). L-level goes UP when evidence goes DOWN. | | States are computed, not opined | Gap |
| 7 | Detect regressions (features that should demote) | libs/commissioning/level-computer.ts | Given previous feature-matrix.json states and new computed states, flags demotions with evidence | Demotions silently applied without logging. Script refuses to demote and keeps stale state. | | Trust in the matrix: states reflect reality | Gap |
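
For function #5, the main hazard is that `vitest run` with no file filters runs the whole suite, which is exactly the safety failure listed. A minimal sketch of an argument builder that refuses an empty scope is shown below; `vitestArgs` is a hypothetical helper, while `run`, `--reporter`, and `--outputFile` are existing Vitest CLI options.

```typescript
// Builds the argument list for a scoped vitest run with the JSON reporter.
// Throws instead of returning an unscoped command, because an empty filter
// list would make vitest run every test in the workspace.
function vitestArgs(testFiles: string[], outputFile: string): string[] {
  const unique = Array.from(new Set(testFiles));
  if (unique.length === 0) {
    throw new Error("refusing to run vitest without a scoped file list");
  }
  return ["vitest", "run", "--reporter=json", `--outputFile=${outputFile}`, ...unique];
}
```

The result can be passed to a process spawner of choice, for example as the arguments to `npx`. Note that Vitest treats positional arguments as path filters, so full relative paths scope more tightly than bare file names.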

Job 3: Write Results and Produce Evidence

| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | Update feature-matrix.json with computed states | libs/commissioning/matrix-writer.ts | Given computed L-levels, updates state and updated fields in JSON. Diff is minimal (only changed rows). | Writer corrupts JSON structure. Writer changes fields other than state and updated. Writer removes features from the array. | feature-matrix.jsx renders correctly after update: no broken categories, no NaN counts | Feature matrix is a computed output | Gap |
| 9 | Generate commission receipt with evidence trail | libs/commissioning/receipt-generator.ts | Produces .ai/receipts/YYYY-MM-DD-commission-run.json with per-feature evidence (previous state, new state, test files, pass/fail) | Receipt omits failures. Receipt generated even when script errors. Receipt overwrites previous receipt instead of appending. | | Audit trail for every state change | Gap |
| 10 | CLI interface with dry-run, single-feature, and full-matrix modes | tools/scripts/commission/commission-features.ts | --dry-run shows changes without writing. --feature=AUTH-001 commissions single feature. --all runs full matrix. | --dry-run actually writes changes. --feature runs all tests instead of scoped set. Script with no flags runs full matrix without confirmation. | | Ergonomic for both CI and manual use | Gap |
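
The matrix writer's "only state and updated change" rule can be sketched as a pure transform that a dry run and a real run both share. The row shape is an assumption: this spec does not show the structure of feature-matrix.json, so `id`, `state`, and `updated` are guesses based on the field names mentioned above, and `applyStates` is a hypothetical name.

```typescript
interface FeatureRow {
  id: string;
  state: string;   // assumed to hold values such as "L2"
  updated: string; // assumed to be a date string
  [key: string]: unknown; // all other fields pass through untouched
}

// Returns a new array with the same length and order. Rows whose state did not
// change are returned by reference, so serializing the result gives a minimal diff.
function applyStates(
  matrix: FeatureRow[],
  computed: Map<string, string>,
  today: string,
): { next: FeatureRow[]; changed: string[] } {
  const changed: string[] = [];
  const next = matrix.map((row) => {
    const state = computed.get(row.id);
    if (state === undefined || state === row.state) return row;
    changed.push(row.id);
    return { ...row, state, updated: today };
  });
  return { next, changed };
}
```

With this split, `--dry-run` can print `changed` and stop, and only the non-dry path serializes `next` to disk.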

Job 4: E2E Verification via Playwright

| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 11 | Discover Playwright e2e specs from feature-test index | libs/commissioning/e2e-discovery.ts | Given Map<FeatureID, TestFile[]>, partitions into vitest unit specs and Playwright e2e specs by file path pattern (*.spec.ts in e2e project vs __tests__/) | E2e specs routed to vitest runner. Unit specs routed to Playwright. Discovery misses specs outside conventional paths. | Existing vitest scoping (#5) unaffected: e2e discovery is additive | Features with browser behavior get browser verification | Gap |
| 12 | Run Playwright specs scoped to feature via Nx target | libs/commissioning/e2e-runner.ts | Given feature's e2e spec paths, runs nx e2e <project> --grep <spec> headlessly. Captures exit code, trace, screenshot. JSON results extracted. | Playwright runs in headed mode in CI. Runner executes specs outside the scoped set. Runner uses playwright test directly instead of Nx target (bypasses project config). | Existing Nx e2e pipeline unaffected: commission runner uses same target, different filter | UI features verified by real browser, not just unit assertions | Gap |
| 13 | Merge e2e results into L-level computation | libs/commissioning/level-computer.ts | L-level computer treats e2e pass/fail as additional evidence alongside vitest results. Feature with passing unit tests but failing e2e is capped at L2. | Feature reaches L3 when e2e spec exists but wasn't run. E2e failure silently ignored when unit tests pass. Unit-only features penalized for missing e2e specs. | Existing L-level computation for unit-only features unchanged | L-levels reflect real user-facing behavior, not just logic correctness | Gap |
| 14 | Archive Playwright traces and screenshots as commission evidence | libs/commissioning/e2e-evidence.ts | Playwright traces saved to dist/.playwright/traces/commission/<feature-id>/. Screenshots saved alongside. Receipt (#9) includes e2e artifact paths. | Traces saved with PII from test fixtures. Traces accumulate without rotation. Evidence paths in receipt point to non-existent files. | | Visual proof of feature state: commissioner can replay the trace | Gap |
| 15 | Detect mock routes in e2e specs (counterfeit test scanner) | libs/commissioning/mock-detector.ts | Given e2e spec files, AST-scan for page.route(), route.fulfill(), fixture injection patterns. Flag specs that intercept real API calls with canned responses. Report lists counterfeit specs with line numbers. | Mock-route specs pass the scanner. Scanner false-positives on legitimate test setup (auth state, seed data). Scanner modifies spec files. | | Counterfeit tests made visible before they produce false L3 claims | Gap |
| 16 | Exclude unit tests from L-level computation | libs/commissioning/level-computer.ts | Test categorizer identifies unit tests (mocked dependencies, no real DB/server) and excludes them from L-level input. Only integration (real DB) and e2e (real server) results feed state computation. | Unit test with mocked DB counted as integration evidence. Hook test with stubbed API affects L-level. | Existing unit test reporting unaffected: tests still run, results just don't feed commissioning | L3 means real infrastructure was tested, not mocks | Gap |
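
The path-based partition in #11 can be sketched as below. It assumes Playwright specs live in a directory whose name ends in `-e2e` (as in `apps/web-e2e/src/auth.spec.ts` from the commands in this spec) and vitest specs live under `__tests__/`. Anything else is reported instead of being silently routed, which addresses the "discovery misses specs outside conventional paths" failure. `partitionSpecs` is a hypothetical name.

```typescript
interface Partition {
  vitest: string[];
  playwright: string[];
  unrecognized: string[];
}

function partitionSpecs(files: string[]): Partition {
  const out: Partition = { vitest: [], playwright: [], unrecognized: [] };
  for (const file of files) {
    const p = file.replace(/\\/g, "/"); // normalize Windows separators
    const inE2eProject = /(^|\/)[^/]*-e2e\//.test(p);
    if (inE2eProject && p.endsWith(".spec.ts")) {
      out.playwright.push(file);
    } else if (/(^|\/)__tests__\//.test(p)) {
      out.vitest.push(file);
    } else {
      out.unrecognized.push(file);
    }
  }
  return out;
}
```

The directory test matters: `libs/commissioning/__tests__/level-computer-e2e.spec.ts` has `-e2e` in its file name but is a vitest spec, and matching on the directory segment keeps it in the vitest bucket.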

Job 5: Sync Results to Project Dashboard (N5 Bridge)

| # | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 17 | Call Convex mutation to sync L-levels after commission run | libs/commissioning/convex-sync.ts | After --all run, commissioning_results table in Convex has one row per feature with featureId, prdRef, level, updatedAt | Sync fires in --dry-run mode. Sync overwrites results from a later run with earlier data. | listProjectsWithStats query performance unchanged (<200ms) | Dashboard reads from one source (Convex), not dream repo filesystem | Gap |
| 18 | Extend listProjectsWithStats to include commissioningLevel | convex/projects.ts | Query returns commissioningLevel: number \| null (null when no commission run, highest L-level when run exists) | Projects with no commission run return commissioningLevel: 0 instead of null. Query crashes when commissioning_results table is empty. | Existing plan/task stats in ProjectWithStats unchanged | Project dashboard shows commissioning health alongside plan progress | Gap |

Convex schema addition:

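// convex/schema.ts: add the commissioning_results table
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  // ...existing tables stay as they are...
  commissioning_results: defineTable({
    featureId: v.string(), // e.g. "AUTH-001"
    prdRef: v.string(), // e.g. "prd-identity-access"
    level: v.number(), // 0-4
    updatedAt: v.number(), // Unix timestamp (ms); use Date.now()
  }),
});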

Principles

The Job

| Element | Detail |
| --- | --- |
| Situation | After engineering merges code, someone must manually check which features advanced and edit feature-matrix.json by hand. With 210 features, this takes 30+ minutes and produces unreliable states. |
| Intention | Feature states are a function of test results. The matrix is a measurement instrument. Run the script, get the truth. |
| Obstacle | No bridge connects PRD Build Contract artifacts to test execution. Test files don't declare which feature they verify. Three FAVV formats coexist. |
| Hardest Thing | Mapping fidelity. A weak mapping (wrong test files → wrong features) produces states that look computed but are just as wrong as manual edits, except now with false confidence. |

Why Now

  • 210 features, 0 at L4. Manual tracking has hit its ceiling.
  • Agent-driven development means agents need deterministic commissioning — they can't make judgment calls about L-levels.
  • The FAVV v2.1 format now includes Artifact + Success Test + Safety Test columns — the mapping data finally exists in the PRD specs.
  • Every new PRD adds more features to track manually.

Design Constraints

| Constraint | Rationale |
| --- | --- |
| Script lives in engineering repo | It runs tests, and tests live in engineering. Dream repo is read-only from engineering CI. |
| Pure function: same input → same output | Commissioning must be trustworthy. Non-determinism means nobody trusts the matrix. |
| Under-report over over-report | A feature falsely at L3 is worse than a feature stuck at L0. Conservative state computation. |
| Convention over configuration | @feature AUTH-001 docblock tag or auth-001.spec.ts naming beats a separate mapping file that drifts. |
| Playwright for e2e, vitest for unit | Two test runners, one L-level computation. Playwright provides deterministic, reproducible browser verification with traces, screenshots, and CI-friendly artifacts. Vitest handles logic. Both feed the same level-computer. |
| Specs over agent browsers | Playwright specs are durable executable knowledge: rerunnable, diffable, CI-split. Agent browsers are for exploration, not commissioning. Commissioning demands repeatability. |
| Reuse Nx e2e targets | Nx Playwright plugin infers e2e tasks from playwright.config.ts. Commission script invokes existing nx e2e targets, not a parallel browser workflow. |
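
The "convention over configuration" constraint implies a resolver that derives feature IDs from a test file itself. The sketch below supports both conventions named above, the `@feature AUTH-001` docblock tag and the `auth-001.spec.ts` file name. It assumes feature IDs are uppercase letters, a hyphen, then digits, which matches AUTH-001 but is not stated as a rule in this spec; `featureIdsFor` is a hypothetical name.

```typescript
// Returns the sorted, de-duplicated feature IDs a test file declares,
// from docblock tags in its source and from its file name.
function featureIdsFor(path: string, source: string): string[] {
  const ids = new Set<string>();

  // Convention 1: "@feature AUTH-001" anywhere in the source.
  const tag = /@feature\s+([A-Z]+-\d+)/g;
  let match: RegExpExecArray | null;
  while ((match = tag.exec(source)) !== null) {
    const id = match[1];
    if (id) ids.add(id);
  }

  // Convention 2: a file named like "auth-001.spec.ts".
  const byName = /(^|\/)([a-z]+-\d+)\.spec\.ts$/.exec(path.replace(/\\/g, "/"));
  const stem = byName?.[2];
  if (stem) ids.add(stem.toUpperCase());

  return Array.from(ids).sort();
}
```

A file such as `level-computer.spec.ts` with no tag resolves to an empty list, which is the signal the unmapped report needs.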

Performance

Priority Score

| Dimension | Score (1-5) | Evidence |
| --- | --- | --- |
| Pain | 4 | 210 features manually tracked. States drift. CRM claims L3 with no Story Contract to verify. Every state update is a judgment call. |
| Demand | 3 | Internal demand. Every PRD review requires manual state checks. Backburner commissioning PRD exists because pain was identified. No external demand. |
| Edge | 4 | FAVV Build Contract with Artifact + Safety Test columns is unique infrastructure. Feature-ID-tagged test contracts feeding a state machine; no competitor has this. |
| Trend | 4 | Agent-driven development requires deterministic commissioning. Manual tracking breaks at scale. More PRDs = more features = more manual work. |
| Conversion | 2 | Internal tooling. No revenue path. Operational value: trustworthy feature matrix, faster commissioning. |
| Composite | 384 | 4 × 3 × 4 × 4 × 2 |

Quality Targets

| Target | Threshold |
| --- | --- |
| State computation accuracy | Zero false L3 (feature marked tested when tests fail) |
| Unmapped visibility | 100% of unmapped features flagged in report |
| CI time overhead | ≤3 minutes added to merge pipeline |
| Demotion detection | 100% of regressions caught in next run |

Failure Budget

| Failure Type | Budget | Response |
| --- | --- | --- |
| False L3 (tests fail but state says L3) | 0 | Kill the script, revert to manual until fixed |
| Missed demotion (tests regress, state stays) | 0 | Same: trust is binary |
| Wrong mapping (test assigned to wrong feature) | ≤5% | Log warning, mark feature as uncertain in report |

Kill signal: Script produces states that contradict manual commissioner judgment on >20% of features after 3 runs. If the algorithm disagrees with reality that often, the mapping is wrong — fix the mapping before running again.
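
The kill signal reduces to a disagreement rate between manual and computed states. A minimal sketch follows; it compares only features present in both maps, and it leaves open how the three runs are aggregated, since the spec does not say. The names `disagreementRate` and `exceedsKillThreshold` are hypothetical.

```typescript
// Fraction of shared features where the computed state differs from the
// commissioner's manual judgment. Returns 0 when there is nothing to compare.
function disagreementRate(
  manual: Map<string, string>,
  computed: Map<string, string>,
): number {
  let shared = 0;
  let differing = 0;
  for (const [featureId, manualState] of manual) {
    const computedState = computed.get(featureId);
    if (computedState === undefined) continue;
    shared += 1;
    if (computedState !== manualState) differing += 1;
  }
  return shared === 0 ? 0 : differing / shared;
}

// The spec's threshold is "more than 20%", so exactly 20% does not trip it.
function exceedsKillThreshold(rate: number, threshold = 0.2): boolean {
  return rate > threshold;
}
```

For example, two disagreements across five compared features gives a rate of 0.4 and trips the signal, while one in five sits exactly at the threshold and does not.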

Platform

Current State

| Component | Built | Wired | Working |
| --- | --- | --- | --- |
| PRD specs with FAVV Build Contracts | Yes (3 PRDs have full Story+Build) | No: not parsed by any commissioning script | Partial: data exists but isn't consumed |
| feature-matrix.json | Yes | Yes: powers feature-matrix.jsx page | Yes, but hand-edited |
| Vitest in engineering repo | Yes | Yes: runs in CI | Yes, but not scoped to feature IDs |
| Agent receipt schema | Yes (v1.0 defined) | No: no commissioning script emits receipts | No |
| FAVV parser in project-from-prd | Yes | Yes: reads Build Contracts for engineering tasks | Partial: doesn't extract feature-to-test mapping |
| Playwright in Nx monorepo | Yes: Nx plugin installed, playwright.config.ts present | Yes: nx e2e targets inferred, CI runs e2e specs | Yes, but not scoped to feature IDs or connected to commissioning |

Build Ratio

~50% composition (extend existing FAVV parser, use existing vitest, reuse Nx Playwright targets, write to existing JSON), ~50% new code (index builder, level computer, matrix writer, receipt generator, e2e discovery/runner/evidence).

Protocols

Build Order

| Sprint | Features | What | Effort | Acceptance |
| --- | --- | --- | --- | --- |
| 0 | #1, #2, #3, #4 | FAVV parser + index builder + unmapped report | 3 days | Parser extracts mappings from 3 PRDs. Unmapped count matches manual count. |
| 1 | #5, #6, #7 | Scoped test runner + L-level computer + regression detection | 4 days | Script computes L-levels for Identity & Access features. Results match manual assessment. |
| 2 | #8, #9, #10 | Matrix writer + receipt generator + CLI | 2 days | --dry-run shows correct diff. --all updates JSON. Receipt saved. |
| 3 | CI integration | Wire into merge pipeline | 1 day | Merge to main triggers commission run. JSON updated automatically. |
| 4 | #11, #12, #13, #14 | Playwright e2e discovery + scoped runner + L-level merge + evidence archival | 3 days | Features with e2e specs get browser verification. Traces archived. L-levels reflect e2e results. |

Commissioning

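| # | Feature | Install | Test | Operational | Optimize |
| --- | --- | --- | --- | --- | --- |
| 1 | FAVV parser | | | | |
| 2 | Format compatibility (FFO/v2.0/v2.1) | | | | |
| 3 | Index builder | | | | |
| 4 | Unmapped feature detection | | | | |
| 5 | Scoped test runner | | | | |
| 6 | L-level computer | | | | |
| 7 | Regression detection | | | | |
| 8 | Matrix writer | | | | |
| 9 | Receipt generator | | | | |
| 10 | CLI interface | | | | |
| 11 | E2e spec discovery | | | | |
| 12 | Playwright scoped runner | | | | |
| 13 | E2e + unit L-level merge | | | | |
| 14 | Trace/screenshot archival | | | | |
| 15 | Mock route detection (counterfeit test scanner) | | | | |
| 16 | Unit test exclusion from L3 evidence | | | | |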

Agent-Facing Spec

Commands:

# From engineering repo root — unit + integration commissioning
npx tsx tools/scripts/commission/commission-features.ts --dry-run # preview
npx tsx tools/scripts/commission/commission-features.ts --feature=AUTH-001 # single
npx tsx tools/scripts/commission/commission-features.ts --all # full matrix
npx tsx tools/scripts/commission/commission-features.ts --report # unmapped report only

# E2e commissioning via Playwright (scoped to feature)
npx tsx tools/scripts/commission/commission-features.ts --feature=AUTH-001 --e2e # includes Playwright
pnpm nx e2e web-e2e --grep="AUTH-001" # direct Nx target
pnpm exec playwright test --project=chromium --grep="AUTH-001" # direct Playwright CLI

# Debugging and investigation
pnpm exec playwright codegen http://localhost:3000 # generate selectors
pnpm exec playwright test apps/web-e2e/src/auth.spec.ts --debug # step-through debug
pnpm nx e2e web-e2e --ui # interactive UI mode

Boundaries:

  • Always: read PRD specs, run mapped tests, write JSON, emit receipt
  • Ask first: demoting a feature that was manually set to L3+
  • Never: modify test files, change PRD specs, delete features from JSON

Test Contract:

| # | Feature | Test File | Assertion |
| --- | --- | --- | --- |
| 1 | FAVV parser | libs/commissioning/__tests__/favv-parser.spec.ts | Extracts correct rows from v2.1, v2.0, FFO formats |
| 2 | Index builder | libs/commissioning/__tests__/index-builder.spec.ts | Produces correct Map, flags unmapped features |
| 3 | L-level computer | libs/commissioning/__tests__/level-computer.spec.ts | Decision tree matches expected L-levels for test fixtures |
| 4 | Matrix writer | libs/commissioning/__tests__/matrix-writer.spec.ts | JSON structure preserved, only state+updated change |
| 5 | CLI | libs/commissioning/__tests__/cli.spec.ts | --dry-run doesn't write, --feature scopes correctly |
| 6 | E2e discovery | libs/commissioning/__tests__/e2e-discovery.spec.ts | Correctly partitions vitest vs Playwright specs by path convention |
| 7 | E2e runner | libs/commissioning/__tests__/e2e-runner.spec.ts | Invokes Nx target with correct grep, captures trace artifacts |
| 8 | E2e + unit merge | libs/commissioning/__tests__/level-computer-e2e.spec.ts | Feature with passing unit but failing e2e capped at L2 |

Players

Demand-Side Jobs

Job 1: Commissioner Verifies Feature States

Situation: After engineering ships a sprint, the commissioner needs to know which features actually advanced.

| Element | Detail |
| --- | --- |
| Struggling moment | Opens feature-matrix.json, sees states from last manual edit. No idea if they're still true. Has to open the app, check each feature, type the new level. |
| Current workaround | Manual inspection + hand-editing JSON. Takes 30+ minutes for a full pass. Usually skipped, so states go stale. |
| What progress looks like | Run one command, get a report. States reflect test results. Regressions flagged. |
| Hidden objection | "What if the script is wrong and I trusted it?" (the false L3 fear) |
| Switch trigger | Feature count exceeds 200 (already happened). Manual tracking visibly wrong on a commissioned feature. |

Features that serve this job: #5, #6, #7, #8, #9, #10

Job 2: Engineering Agent Updates Matrix After Merge

Situation: CI pipeline merges code. Feature states should reflect the new reality.

| Element | Detail |
| --- | --- |
| Struggling moment | Code merged, tests pass, but feature matrix still shows old states. Someone has to remember to update it. |
| Current workaround | Post-merge manual edit, often forgotten. States lag reality by days or weeks. |
| What progress looks like | CI runs commission script automatically. Matrix updated within 5 minutes of merge. |
| Hidden objection | "What if CI takes too long?" (the pipeline slowdown fear) |
| Switch trigger | Feature state was wrong during a demo or review. |

Features that serve this job: #5, #8, #10

Role Definitions

| Role | Access | Permissions |
| --- | --- | --- |
| Commission script (CI) | Read: PRD specs, test files, feature-matrix.json. Write: feature-matrix.json, receipts. | Run tests, compute states, write results. |
| Commissioner (human) | Read: commission report, receipts. Write: L4 sign-off. | Override script states with evidence. Approve L3→L4 transitions. |
| PRD author | Read: unmapped report. Write: PRD spec Artifact columns. | Fix missing mappings by adding Artifact paths to Build Contract. |

Relationship to Other PRDs

| PRD | Relationship | Data Flow |
| --- | --- | --- |
| Project Management System | Downstream: N3 project dashboard consumes commissioning results | Script writes L-levels to commissioning_results Convex table. listProjectsWithStats joins to return commissioningLevel per project. N5 is the bridge build. |
| Commissioning State Machine | Peer: they commission data tables (195), we commission RaaS features (210) | Shared L-level vocabulary. Different subjects, same progression model. Could merge later. |
| Agent Platform | Upstream: provides FAVV Build Contract with test artifacts | Script parses Agent Platform's spec to build feature-to-test index |
| Identity & Access | Upstream: provides Story Contract + Build Contract | Script parses Identity's spec as test case for parser |
| Sales CRM & RFP | Upstream, BLOCKED: no Story Contract, FFO format only | Script must handle FFO format. CRM features will show as partially mapped until Story Contract added. |
| CLI Platform | Peer: commission script could become a drmg commission subcommand | Script initially standalone. Absorb into CLI Platform when that PRD ships. |

Context

Questions

What breaks first when the script disagrees with a human commissioner?

  • If 30% of features are unmapped, is the script's output trustworthy enough to replace manual edits?
  • Should the script refuse to write results when unmapped percentage exceeds a threshold?
  • When CRM gets a Story Contract, how many features jump from L0 to L3 in one run — and does that shock look like a bug?
  • Is the Safety Test gate (blocking L3) too conservative — or not conservative enough?