Validate External Outcomes
A commissioning protocol: verifying shipped results against PRD expectations
When is a capability ready to ship — and how do you prove it?
The team that builds a system is never the team that commissions it. The builder knows what they intended. The commissioner checks what actually shipped.
Maturity Levels
Every Mycelium capability is scored on a 5-level maturity scale:
| Level | Meaning | Evidence Required |
|---|---|---|
| L0 | Spec only | PRD written, no build |
| L1 | Schema + API | Backend exists, no interface |
| L2 | UI connected | Users can interact |
| L3 | Tested | Automated verification + intent spec passes |
| L4 | Commissioned | Independent verification against PRD criteria |
Status Vocabulary
Exact definitions. No synonyms. If engineering and the Dream Team use different words for the same state, the report lies.
| Status | Meaning | Evidence | NOT the same as |
|---|---|---|---|
| Gap | Identified need, no PRD | Mentioned in another PRD or index | Spec |
| Spec | PRD written, features undefined or unscored | PRD index.md exists | Spec draft |
| Spec draft | PRD exists, features listed but incomplete | Feature table present, gaps noted | L0 |
| Spec complete | PRD fully specified, ready for engineering | All sections filled, scored | L0 |
| L0 | Features assessed, all scored as Gap | Feature table with Gap/Done per feature | Spec |
| L1 | Schema + API deployed | Backend responds, no UI | L0 |
| L2 | UI connected, users can interact | Pages render, forms submit | L1 |
| L3 | Tested, automated + intent verification | E2E tests pass, intent spec verified, evidence captured | L2 |
| L4 | Commissioned by independent verification | Commissioner grants after browser walkthrough | L3 |
The critical distinction: "Spec" means features haven't been individually assessed. "L0" means they HAVE been assessed and scored as Gap. A PRD at L0 has more structure than one at Spec — it knows exactly what's missing.
Feature vs capability maturity: Feature commissioning (Install → Test → Operational → Optimize) tracks individual features within a capability. Capability maturity (L0-L4) tracks the aggregate. A capability at L2 may have features at Install, Test, and Gap simultaneously.
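The two scales can be made concrete in code. A minimal TypeScript sketch: the stage and level names come from this page, but the aggregation rule (a critical feature still at Gap caps the capability at L2; L4 requires independent sign-off) is an illustrative assumption, not the documented algorithm.

```typescript
// Feature commissioning stages and capability maturity, as named on this page.
type FeatureStage = "Gap" | "Install" | "Test" | "Operational" | "Optimize";
type CapabilityLevel = 0 | 1 | 2 | 3 | 4;

interface Feature {
  name: string;
  stage: FeatureStage;
  critical: boolean;
}

// Illustrative aggregation rule (an assumption): any critical feature still at
// Gap caps the capability at L2; without commissioning the ceiling is L3.
function maxClaimableLevel(features: Feature[], commissioned: boolean): CapabilityLevel {
  const hasCriticalGap = features.some(f => f.critical && f.stage === "Gap");
  if (hasCriticalGap) return 2;
  return commissioned ? 4 : 3;
}
```

A capability with features at Install, Test, and Gap simultaneously reports L2 here, matching the example above.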
The Process
How a capability moves from L0 to L4:
- L0: Spec only. PRD written, features defined, success criteria set, kill signal set.
- L1: Schema + API. Backend exists, schema deployed, API endpoints live, integration tests pass.
- L2: UI connected. Users can interact, CRUD works, workflows complete, manual QA passes.
- L3: Tested. Automated tests in place, E2E suite passes, performance gates met, regression suite green.
- L4: Commissioned. Independent verification: the commissioner reads the spec, opens the browser, checks each feature, and records Pass / Fail with evidence.
The Protocol
- Commissioner reads the PRD (not the code)
- Commissioner opens the live application
- For each row in the Feature / Function / Outcome table:
  - Can the feature be found?
  - Does the function work as specified?
  - Does the outcome match the success criteria?
- Record Pass / Fail with evidence (screenshot, recording, measurement)
- Update the PRD commissioning table
- If all critical features Pass: capability is L4 Commissioned
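The protocol reduces to data plus two guards. A hedged TypeScript sketch; the record shape and helper names are illustrative, not an existing API:

```typescript
type Verdict = "Pass" | "Fail";

interface CommissioningRecord {
  feature: string;   // row in the Feature / Function / Outcome table
  verdict: Verdict;
  evidence: string;  // link to screenshot, recording, or measurement
}

// The commissioner is never the builder.
function assertIndependent(builder: string, commissioner: string): void {
  if (builder === commissioner) {
    throw new Error("commissioner must be independent of the builder");
  }
}

// L4 requires every critical feature to Pass, with evidence attached.
function isL4(records: CommissioningRecord[], criticalFeatures: string[]): boolean {
  return criticalFeatures.every(name => {
    const rec = records.find(r => r.feature === name);
    return rec !== undefined && rec.verdict === "Pass" && rec.evidence.length > 0;
  });
}
```

Note that `isL4` checks coverage as well as verdicts: a critical feature with no record at all blocks commissioning, not just a failing one.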
The commissioner is never the builder. The builder knows what they intended. The commissioner checks what actually shipped.
The Loop
Read SPEC-MAP (what should pass, what tests exist)
-> Check test coverage: any empty Test File cells = BLOCKER before L4
-> Navigate to deployed URL
-> Walk each feature row
-> Verify pass/fail with evidence (screenshot, GIF, console, network)
-> Update SPEC-MAP columns 5-6 (L-Level, Last Verified)
-> Update commissioning dashboard with findings
-> Gap between spec and reality drives next priority
The SPEC-MAP adds a step before browser verification: check that engineering has test coverage for every Story Contract row. A feature that works on the deployed site but has no automated test is L3 at best — one deploy away from invisible regression. The SPEC-MAP makes this gap visible before commissioning starts.
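The coverage check can run automatically before the walkthrough begins. A sketch, assuming a SPEC-MAP row shaped like the table described here (field names are assumptions):

```typescript
interface SpecMapRow {
  story: string;
  testFile: string;     // empty = no automated test yet
  level: string;        // column 5: L-Level
  lastVerified: string; // column 6: Last Verified
}

// Any row without a test file is a blocker: the feature may work on the
// deployed site today, but it is one deploy away from invisible regression.
function coverageBlockers(rows: SpecMapRow[]): string[] {
  return rows.filter(r => r.testFile.trim() === "").map(r => r.story);
}
```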
Evidence Gates
Each level transition requires specific evidence. The builder claims, the commissioner verifies.
| Transition | Evidence Type | Minimum | Who Claims | Who Verifies |
|---|---|---|---|---|
| L0→L1 | Schema matches spec | DB introspection output | Engineering | Engineering (schema is binary) |
| L1→L2 | UI renders and connects | Screenshot/GIF of CRUD flow | Engineering | Commissioner (Dream Team) |
| L2→L3 | Tests pass against spec | CI output, e2e suite green | Engineering | CI pipeline (automated) |
| L3→L4 | Independent verification | Commissioner walkthrough + evidence link | Commissioner | Commissioner (different from builder) |
| Any→Broken | Reproduction evidence | Bug report with steps | Anyone | Engineering confirms |
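The gate table can be enforced mechanically. A TypeScript sketch (the claim shape is an assumption) checking the two invariants the transitions share: evidence attached, and for the L4 transition a verifier who is not the builder:

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface GateClaim {
  from: Level;
  to: Level;
  evidence: string;   // link: CI output, screenshot, walkthrough notes
  builtBy: string;
  verifiedBy: string;
}

function gateErrors(claim: GateClaim): string[] {
  const errors: string[] = [];
  if (claim.evidence.trim() === "") {
    errors.push("every transition requires evidence");
  }
  // L3->L4 is independent verification: the verifier cannot be the builder.
  if (claim.to === "L4" && claim.verifiedBy === claim.builtBy) {
    errors.push("L4 verifier must differ from the builder");
  }
  return errors;
}
```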
Commissioning is Dream Team's final responsibility. You defined what creates value (steps 1-3 in the pipeline). Engineering built it (step 4). Now you verify the delivery matches the spec. The gap between what was specified and what was built is the honest error signal — it feeds back into the next cycle.
Per-Feature Checklist
For each row in a PRD's commissioning table:
- Navigate — Can you reach the feature from the expected entry point?
- Happy path — Does the primary workflow complete successfully?
- Output correct — Does the result match the PRD's stated outcome?
- Error handling — Does a bad input produce a clear error, not a crash?
- Intent verified — If agentic: agent action stayed within declared scope (constraints, budget, permissions)
- Evidence captured — GIF or screenshot proving the above
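The checklist maps naturally onto a record with one field per item. A sketch; field names are illustrative:

```typescript
interface FeatureChecklist {
  navigate: boolean;              // reachable from the expected entry point
  happyPath: boolean;             // primary workflow completes
  outputCorrect: boolean;         // result matches the PRD's stated outcome
  errorHandling: boolean;         // bad input yields a clear error, not a crash
  intentVerified: boolean | null; // null when the feature is not agentic
  evidenceUrl: string;            // GIF or screenshot proving the above
}

function checklistPasses(c: FeatureChecklist): boolean {
  const intentOk = c.intentVerified !== false; // non-agentic features skip the check
  return c.navigate && c.happyPath && c.outputCorrect &&
    c.errorHandling && intentOk && c.evidenceUrl.length > 0;
}
```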
Verification Channels
Each channel gets validated differently:
| Channel | What to Verify |
|---|---|
| Web UI | Features work as specified in PRD |
| API routes | Endpoints return correct data, response shape + status codes |
| A2A protocol | Agent Card discoverable, Task Cards accepted, task lifecycle responds correctly |
| Console health | No errors, no warnings in critical paths |
| Agent intent | Agent action matches declared scope — constraints, budget, permissions |
For guidance on which browser tool to use for each channel, see the tool selection guide.
Flight Readiness
Before any capability ships to production, it must pass eight gates. Adapted from factory pre-flight inspection.
| Gate | Criteria | Test | Applies To |
|---|---|---|---|
| G1: Config | Version locked, zero uncommitted changes | git status clean on deploy branch | All |
| G2: Types | Zero TypeScript errors, strict mode | pnpm nx typecheck [app] | All |
| G3: Security | Auth + rate limits + CSP configured | Action validation audit | All |
| G4: Tests | Pass rate above threshold, documented skips | pnpm nx test [app] | All |
| G5: Performance | P95 response time within budget | Latency measurement under load | All with UI |
| G6: Observability | Four Golden Signals monitored | Latency, Traffic, Errors, Saturation | Production apps |
| G7: AI Safety | Prompt injection mitigated, hallucination bounded | Validation layer audit | AI capabilities |
| G8: Ops Ready | Rollback tested, runbook exists | Deployment verification | Production apps |
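One way to drive the eight gates from code, shown as a hedged sketch. Each gate's actual check (git status, typecheck, test run) happens elsewhere; this only aggregates results and encodes the Applies To column:

```typescript
interface Capability { hasUI: boolean; isAI: boolean; production: boolean; }

interface GateResult {
  id: string;                          // G1..G8
  applies: (c: Capability) => boolean; // encodes the "Applies To" column
  passed: boolean;                     // outcome of the real check, run elsewhere
}

// A capability is flight-ready only when every applicable gate passed.
function flightReady(results: GateResult[], cap: Capability): { ready: boolean; failures: string[] } {
  const failures = results
    .filter(g => g.applies(cap) && !g.passed)
    .map(g => g.id);
  return { ready: failures.length === 0, failures };
}
```

A backend-only capability skips G5 this way: `applies: c => c.hasUI` keeps a failed performance gate from blocking a capability with no UI.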
Golden Signals
Four signals for G6 observability:
| Signal | Metric | Threshold |
|---|---|---|
| Latency | P95 response time | Under 3s API, under 10s AI |
| Traffic | Concurrent users | Over 50 supported |
| Errors | Error rate | Under 5% |
| Saturation | Function timeout | Under 80% |
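The four thresholds are simple enough to check in code. A sketch; the measurement field names are assumptions:

```typescript
interface Signals {
  latencyP95Ms: number;             // measured P95 response time
  isAI: boolean;                    // AI endpoints get the 10s budget
  supportedConcurrentUsers: number;
  errorRate: number;                // fraction, 0..1
  saturationPct: number;            // function timeout utilisation, 0..100
}

// Returns the names of any Golden Signals outside their thresholds.
function signalViolations(s: Signals): string[] {
  const v: string[] = [];
  if (s.latencyP95Ms >= (s.isAI ? 10_000 : 3_000)) v.push("Latency");
  if (s.supportedConcurrentUsers <= 50) v.push("Traffic");
  if (s.errorRate >= 0.05) v.push("Errors");
  if (s.saturationPct >= 80) v.push("Saturation");
  return v;
}
```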
Phase to Level
How the venture algorithm maps to engineering maturity:
| Algorithm Phase | Typical L-Level | What's Happening |
|---|---|---|
| SCAN-DISCOVER | -- | No build. Exploring. |
| VALIDATE | L0 | Spec written, scored, kill signals identified |
| MODEL-FINANCE | L0-L1 | Business model selected, financial models built |
| STRATEGY | L1 | Positioning defined, GTM planned |
| PITCH-SELL | L1-L2 | Persuasion assets created, users can interact |
| MEASURE | L2+ | Feedback loop operational, scorecard active |
Verifiable Intent
Commissioning IS verifiable intent applied to software delivery. The delegation chain maps directly:
| VI Layer | Commissioning | What It Proves |
|---|---|---|
| L1 Identity | PRD author | Who specified the capability |
| L2 Intent | PRD spec + success criteria | What was authorized to be built |
| L3 Action | Engineering build | What was actually shipped |
| Verification | Commissioner walkthrough | Did action match intent? |
The builder (agent) acts within the PRD (intent). The commissioner (verifier) checks the delegation chain: the spec matches the build, and the build matches the outcome. When agents build features, the same three-layer proof applies: L2 intent constraints become machine-verifiable acceptance criteria. A capability without an intent spec cannot reach L3.
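The three-layer proof becomes checkable once the L2 constraints are encoded as data. A sketch with illustrative field names; the real constraint vocabulary would come from the intent spec:

```typescript
// L2 intent: the scope the PRD author authorized.
interface IntentScope {
  allowedActions: string[];
  budgetUsd: number;
  permissions: string[];
}

// L3 action: what the building agent actually did.
interface AgentAction {
  action: string;
  costUsd: number;
  permissionsUsed: string[];
}

// Verification layer: did the action stay inside the declared scope?
function withinScope(scope: IntentScope, act: AgentAction): boolean {
  return scope.allowedActions.includes(act.action)
    && act.costUsd <= scope.budgetUsd
    && act.permissionsUsed.every(p => scope.permissions.includes(p));
}
```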
Context
- Verifiable Intent — The authorization proof protocol that commissioning implements
- Commissioning — The principle: why independent verification matters, across domains
- AI Browser Tools — Tool selection for browser-based commissioning
- Commissioning Dashboard — Live status of every capability
- Work Prioritisation — Scoring algorithm, rubrics, gates
- Business Factory Requirements — The capability catalogue
- PRDs — How to spec capabilities
- Benchmark Standards — Trigger-based benchmark protocol
- Flow Engineering — After L4, stories become maps that produce code artifacts
- Cost of Quality — Enforcement tier metrics
- Cost Escalation — The 10x multiplier
Questions
- When is a capability ready to ship — and how do you prove it without building it yourself?
- At what maturity level does a capability start generating revenue — and is L4 even necessary for first customers?
- Should flight readiness gates differ by capability type (platform vs product vs agent)?
- What's the cost of skipping L3 (tested) and going straight from L2 (UI connected) to L4 (commissioned)?
- How do you commission an AI capability when its outputs are distributions, not deterministic?