# Commissioning Protocol
When is a capability ready to ship — and how do you prove it?
The team that builds a system is never the team that commissions it. The builder knows what they intended. The commissioner checks what actually shipped.
## Maturity Levels
Every Mycelium capability is scored on a 5-level maturity scale:
| Level | Meaning | Evidence Required |
|---|---|---|
| L0 | Spec only | PRD written, no build |
| L1 | Schema + API | Backend exists, no interface |
| L2 | UI connected | Users can interact |
| L3 | Tested | Automated verification passes |
| L4 | Commissioned | Independent verification against PRD criteria |
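The scale above is ordered, and a capability earns each level with evidence. A minimal sketch of that ordering, assuming levels advance one at a time (the enum names and the `canAdvance` helper are illustrative, not from the source):

```typescript
// Maturity levels as an ordered numeric scale (names illustrative).
enum Maturity {
  L0_SpecOnly = 0,
  L1_SchemaApi = 1,
  L2_UiConnected = 2,
  L3_Tested = 3,
  L4_Commissioned = 4,
}

// Promotion rule sketch: advance exactly one level at a time,
// and only when the evidence required for the next level exists.
function canAdvance(current: Maturity, target: Maturity, hasEvidence: boolean): boolean {
  return hasEvidence && target === current + 1;
}
```

Encoding the levels as numbers makes the "no skipping" rule a one-line check rather than a convention.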
## The Process
How a capability moves from L0 to L4:
```
L0: SPEC ONLY         L1: SCHEMA + API        L2: UI CONNECTED
PRD written        -> Backend exists       -> Users can interact
Features defined      Schema deployed         CRUD works
Success criteria      API endpoints live      Workflows complete
Kill signal set       Integration tests       Manual QA passes

L3: TESTED            L4: COMMISSIONED
Automated tests    -> Independent verification
E2E suite passes      Commissioner reads spec
Performance gates     Commissioner opens browser
Regression suite      Commissioner checks features
                      Pass / Fail + evidence
```
## The Protocol
- Commissioner reads the PRD (not the code)
- Commissioner opens the live application
- For each row in the Feature / Function / Outcome table:
  - Can the feature be found?
  - Does the function work as specified?
  - Does the outcome match the success criteria?
- Record Pass / Fail with evidence (screenshot, recording, measurement)
- Update the PRD commissioning table
- If all critical features Pass: capability is L4 Commissioned
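The protocol's pass condition can be sketched as a small data shape plus one predicate. This is a hypothetical encoding, not the source's schema; the field names mirror the three questions above:

```typescript
// Hypothetical shape of one commissioning check per feature row.
type FeatureCheck = {
  feature: string;
  found: boolean;    // can the feature be found?
  works: boolean;    // does the function work as specified?
  matches: boolean;  // does the outcome match the success criteria?
  evidence: string;  // screenshot, recording, or measurement reference
  critical: boolean;
};

// A capability reaches L4 only when every critical feature passes.
function isCommissioned(checks: FeatureCheck[]): boolean {
  return checks
    .filter((c) => c.critical)
    .every((c) => c.found && c.works && c.matches);
}
```

Note that non-critical failures are recorded but do not block L4 — they feed the next-priority queue instead.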
The commissioner is never the builder.
## The Loop
```
Read PRD commissioning table (what should pass)
-> Navigate to deployed URL
-> Walk each feature row
-> Verify pass/fail with evidence (screenshot, GIF, console, network)
-> Update commissioning dashboard with findings
-> Gap between spec and reality drives next priority
```
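The loop's closing step — gap drives priority — is just a filter over the walk results. A sketch, with illustrative names:

```typescript
// One row of the commissioning walk (hypothetical shape).
type RowResult = { feature: string; pass: boolean };

// Failing rows are the gap between spec and reality;
// that gap becomes the next work queue.
function nextPriorities(results: RowResult[]): string[] {
  return results.filter((r) => !r.pass).map((r) => r.feature);
}
```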
## Per-Feature Checklist
For each row in a PRD's commissioning table:
- Navigate — Can you reach the feature from the expected entry point?
- Happy path — Does the primary workflow complete successfully?
- Output correct — Does the result match the PRD's stated outcome?
- Error handling — Does a bad input produce a clear error, not a crash?
- Evidence captured — GIF or screenshot proving the above
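The checklist is ordered — there is no point checking outputs for a feature you cannot reach. A sketch of that walk order (step identifiers are assumptions):

```typescript
// The five checklist items, in walk order (identifiers illustrative).
type Step = "navigate" | "happyPath" | "outputCorrect" | "errorHandling" | "evidenceCaptured";
const walkOrder: Step[] = ["navigate", "happyPath", "outputCorrect", "errorHandling", "evidenceCaptured"];

// Walk the checklist in order; report the first failing step,
// or null if every step passed.
function firstFailure(results: Record<Step, boolean>): Step | null {
  for (const step of walkOrder) {
    if (!results[step]) return step;
  }
  return null;
}
```

Reporting the first failure, rather than all of them, matches how a commissioner actually walks a feature: later steps are meaningless once an earlier one breaks.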
## Verification Channels
Each channel gets validated differently:
| Channel | What to Verify |
|---|---|
| Web UI | Features work as specified in PRD |
| API routes | Endpoints return correct data, response shape + status codes |
| A2A protocol | Agent Card discoverable, Task Cards accepted, task lifecycle responses returned |
| Console health | No errors, no warnings in critical paths |
For which browser tool to use per channel, see the tool selection guide.
## Flight Readiness
Before any capability ships to production, it must pass eight gates, adapted from factory pre-flight inspection.
| Gate | Criteria | Test | Applies To |
|---|---|---|---|
| G1: Config | Version locked, zero uncommitted changes | git status clean on deploy branch | All |
| G2: Types | Zero TypeScript errors, strict mode | pnpm nx typecheck [app] | All |
| G3: Security | Auth + rate limits + CSP configured | Action validation audit | All |
| G4: Tests | Pass rate above threshold, documented skips | pnpm nx test [app] | All |
| G5: Performance | P95 response time within budget | Latency measurement under load | All with UI |
| G6: Observability | Four Golden Signals monitored | Latency, Traffic, Errors, Saturation | Production apps |
| G7: AI Safety | Prompt injection mitigated, hallucination bounded | Validation layer audit | AI capabilities |
| G8: Ops Ready | Rollback tested, runbook exists | Deployment verification | Production apps |
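The "Applies To" column means the gate set differs per capability. A sketch of that selection logic, under the assumption that scope can be reduced to three capability flags (the encoding below is illustrative):

```typescript
// Hypothetical encoding of the eight gates and their "Applies To" scope.
type Scope = "all" | "ui" | "production" | "ai";
type Capability = { hasUi: boolean; inProduction: boolean; usesAi: boolean };

const gates: { id: string; scope: Scope }[] = [
  { id: "G1:Config", scope: "all" },
  { id: "G2:Types", scope: "all" },
  { id: "G3:Security", scope: "all" },
  { id: "G4:Tests", scope: "all" },
  { id: "G5:Performance", scope: "ui" },
  { id: "G6:Observability", scope: "production" },
  { id: "G7:AI Safety", scope: "ai" },
  { id: "G8:Ops Ready", scope: "production" },
];

// A capability must pass every gate whose scope applies to it.
function applicableGates(cap: Capability): string[] {
  return gates
    .filter(
      (g) =>
        g.scope === "all" ||
        (g.scope === "ui" && cap.hasUi) ||
        (g.scope === "production" && cap.inProduction) ||
        (g.scope === "ai" && cap.usesAi)
    )
    .map((g) => g.id);
}
```

A headless internal library faces only the four universal gates; a production AI app faces all eight.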
## Golden Signals
Four signals for G6 observability:
| Signal | Metric | Threshold |
|---|---|---|
| Latency | P95 response time | Under 3s API, under 10s AI |
| Traffic | Concurrent users | Over 50 supported |
| Errors | Error rate | Under 5% |
| Saturation | Function timeout | Under 80% |
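The G6 check is a conjunction of the four thresholds above. A sketch using the table's limits — the units are assumptions (latency in milliseconds, error rate and saturation as fractions of 1):

```typescript
// Measured values for the four Golden Signals (shape illustrative).
type Signals = {
  p95LatencyMs: number;     // P95 response time
  concurrentUsers: number;  // sustained concurrency
  errorRate: number;        // 0.05 = 5%
  saturation: number;       // fraction of function timeout budget used
};

// All four thresholds must hold; AI paths get the looser 10s latency budget.
function meetsGoldenSignals(s: Signals, aiPath = false): boolean {
  const latencyBudgetMs = aiPath ? 10_000 : 3_000;
  return (
    s.p95LatencyMs < latencyBudgetMs &&
    s.concurrentUsers > 50 &&
    s.errorRate < 0.05 &&
    s.saturation < 0.8
  );
}
```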
## Phase to Level
How the venture algorithm maps to engineering maturity:
| Algorithm Phase | Typical L-Level | What's Happening |
|---|---|---|
| SCAN-DISCOVER | -- | No build. Exploring. |
| VALIDATE | L0 | Spec written, scored, kill signals identified |
| MODEL-FINANCE | L0-L1 | Business model selected, financial models built |
| STRATEGY | L1 | Positioning defined, GTM planned |
| PITCH-SELL | L1-L2 | Persuasion assets created, users can interact |
| MEASURE | L2+ | Feedback loop operational, scorecard active |
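The mapping above doubles as a consistency check: a capability's maturity level should sit inside the range its venture phase implies. A sketch, reading the table's "L2+" as 2 through 4 (an assumption) and the helper name as illustrative:

```typescript
// The phase table as a lookup: each phase maps to a typical
// [min, max] maturity range, or null where no build exists yet.
const phaseToLevel: Record<string, [number, number] | null> = {
  "SCAN-DISCOVER": null,
  VALIDATE: [0, 0],
  "MODEL-FINANCE": [0, 1],
  STRATEGY: [1, 1],
  "PITCH-SELL": [1, 2],
  MEASURE: [2, 4], // "L2+" read as L2 through L4
};

// Is a capability's current level consistent with its venture phase?
function levelMatchesPhase(phase: string, level: number): boolean {
  const range = phaseToLevel[phase];
  if (range == null) return false; // unknown phase, or no build expected
  const [min, max] = range;
  return level >= min && level <= max;
}
```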
## Context
- Commissioning — The principle: why independent verification matters, across domains
- AI Browser Tools — Tool selection for browser-based commissioning
- Commissioning Dashboard — Live status of every capability
- Work Prioritisation — Scoring algorithm, rubrics, gates
- Phygital Mycelium — The capability catalogue
- PRDs — How to spec capabilities
- Benchmark Standards — Trigger-based benchmark protocol
- Flow Engineering — Maps that produce code artifacts
- Cost of Quality — Enforcement tier metrics
- Cost Escalation — The 10x multiplier
## Questions
- When is a capability ready to ship — and how do you prove it without building it yourself?
- At what maturity level does a capability start generating revenue — and is L4 even necessary for first customers?
- Should flight readiness gates differ by capability type (platform vs product vs agent)?
- What's the cost of skipping L3 (tested) and going straight from L2 (UI connected) to L4 (commissioned)?
- How do you commission an AI capability when its outputs are distributions, not deterministic?