Commissioning
Does the output match the spec — or does it just look like it does?
The builder knows what they intended. The commissioner checks what actually shipped. They must never be the same person.
The Principle
Syntactically correct is not functionally correct. An instrument that looks right in the code, produces the right-shaped output, and runs without errors can still produce the wrong answer. The only way to know is to test the output against the spec — not review the code.
| Feels Like | Actually Is |
|---|---|
| Algorithm runs, produces ranked table | Blockers sort wrong — soft bonus overwhelmed by kill-date |
| Hooks are wired, settings reference them | uv not installed, masking Python SyntaxError — hooks never executed |
| 95% test coverage | Mutation testing reveals 40% of tests assert nothing meaningful |
| Instrument calibrated last year | As-found drift exceeded tolerance 6 months ago |
The theft is invisible. You don't feel robbed. You feel "busy."
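The first row of the table can be made concrete. Below is a minimal sketch of a ranker that runs cleanly and emits a plausible-looking table, yet sorts wrong because the soft-bonus weight swamps the kill-date urgency. All names, dates, and weights are invented for illustration.

```python
from datetime import date

# Hypothetical ranking bug: the code runs and the output is the right
# shape, but the soft bonus is scaled so large that it overwhelms the
# kill-date urgency. Weights and items are invented for illustration.
def score(item: dict, today: date) -> float:
    days_left = (item["kill_date"] - today).days
    urgency = -days_left              # closer deadline -> higher score
    bonus = item["soft_bonus"] * 100  # bug: bonus scale dwarfs urgency
    return urgency + bonus

items = [
    {"name": "expired-blocker", "kill_date": date(2024, 1, 1), "soft_bonus": 0},
    {"name": "nice-to-have",    "kill_date": date(2025, 2, 1), "soft_bonus": 5},
]
ranked = sorted(items, key=lambda i: score(i, date(2025, 1, 1)), reverse=True)
print([i["name"] for i in ranked])  # the live blocker loses to a soft bonus
```

Reviewing this code would tell you it "applies urgency and a bonus" — which is true. Only testing the output against a known case reveals that an expired blocker ranks below a nice-to-have.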
Pre-Flight
Commissioning starts before building, not after. The five flow engineering maps are the pre-flight checklist:
| Map | Commissioning Question | Level |
|---|---|---|
| Outcome Map | What does success look like? Binary measures defined? | L0 |
| Value Stream Map | How does value flow? Where does time die? | L1 |
| Dependency Map | What are the hard constraints? Are they enforced or assumed? | L2 |
| Capability Map | Can we actually execute? Where are the gaps? | L3 |
| A&ID | Which agents, which instruments, which feedback loops? | L4 |
Skip the pre-flight and you discover failures in production. The maps make invisible assumptions visible before you build.
Post-Flight
After building comes independent verification against the spec, measured on five maturity levels:
| Level | Evidence | Who Verifies |
|---|---|---|
| L0 Spec only | PRD written, features defined | Author |
| L1 Schema exists | Backend deployed, API responds | Builder |
| L2 UI connected | Users can interact, workflows complete | Builder |
| L3 Tested | Automated tests pass, edge cases covered | Builder |
| L4 Commissioned | Independent verification against PRD | Commissioner |
L4 is the only level that proves the output matches the spec. Everything below L4 is the builder checking their own work.
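The ladder above can be sketched as a gate that refuses to report L4 unless the verifier is someone other than the builder. The field names and evidence flags below are invented for illustration, not part of any defined schema.

```python
# Hypothetical maturity gate. L1-L3 accept the builder's own evidence;
# L4 additionally requires a commissioner distinct from the builder.
# All field names are invented for illustration.
def maturity_level(evidence: dict) -> int:
    level = 0
    if evidence.get("api_responds"):
        level = 1
    if level == 1 and evidence.get("workflows_complete"):
        level = 2
    if level == 2 and evidence.get("tests_pass"):
        level = 3
    independent = (
        evidence.get("verified_by") is not None
        and evidence.get("verified_by") != evidence.get("built_by")
    )
    if level == 3 and evidence.get("verified_against_prd") and independent:
        level = 4
    return level

self_checked = {"api_responds": True, "workflows_complete": True,
                "tests_pass": True, "verified_against_prd": True,
                "built_by": "alice", "verified_by": "alice"}
print(maturity_level(self_checked))  # stays at 3: builder verified own work
```

The point of encoding the gate is that independence becomes a hard check, not a convention someone can quietly skip.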
Three Domains
The principle is the same across process engineering, software, and measurement science. Different substrate, same pattern: verify the output, not the intention.
Process Engineering
In factories, the team that builds a system is never the team that commissions it.
| Technique | What It Proves | If You Skip It |
|---|---|---|
| P&ID Walkdown | Drawing matches installed plant | Operators work from wrong maps |
| Loop Check (ISA-62382) | Signal integrity end-to-end | Reversed control loops at startup |
| FAT / SAT | Vendor product works + integrated system works | Ground loop noise causes spurious trips on site |
| PSSR | Ready state before hazardous energy enters | BP Texas City 2005: inadequate PSSR, 15 killed |
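A loop check can be simulated in miniature: inject known test points at the transmitter end, run them through the signal chain, and confirm the displayed engineering value at the far end is within tolerance. The 4-20 mA scaling, the 0-250 degC span, and the tolerance below are illustrative choices, not values taken from ISA-62382.

```python
# Miniature loop check: inject known % points at the transmitter,
# pass them through the signal chain, and verify the displayed
# engineering value end-to-end. Ranges and tolerance are illustrative.
def to_milliamps(percent: float) -> float:
    return 4.0 + 16.0 * percent / 100.0        # transmitter: 0-100% -> 4-20 mA

def to_engineering(ma: float, span=(0.0, 250.0)) -> float:
    lo, hi = span                               # readout: 4-20 mA -> 0-250 degC
    return lo + (hi - lo) * (ma - 4.0) / 16.0

def loop_check(points=(0, 25, 50, 75, 100), tol_pct=0.5):
    results = []
    for p in points:
        shown = to_engineering(to_milliamps(p))
        expected = 250.0 * p / 100.0
        error_pct = abs(shown - expected) / 250.0 * 100.0
        results.append((p, shown, error_pct <= tol_pct))
    return results

for point, shown, ok in loop_check():
    print(f"{point:3d}% -> {shown:6.1f} degC  {'PASS' if ok else 'FAIL'}")
```

The real procedure checks physical wiring, not arithmetic — but the shape is the same: known input in, observed output out, pass/fail against tolerance at each point.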
Software
Tests prove presence of execution, not correctness of behaviour.
| Technique | What It Proves | If You Skip It |
|---|---|---|
| Mutation Testing | Tests would catch real bugs, not just execute lines | 95% coverage hides 60% undetectable defects |
| Property-Based Testing | Invariants hold for all inputs, not just examples | Edge cases you never imagined go untested |
| Chaos Engineering | Resilience is proven, not assumed | Production discovers failure modes at 2am |
| Contract Testing | Interface agreement is current across services | Backend renames field, frontend mock still passes, production breaks |
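The mutation-testing row can be demonstrated in a few lines: a test that merely executes a function achieves full line coverage yet survives an obvious mutant, while a test that asserts the invariant kills it. The function and mutant are invented examples; real mutation tools generate the mutants automatically.

```python
# Hand-rolled mutation test, for illustration only. The "weak" test
# executes every line (full coverage) but asserts nothing, so the
# mutant survives; the property test kills it.
def discount(price: float) -> float:
    return price * 0.9                # original: 10% off

def discount_mutant(price: float) -> float:
    return price * 1.1                # mutant: the change is flipped

def weak_test(fn) -> bool:
    fn(100.0)                         # executes the code, asserts nothing
    return True

def property_test(fn) -> bool:
    # invariant: a discount never increases the price
    return all(fn(p) <= p for p in (0.0, 1.0, 99.99, 1e6))

print(weak_test(discount), weak_test(discount_mutant))          # True True
print(property_test(discount), property_test(discount_mutant))  # True False
```

Coverage counted both tests as exercising `discount`. Only the mutant run reveals that one of them proves nothing.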
Measurement Science
A measurement without stated uncertainty is not a measurement — it's a guess.
| Technique | What It Proves | If You Skip It |
|---|---|---|
| Traceability Chain (VIM/NIST) | Result reaches primary standard through documented chain | Mars Climate Orbiter: unit mismatch, $327M lost |
| Uncertainty Budget (GUM) | All error sources accounted for | Instrument claims ±1% used to verify ±0.5% spec |
| Calibration Intervals | Drift stays within bounds between cycles | FDA audit invalidates 14 months of pharmaceutical batches |
| ISO 17025 Accreditation | Traceability chain independently verified | Circular calibration — lab calibrates against itself |
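The uncertainty-budget row follows the GUM's root-sum-of-squares combination for uncorrelated standard uncertainties. The contribution names and values below are invented numbers in a single shared unit, purely to show the arithmetic.

```python
import math

# GUM-style combined standard uncertainty for uncorrelated inputs:
# u_c = sqrt(sum of u_i squared). Contribution values are invented.
def combined_uncertainty(contributions: dict) -> float:
    return math.sqrt(sum(u * u for u in contributions.values()))

budget = {
    "reference standard": 0.10,   # all in the same unit, e.g. degC
    "resolution":         0.05,
    "repeatability":      0.08,
    "drift since cal":    0.12,
}
u_c = combined_uncertainty(budget)
print(f"combined standard uncertainty: {u_c:.3f}")
print(f"expanded (k=2, ~95%): {2 * u_c:.3f}")
```

Note the asymmetry the table warns about: if this instrument's expanded uncertainty exceeds the tolerance it is being used to verify, every pass/fail verdict it issues is meaningless.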
The A&ID Connection
The Agent & Instrument Diagram makes commissioning visual:
| A&ID Element | Role | Commissioning Check |
|---|---|---|
| Agent (Yang) | Applies force of intent — transforms and distributes value | Is the agent building to spec? |
| Instrument (Yin) | Verifies, measures, rewards — closes the feedback loop | Is the instrument calibrated? |
| Feedback Loop | Signal flows from instrument back to agent | Does the loop actually close? |
An uncalibrated instrument in a feedback loop is worse than no instrument — it provides false confidence. The agent adjusts based on wrong readings. The system drifts while everyone believes it's on course.
Commissioning verifies the instruments. Not the agents, not the process — the instruments. Because if the measurement is wrong, everything downstream is wrong.
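That failure mode can be simulated in a few lines: an agent that steers toward a setpoint using a biased instrument converges confidently — to the wrong value. The setpoint, gain, and bias below are invented for illustration.

```python
# Toy feedback loop: the agent nudges its output toward a setpoint
# using the instrument's reading. With a biased (uncalibrated)
# instrument the loop still converges -- to the wrong value.
# Setpoint, gain, and bias are invented for illustration.
def run_loop(sensor_bias: float, setpoint=100.0, gain=0.5, steps=50):
    actual = 0.0
    for _ in range(steps):
        reading = actual + sensor_bias          # what the instrument reports
        actual += gain * (setpoint - reading)   # agent trusts the reading
    return actual

print(run_loop(sensor_bias=0.0))  # settles at the setpoint, 100.0
print(run_loop(sensor_bias=5.0))  # settles at 95.0: confident and wrong
```

Nothing in the loop signals the error: the reading equals the setpoint, so from the inside the system looks perfectly on course. Only commissioning the instrument against an external reference exposes the bias.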
The Protocol
For every instrument in the system:
| Step | Question | Evidence |
|---|---|---|
| 1. Spec | What should this instrument produce? | Written spec with binary success criteria |
| 2. Input | What does it read? | Verified input sources |
| 3. Output | What does it actually produce? | Tested against known inputs |
| 4. Calibration | Does output match spec across full range? | Edge cases, not just happy path |
| 5. Drift | How fast does accuracy degrade? | As-found/as-left records over time |
| 6. Independence | Who verifies — someone other than the builder? | Named commissioner |
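The six steps can be carried as a record per instrument, so missing evidence shows up as a named gap rather than a silent assumption. The field names are invented; each maps to one step in the protocol table, and the independence check mirrors step 6.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical commissioning record, one per instrument. Field names
# are invented; each maps to a step in the protocol table.
@dataclass
class CommissioningRecord:
    instrument: str
    spec: Optional[str] = None            # 1. written spec, binary criteria
    inputs_verified: bool = False         # 2. input sources checked
    output_tested: bool = False           # 3. tested against known inputs
    calibrated_full_range: bool = False   # 4. edge cases, not just happy path
    drift_records: int = 0                # 5. as-found/as-left entries
    builder: str = ""
    commissioner: Optional[str] = None    # 6. named, independent verifier

    def gaps(self) -> List[str]:
        out = []
        if not self.spec: out.append("no written spec")
        if not self.inputs_verified: out.append("inputs unverified")
        if not self.output_tested: out.append("output untested")
        if not self.calibrated_full_range: out.append("not calibrated full range")
        if self.drift_records == 0: out.append("no drift history")
        if not self.commissioner or self.commissioner == self.builder:
            out.append("no independent commissioner")
        return out

rec = CommissioningRecord("ranker-v2", spec="PRD-7", inputs_verified=True,
                          output_tested=True, builder="alice",
                          commissioner="alice")
print(rec.gaps())
```

Here the builder named themselves as commissioner, so the record reports the gap instead of letting L4 be claimed by default.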
Context
- A&ID — Agents and instruments made visible
- Flow Engineering — The five maps that form the pre-flight
- Benchmark Standards — Pass/Warn/Fail triggers for instruments
- Standards — Where proven verification protocols graduate
- Time and Energy — The non-renewable capital commissioning protects
- Capital — Credibility requires verified performance
- Commissioning Dashboard — The live application of this principle