Commissioning
Does the output match the spec — or does it just look like it does?
The builder knows what they intended. The commissioner checks what actually shipped. They must never be the same person.
The Principle
Syntactically correct is not functionally correct. An instrument that looks right in the code, produces the right-shaped output, and runs without errors can still produce the wrong answer. The only way to know is to test the output against the spec — not review the code.
| Feels Like | Actually Is |
|---|---|
| Algorithm runs, produces ranked table | Blockers sort wrong — soft bonus overwhelmed by kill-date |
| Hooks are wired, settings reference them | uv not installed, masking Python SyntaxError — hooks never executed |
| 95% test coverage | Mutation testing reveals 40% of tests assert nothing meaningful |
| Instrument calibrated last year | As-found drift exceeded tolerance 6 months ago |
The theft is invisible. You don't feel robbed. You feel "busy."
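The first row of the table can be made concrete. Below is a minimal sketch of a ranker that runs cleanly and emits a plausible-looking table, yet sorts wrong because the soft-bonus weight swamps the kill-date urgency. All names, dates, and weights are invented for illustration.

```python
from datetime import date

# Hypothetical ranking bug: the code runs and the output is the right
# shape, but the soft bonus is scaled so large that it overwhelms the
# kill-date urgency. Weights and items are invented for illustration.
def score(item: dict, today: date) -> float:
    days_left = (item["kill_date"] - today).days
    urgency = -days_left              # closer deadline -> higher score
    bonus = item["soft_bonus"] * 100  # bug: bonus scale dwarfs urgency
    return urgency + bonus

items = [
    {"name": "expired-blocker", "kill_date": date(2024, 1, 1), "soft_bonus": 0},
    {"name": "nice-to-have",    "kill_date": date(2025, 2, 1), "soft_bonus": 5},
]
ranked = sorted(items, key=lambda i: score(i, date(2025, 1, 1)), reverse=True)
print([i["name"] for i in ranked])  # the live blocker loses to a soft bonus
```

Reviewing this code would tell you it "applies urgency and a bonus" — which is true. Only testing the output against a known case reveals that an expired blocker ranks below a nice-to-have.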
Pre-Flight
Commissioning starts before building, not after. The five flow engineering maps are the pre-flight checklist:
| Map | Commissioning Question | Level |
|---|---|---|
| Outcome Map | What does success look like? Binary measures defined? | L0 |
| Value Stream Map | How does value flow? Where does time die? | L1 |
| Dependency Map | What are the hard constraints? Are they enforced or assumed? | L2 |
| Capability Map | Can we actually execute? Where are the gaps? | L3 |
| A&ID | Which agents, which instruments, which feedback loops? | L4 |
Skip the pre-flight and you discover failures in production. The maps make invisible assumptions visible before you build.
Post-Flight
After building comes independent verification against the spec, measured on five maturity levels:
| Level | Evidence | Who Verifies |
|---|---|---|
| L0 Spec only | PRD written, features defined | Author |
| L1 Schema exists | Backend deployed, API responds | Builder |
| L2 UI connected | Users can interact, workflows complete | Builder |
| L3 Tested | Automated tests pass, edge cases covered | Builder |
| L4 Commissioned | Independent verification against PRD | Commissioner |
L4 is the only level that proves the output matches the spec. Everything below L4 is the builder checking their own work.
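The ladder above can be sketched as a gate that refuses to report L4 unless the verifier is someone other than the builder. The field names and evidence flags below are invented for illustration, not part of any defined schema.

```python
# Hypothetical maturity gate. L1-L3 accept the builder's own evidence;
# L4 additionally requires a commissioner distinct from the builder.
# All field names are invented for illustration.
def maturity_level(evidence: dict) -> int:
    level = 0
    if evidence.get("api_responds"):
        level = 1
    if level == 1 and evidence.get("workflows_complete"):
        level = 2
    if level == 2 and evidence.get("tests_pass"):
        level = 3
    independent = (
        evidence.get("verified_by") is not None
        and evidence.get("verified_by") != evidence.get("built_by")
    )
    if level == 3 and evidence.get("verified_against_prd") and independent:
        level = 4
    return level

self_checked = {"api_responds": True, "workflows_complete": True,
                "tests_pass": True, "verified_against_prd": True,
                "built_by": "alice", "verified_by": "alice"}
print(maturity_level(self_checked))  # stays at 3: builder verified own work
```

The point of encoding the gate is that independence becomes a hard check, not a convention someone can quietly skip.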
Three Domains
The principle is the same across process engineering, software, and measurement science. Different substrate, same pattern: verify the output, not the intention.
Process Engineering
In factories, the team that builds a system is never the team that commissions it.
| Technique | What It Proves | If You Skip It |
|---|---|---|
| P&ID Walkdown | Drawing matches installed plant | Operators work from wrong maps |
| Loop Check (ISA-62382) | Signal integrity end-to-end | Reversed control loops at startup |
| FAT / SAT | Vendor product works + integrated system works | Ground loop noise causes spurious trips on site |
| PSSR | Ready state before hazardous energy enters | BP Texas City 2005: inadequate PSSR, 15 killed |
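A loop check can be simulated in miniature: inject known test points at the transmitter end, run them through the signal chain, and confirm the displayed engineering value at the far end is within tolerance. The 4-20 mA scaling, the 0-250 degC span, and the tolerance below are illustrative choices, not values taken from ISA-62382.

```python
# Miniature loop check: inject known % points at the transmitter,
# pass them through the signal chain, and verify the displayed
# engineering value end-to-end. Ranges and tolerance are illustrative.
def to_milliamps(percent: float) -> float:
    return 4.0 + 16.0 * percent / 100.0        # transmitter: 0-100% -> 4-20 mA

def to_engineering(ma: float, span=(0.0, 250.0)) -> float:
    lo, hi = span                               # readout: 4-20 mA -> 0-250 degC
    return lo + (hi - lo) * (ma - 4.0) / 16.0

def loop_check(points=(0, 25, 50, 75, 100), tol_pct=0.5):
    results = []
    for p in points:
        shown = to_engineering(to_milliamps(p))
        expected = 250.0 * p / 100.0
        error_pct = abs(shown - expected) / 250.0 * 100.0
        results.append((p, shown, error_pct <= tol_pct))
    return results

for point, shown, ok in loop_check():
    print(f"{point:3d}% -> {shown:6.1f} degC  {'PASS' if ok else 'FAIL'}")
```

The real procedure checks physical wiring, not arithmetic — but the shape is the same: known input in, observed output out, pass/fail against tolerance at each point.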
Software
Tests prove presence of execution, not correctness of behaviour.
| Technique | What It Proves | If You Skip It |
|---|---|---|
| Mutation Testing | Tests would catch real bugs, not just execute lines | 95% coverage hides 60% undetectable defects |
| Property-Based Testing | Invariants hold for all inputs, not just examples | Edge cases you never imagined go untested |
| Chaos Engineering | Resilience is proven, not assumed | Production discovers failure modes at 2am |
| Contract Testing | Interface agreement is current across services | Backend renames field, frontend mock still passes, production breaks |
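The mutation-testing row can be demonstrated in a few lines: a test that merely executes a function achieves full line coverage yet survives an obvious mutant, while a test that asserts the invariant kills it. The function and mutant are invented examples; real mutation tools generate the mutants automatically.

```python
# Hand-rolled mutation test, for illustration only. The "weak" test
# executes every line (full coverage) but asserts nothing, so the
# mutant survives; the property test kills it.
def discount(price: float) -> float:
    return price * 0.9                # original: 10% off

def discount_mutant(price: float) -> float:
    return price * 1.1                # mutant: the change is flipped

def weak_test(fn) -> bool:
    fn(100.0)                         # executes the code, asserts nothing
    return True

def property_test(fn) -> bool:
    # invariant: a discount never increases the price
    return all(fn(p) <= p for p in (0.0, 1.0, 99.99, 1e6))

print(weak_test(discount), weak_test(discount_mutant))          # True True
print(property_test(discount), property_test(discount_mutant))  # True False
```

Coverage counted both tests as exercising `discount`. Only the mutant run reveals that one of them proves nothing.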
Measurement Science
A measurement without stated uncertainty is not a measurement — it's a guess.
| Technique | What It Proves | If You Skip It |
|---|---|---|
| Traceability Chain (VIM/NIST) | Result reaches primary standard through documented chain | Mars Climate Orbiter: unit mismatch, $327M lost |
| Uncertainty Budget (GUM) | All error sources accounted for | Instrument claims ±1% used to verify ±0.5% spec |
| Calibration Intervals | Drift stays within bounds between cycles | FDA audit invalidates 14 months of pharmaceutical batches |
| ISO 17025 Accreditation | Traceability chain independently verified | Circular calibration — lab calibrates against itself |
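The uncertainty-budget row follows the GUM's root-sum-of-squares combination for uncorrelated standard uncertainties. The contribution names and values below are invented numbers in a single shared unit, purely to show the arithmetic.

```python
import math

# GUM-style combined standard uncertainty for uncorrelated inputs:
# u_c = sqrt(sum of u_i squared). Contribution values are invented.
def combined_uncertainty(contributions: dict) -> float:
    return math.sqrt(sum(u * u for u in contributions.values()))

budget = {
    "reference standard": 0.10,   # all in the same unit, e.g. degC
    "resolution":         0.05,
    "repeatability":      0.08,
    "drift since cal":    0.12,
}
u_c = combined_uncertainty(budget)
print(f"combined standard uncertainty: {u_c:.3f}")
print(f"expanded (k=2, ~95%): {2 * u_c:.3f}")
```

Note the asymmetry the table warns about: if this instrument's expanded uncertainty exceeds the tolerance it is being used to verify, every pass/fail verdict it issues is meaningless.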
The A&ID Connection
The Agent & Instrument Diagram makes commissioning visual:
| A&ID Element | Role | Commissioning Check |
|---|---|---|
| Agent (Yang) | Applies force of intent — transforms and distributes value | Is the agent building to spec? |
| Instrument (Yin) | Verifies, measures, rewards — closes the feedback loop | Is the instrument calibrated? |
| Feedback Loop | Signal flows from instrument back to agent | Does the loop actually close? |
An uncalibrated instrument in a feedback loop is worse than no instrument — it provides false confidence. The agent adjusts based on wrong readings. The system drifts while everyone believes it's on course.
Commissioning verifies the instruments. Not the agents, not the process — the instruments. Because if the measurement is wrong, everything downstream is wrong.
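That failure mode can be simulated in a few lines: an agent that steers toward a setpoint using a biased instrument converges confidently — to the wrong value. The setpoint, gain, and bias below are invented for illustration.

```python
# Toy feedback loop: the agent nudges its output toward a setpoint
# using the instrument's reading. With a biased (uncalibrated)
# instrument the loop still converges -- to the wrong value.
# Setpoint, gain, and bias are invented for illustration.
def run_loop(sensor_bias: float, setpoint=100.0, gain=0.5, steps=50):
    actual = 0.0
    for _ in range(steps):
        reading = actual + sensor_bias          # what the instrument reports
        actual += gain * (setpoint - reading)   # agent trusts the reading
    return actual

print(run_loop(sensor_bias=0.0))  # settles at the setpoint, 100.0
print(run_loop(sensor_bias=5.0))  # settles at 95.0: confident and wrong
```

Nothing in the loop signals the error: the reading equals the setpoint, so from the inside the system looks perfectly on course. Only commissioning the instrument against an external reference exposes the bias.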
The Protocol
For every instrument in the system:
| Step | Question | Evidence |
|---|---|---|
| 1. Spec | What should this instrument produce? | Written spec with binary success criteria |
| 2. Input | What does it read? | Verified input sources |
| 3. Output | What does it actually produce? | Tested against known inputs |
| 4. Calibration | Does output match spec across full range? | Edge cases, not just happy path |
| 5. Drift | How fast does accuracy degrade? | As-found/as-left records over time |
| 6. Independence | Who verifies — someone other than the builder? | Named commissioner |
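The six steps can be carried as a record per instrument, so missing evidence shows up as a named gap rather than a silent assumption. The field names are invented; each maps to one step in the protocol table, and the independence check mirrors step 6.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical commissioning record, one per instrument. Field names
# are invented; each maps to a step in the protocol table.
@dataclass
class CommissioningRecord:
    instrument: str
    spec: Optional[str] = None            # 1. written spec, binary criteria
    inputs_verified: bool = False         # 2. input sources checked
    output_tested: bool = False           # 3. tested against known inputs
    calibrated_full_range: bool = False   # 4. edge cases, not just happy path
    drift_records: int = 0                # 5. as-found/as-left entries
    builder: str = ""
    commissioner: Optional[str] = None    # 6. named, independent verifier

    def gaps(self) -> List[str]:
        out = []
        if not self.spec: out.append("no written spec")
        if not self.inputs_verified: out.append("inputs unverified")
        if not self.output_tested: out.append("output untested")
        if not self.calibrated_full_range: out.append("not calibrated full range")
        if self.drift_records == 0: out.append("no drift history")
        if not self.commissioner or self.commissioner == self.builder:
            out.append("no independent commissioner")
        return out

rec = CommissioningRecord("ranker-v2", spec="PRD-7", inputs_verified=True,
                          output_tested=True, builder="alice",
                          commissioner="alice")
print(rec.gaps())
```

Here the builder named themselves as commissioner, so the record reports the gap instead of letting L4 be claimed by default.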
Context
- A&ID — Agents and instruments made visible
- Flow Engineering — The five maps that form the pre-flight
- Benchmark Standards — Pass/Warn/Fail triggers for instruments
- Standards — Where proven verification protocols graduate
- Time and Energy — The non-renewable capital commissioning protects
- Capital — Credibility requires verified performance
- Commissioning Dashboard — The live application of this principle