Commissioning

Does the output match the spec — or does it just look like it does?

The builder knows what they intended. The commissioner checks what actually shipped. These are never the same person.

The Principle

Syntactically correct is not functionally correct. An instrument that looks right in the code, produces the right-shaped output, and runs without errors can still produce the wrong answer. The only way to know is to test the output against the spec — not review the code.

Feels Like | Actually Is
Algorithm runs, produces ranked table | Blockers sort wrong: soft bonus overwhelmed by kill-date
Hooks are wired, settings reference them | uv not installed, masking Python SyntaxError; hooks never executed
95% test coverage | Mutation testing reveals 40% of tests assert nothing meaningful
Instrument calibrated last year | As-found drift exceeded tolerance 6 months ago
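
The second row is the kind of failure a small independent check can surface. What follows is a minimal sketch, assuming hooks are Python scripts launched through a runner executable such as `uv`. The function name `commission_hook` and the runner name used in the demo are hypothetical, not part of any real tool.

```python
import shutil


def commission_hook(runner: str, script_source: str, script_name: str = "<hook>") -> list:
    """Return the reasons a hook cannot run; an empty list means it can at least start."""
    problems = []
    # A missing runner means the hook never starts, so its own errors stay hidden.
    if shutil.which(runner) is None:
        problems.append(f"runner {runner!r} not found on PATH")
    # Compile the script independently of the runner to surface syntax errors.
    try:
        compile(script_source, script_name, "exec")
    except SyntaxError as exc:
        problems.append(f"syntax error in {script_name} (line {exc.lineno})")
    return problems


# Hypothetical hook with a syntax error, launched by a runner that is not installed.
broken_source = "def on_save(:\n    pass\n"
report = commission_hook("definitely-not-installed-runner", broken_source)
for problem in report:
    print(problem)
```

Both problems are reported separately, so fixing the runner cannot quietly leave the syntax error in place. The check only shows that the hook can start; whether it does the right thing still has to be tested against the spec.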

The theft is invisible. You don't feel robbed. You feel "busy."

Pre-Flight

Commissioning starts before building, not after. The five flow engineering maps are the pre-flight checklist:

Map | Commissioning Question | Level
Outcome Map | What does success look like? Binary measures defined? | L0
Value Stream Map | How does value flow? Where does time die? | L1
Dependency Map | What are the hard constraints? Are they enforced or assumed? | L2
Capability Map | Can we actually execute? Where are the gaps? | L3
A&ID | Which agents, which instruments, which feedback loops? | L4

Skip the pre-flight and you discover failures in production. The maps make invisible assumptions visible before you build.

Post-Flight

After building comes independent verification against the spec, tracked across five maturity levels:

Level | Evidence | Who Verifies
L0 Spec only | PRD written, features defined | Author
L1 Schema exists | Backend deployed, API responds | Builder
L2 UI connected | Users can interact, workflows complete | Builder
L3 Tested | Automated tests pass, edge cases covered | Builder
L4 Commissioned | Independent verification against PRD | Commissioner

L4 is the only level that proves the output matches the spec. Everything below L4 is the author or builder checking their own work.
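
One way to keep an L4 claim honest is to make the independence rule executable. This is a minimal sketch, assuming each feature records who built it and who verified it. The names `Maturity`, `Feature`, and `effective_level` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Maturity(IntEnum):
    SPEC_ONLY = 0
    SCHEMA_EXISTS = 1
    UI_CONNECTED = 2
    TESTED = 3
    COMMISSIONED = 4


@dataclass
class Feature:
    name: str
    builder: str
    claimed: Maturity
    verified_by: Optional[str] = None  # who checked the output against the PRD


def effective_level(feature: Feature) -> Maturity:
    """Cap the claim at L3 unless someone other than the builder verified it."""
    independent = feature.verified_by is not None and feature.verified_by != feature.builder
    if feature.claimed == Maturity.COMMISSIONED and not independent:
        return Maturity.TESTED
    return feature.claimed


self_signed = Feature("ranking", builder="alice", claimed=Maturity.COMMISSIONED, verified_by="alice")
commissioned = Feature("ranking", builder="alice", claimed=Maturity.COMMISSIONED, verified_by="bob")
print(effective_level(self_signed).name)   # TESTED
print(effective_level(commissioned).name)  # COMMISSIONED
```

The sketch checks only that the two names differ. Whether the commissioner actually tested the output against the PRD is still evidence that has to be recorded separately.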

Three Domains

The principle is the same across process engineering, software, and measurement science. Different substrate, same pattern: verify the output, not the intention.

Process Engineering

In factories, the team that builds a system is never the team that commissions it.

Technique | What It Proves | If You Skip It
P&ID Walkdown | Drawing matches installed plant | Operators work from wrong maps
Loop Check (ISA-62382) | Signal integrity end-to-end | Reversed control loops at startup
FAT / SAT | Vendor product works + integrated system works | Ground loop noise causes spurious trips on site
PSSR | Ready state before hazardous energy enters | BP Texas City 2005: inadequate PSSR, 15 killed

Software

Passing tests prove that the code executed, not that it behaved correctly.
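
The gap between executing code and checking it can be shown with a hand-written mutant. The sketch below is illustrative only: mutation testing tools generate mutants automatically, whereas here a single mutant (`<=` changed to `<`) is written by hand, and all function names are hypothetical.

```python
def within_tolerance(error: float, tolerance: float) -> bool:
    """Original: a reading passes when its absolute error is at or below tolerance."""
    return abs(error) <= tolerance


def mutant_within_tolerance(error: float, tolerance: float) -> bool:
    """Mutant: '<=' replaced by '<', so a reading exactly at tolerance now fails."""
    return abs(error) < tolerance


def weak_check(fn) -> bool:
    """Executes every line of fn (full coverage) but only inspects the result's type."""
    result = fn(0.1, 0.5)
    return isinstance(result, bool)


def strong_check(fn) -> bool:
    """Pins the behaviour at the boundary and on either side of it."""
    return fn(0.5, 0.5) is True and fn(0.6, 0.5) is False and fn(0.1, 0.5) is True


def mutant_killed(check, mutant) -> bool:
    """A mutant is killed when a check that passes on the original fails on the mutant."""
    return check(within_tolerance) and not check(mutant)


print(mutant_killed(weak_check, mutant_within_tolerance))    # False: the mutant survives
print(mutant_killed(strong_check, mutant_within_tolerance))  # True: the mutant is killed
```

Both checks give the same line coverage. Only the second one would notice if the behaviour changed.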

Technique | What It Proves | If You Skip It
Mutation Testing | Tests would catch real bugs, not just execute lines | 95% coverage can still leave 60% of defects undetectable
Property-Based Testing | Invariants hold for all inputs, not just examples | Edge cases you never imagined go untested
Chaos Engineering | Resilience is proven, not assumed | Production discovers failure modes at 2am
Contract Testing | Interface agreement is current across services | Backend renames field, frontend mock still passes, production breaks
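
Property-based testing can be sketched with the standard library alone. The example below assumes a hypothetical ranking function in which a fixed blocker bonus is added to an urgency score, loosely modelled on the "blockers sort wrong" failure mentioned earlier. The scoring rule, field names, and value ranges are all assumptions made for illustration.

```python
import random


def buggy_rank(items):
    """Hypothetical scorer: a fixed blocker bonus added to a much larger urgency term."""
    return sorted(items, key=lambda it: (10 if it["blocker"] else 0) + it["urgency"], reverse=True)


def fixed_rank(items):
    """Sort on blocker status first, then urgency, so the bonus cannot be outweighed."""
    return sorted(items, key=lambda it: (it["blocker"], it["urgency"]), reverse=True)


def blockers_first(ranked) -> bool:
    """Invariant: once a non-blocker appears, no blocker may follow it."""
    seen_non_blocker = False
    for it in ranked:
        if not it["blocker"]:
            seen_non_blocker = True
        elif seen_non_blocker:
            return False
    return True


def find_counterexample(rank, trials=500, seed=0):
    """Generate random inputs and return the first one that breaks the invariant."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [
            {"blocker": rng.random() < 0.5, "urgency": rng.randint(0, 100)}
            for _ in range(rng.randint(0, 6))
        ]
        if not blockers_first(rank(items)):
            return items
    return None


print(find_counterexample(buggy_rank) is not None)  # True: a failing input exists
print(find_counterexample(fixed_rank))              # None: no violation found
```

A dedicated library such as Hypothesis would also shrink the failing input to a minimal case. The point here is only that the invariant comes from the spec and the inputs do not come from the builder's imagination.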

Measurement Science

A measurement without stated uncertainty is not a measurement — it's a guess.

Technique | What It Proves | If You Skip It
Traceability Chain (VIM/NIST) | Result reaches primary standard through documented chain | Mars Climate Orbiter: unit mismatch, $327M lost
Uncertainty Budget (GUM) | All error sources accounted for | Instrument claims ±1% used to verify ±0.5% spec
Calibration Intervals | Drift stays within bounds between cycles | FDA audit invalidates 14 months of pharmaceutical batches
ISO 17025 Accreditation | Traceability chain independently verified | Circular calibration: lab calibrates against itself
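
The uncertainty budget row can be made concrete with a short calculation. This is a sketch under simplifying assumptions: the components are uncorrelated, each has a sensitivity coefficient of 1, and all are expressed in the same units, so they combine by root-sum-of-squares. The budget values and function names are hypothetical.

```python
import math


def combined_standard_uncertainty(components: dict) -> float:
    """Root-sum-of-squares of uncorrelated standard uncertainties in the same units."""
    return math.sqrt(sum(u ** 2 for u in components.values()))


def expanded_uncertainty(components: dict, k: float = 2.0) -> float:
    """U = k * u_c; k = 2 corresponds to roughly 95% coverage for a normal distribution."""
    return k * combined_standard_uncertainty(components)


def uncertainty_ratio(spec_tolerance: float, measurement_uncertainty: float) -> float:
    """How many times tighter the measurement is than the tolerance it is used to verify."""
    return spec_tolerance / measurement_uncertainty


# Hypothetical budget, every value in percent of reading.
budget = {
    "reference standard": 0.30,
    "resolution": 0.10,
    "repeatability": 0.20,
    "temperature": 0.15,
}
u_c = combined_standard_uncertainty(budget)
print(round(u_c, 3))                # 0.403
print(uncertainty_ratio(0.5, 1.0))  # 0.5: a ±1% instrument checking a ±0.5% spec
```

A ratio below 1 means the instrument is less certain than the tolerance it is meant to verify, which is the failure in the table. A commonly cited rule of thumb asks for a ratio of about 4 or better.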

The A&ID Connection

The Agent & Instrument Diagram makes commissioning visual:

A&ID Element | Role | Commissioning Check
Agent (Yang) | Applies force of intent: transforms and distributes value | Is the agent building to spec?
Instrument (Yin) | Verifies, measures, rewards: closes the feedback loop | Is the instrument calibrated?
Feedback Loop | Signal flows from instrument back to agent | Does the loop actually close?

An uncalibrated instrument in a feedback loop is worse than no instrument — it provides false confidence. The agent adjusts based on wrong readings. The system drifts while everyone believes it's on course.
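
A toy simulation shows the effect. The sketch below assumes a simple incremental controller and a sensor with a constant offset. The gain, step count, and bias are arbitrary illustrative values, and `run_loop` is a hypothetical name.

```python
def run_loop(setpoint: float, sensor_bias: float, steps: int = 200, gain: float = 0.2):
    """Each step moves the true value toward the setpoint by a fraction of the
    error the controller reads, and that reading includes the sensor bias."""
    true_value = 0.0
    for _ in range(steps):
        reading = true_value + sensor_bias          # what the instrument reports
        true_value += gain * (setpoint - reading)   # what the agent does about it
    return true_value, true_value + sensor_bias


true_value, reading = run_loop(setpoint=100.0, sensor_bias=5.0)
print(round(reading, 3))     # 100.0: the reading says on target
print(round(true_value, 3))  # 95.0: the process is off by the full bias
```

The loop closes perfectly and converges on the wrong value. Nothing inside the loop can detect the error, because the only evidence available to the agent is the biased reading.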

Commissioning verifies the instruments. Not the agents, not the process — the instruments. Because if the measurement is wrong, everything downstream is wrong.

The Protocol

For every instrument in the system:

Step | Question | Evidence
1. Spec | What should this instrument produce? | Written spec with binary success criteria
2. Input | What does it read? | Verified input sources
3. Output | What does it actually produce? | Tested against known inputs
4. Calibration | Does output match spec across full range? | Edge cases, not just happy path
5. Drift | How fast does accuracy degrade? | As-found/as-left records over time
6. Independence | Who verifies: someone other than the builder? | Named commissioner
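
The six steps can be recorded as data and checked mechanically. What follows is a minimal sketch, assuming a numeric instrument with a single symmetric tolerance. The `InstrumentRecord` fields, the `commission` function, and the example gauge are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InstrumentRecord:
    name: str
    spec: str                 # step 1: written, binary success criterion
    inputs_verified: bool     # step 2: input sources have been checked
    builder: str
    commissioner: str         # step 6: named verifier
    tolerance: float
    # steps 3 and 4: (known input, expected output, observed output) across the range
    test_points: list = field(default_factory=list)
    # step 5: as-found errors recorded at successive calibrations
    as_found_errors: list = field(default_factory=list)


def commission(rec: InstrumentRecord) -> list:
    """Return the failed protocol steps; an empty list means every step passed."""
    failures = []
    if not rec.spec.strip():
        failures.append("1. Spec")
    if not rec.inputs_verified:
        failures.append("2. Input")
    if not rec.test_points:
        failures.append("3. Output")
    elif any(abs(obs - exp) > rec.tolerance for _, exp, obs in rec.test_points):
        failures.append("4. Calibration")
    if not rec.as_found_errors or abs(rec.as_found_errors[-1]) > rec.tolerance:
        failures.append("5. Drift")
    if rec.commissioner == rec.builder:
        failures.append("6. Independence")
    return failures


gauge = InstrumentRecord(
    name="pressure gauge",
    spec="Reads within 0.5 units of the reference from 0 to 100",
    inputs_verified=True,
    builder="alice",
    commissioner="alice",
    tolerance=0.5,
    test_points=[(0, 0.0, 0.1), (50, 50.0, 50.2), (100, 100.0, 100.9)],
    as_found_errors=[0.1, 0.3, 0.6],
)
print(commission(gauge))  # ['4. Calibration', '5. Drift', '6. Independence']
```

The gauge passes at the low and middle test points and fails only at the top of the range, which is why step 4 asks for the full range rather than the happy path.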

Context