Engineering Flow
On an internet driven by intent, how do you turn your dreams into valuable outcomes?
OUTCOME → VALUE STREAM → DEPENDENCIES → CAPABILITIES → A&ID
   │           │              │              │          │
   ▼           ▼              ▼              ▼          ▼
Contracts  Processes     Sequencing     Readiness  Orchestration
The same way factories get built — draw it first. P&IDs became steel and concrete. Flow maps become working systems. The drawing IS the engineering.
Flow of Intent
Five maps. Five questions. In sequence. Each produces inputs for the next.
| Map | Question | Produces |
|---|---|---|
| Outcome Map | What does success look like? | Domain contracts, success measures |
| Value Stream Map | Where's the waste? | Use cases, repositories, adapters |
| Dependency Map | What must happen first? | Composition, task ordering |
| Capability Map | What can we do? | Generators, skills, work charts |
| Agents & Instruments | How do agents orchestrate? | Agent configs, feedback loops |
4 Key Maps = WHAT to build
A&ID = HOW agents work together to build it
The Agent & Instrument Diagram extends P&ID discipline to AI and Crypto systems. Every agent, instrument, and feedback loop made visible before a line of code is written.
Bridge to Reality
Before encoding types, map the data. The Outcome Map answers "what does success look like?" — the Data Flow Map answers "how does data move to make that success real?"
| Question | Instrument | Why It Matters |
|---|---|---|
| What is the highest-volume workflow? | Data Flow Map | Hop count reveals the constraint — not data quality |
| Where does each piece of data live? | SSOT Audit (Data Flow Map §6) | Multiple sources of truth = guaranteed drift |
| What transforms data between steps? | 4-Verb Lifecycle (Data Flow Map §2) | Artifact touches are the automation targets |
| Is the data open and portable? | Flow State Assessment (Data Flow Map §5) | Locked data can't feed agents |
Fix the flow before building types. A type system built on top of broken data flow inherits the breakage.
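The hop-count and SSOT questions above lend themselves to a mechanical check once the flow is written down as data. A minimal sketch, assuming a hypothetical `Hop` record; the type, its field names, and both functions are illustrative and not part of the Data Flow Map itself:

```typescript
// Hypothetical record: one hop moves a datum from one system to another.
interface Hop {
  datum: string; // what moves, e.g. "price"
  from: string;  // system the datum leaves
  to: string;    // system the datum lands in
}

// Hop count per datum. The highest count marks the likely constraint.
function hopCounts(hops: Hop[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const hop of hops) {
    counts.set(hop.datum, (counts.get(hop.datum) ?? 0) + 1);
  }
  return counts;
}

// SSOT audit: flag every datum that originates in more than one system.
// An origin is a system that emits the datum but never receives it.
function multipleSourcesOfTruth(hops: Hop[]): string[] {
  const emitters = new Map<string, Set<string>>();
  const receivers = new Map<string, Set<string>>();
  for (const hop of hops) {
    if (!emitters.has(hop.datum)) emitters.set(hop.datum, new Set<string>());
    if (!receivers.has(hop.datum)) receivers.set(hop.datum, new Set<string>());
    emitters.get(hop.datum)!.add(hop.from);
    receivers.get(hop.datum)!.add(hop.to);
  }
  const flagged: string[] = [];
  emitters.forEach((sources, datum) => {
    const sinks = receivers.get(datum) ?? new Set<string>();
    const origins = Array.from(sources).filter((s) => !sinks.has(s));
    if (origins.length > 1) flagged.push(datum);
  });
  return flagged;
}
```

A datum fed by both a CRM and a spreadsheet would be flagged here; a datum that merely passes through several systems in a chain would not, although its hop count would still rise.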
Three Disciplines
Once the maps are drawn and the data flows cleanly, three disciplines execute the build in sequence:
| # | Step | What It Does | Key Driver |
|---|---|---|---|
| 1 | Align Intent | Desired (dream) vs possible (engineering). Convert the contract into data flow + types — entities, schema, meta. | Shared contract before code |
| 2 | Type-First Development | Encode the flow as types. Let the compiler enforce correctness through four layers. | Compiler is the methodology |
| 3 | Code That Lasts | Constant checks and balances. Maintainability is the key driver — bad code must be structurally harder to write than good code. | Structure over memory |
Logic-First Gate
Maps encode logic. Code executes logic. AI agents amplify logic. The quality of result at every tier depends on the quality of logic going in.
The gate: Before any map is drawn, the decision tree it encodes must be validated at small scale — real humans executing the process, producing documented outcomes, against named KPIs. If the logic is unproven, the map is speculative. Speculative maps produce speculative generators, which produce speculative behaviour — at volume, at speed.
| Tier | Good logic in | Unproven logic in |
|---|---|---|
| Generator | Correct code by construction | Incorrect code by construction — at scale |
| Template | Consistent phase execution | Consistently wrong phase execution |
| AI agent | Faithful, high-throughput execution | Faithful, high-throughput failure |
Test before encoding: A small-scale team executing the proposed decision tree for a defined period — with documented inputs, outputs, and KPIs — is not a delay. It is the most important step before encoding logic into any automated system. Encode before proving = scaled error.
The signal: If you cannot describe the decision tree in observable, falsifiable steps that a human can execute and measure, the logic is not ready to encode.
This is the precondition for Outcome Map accuracy — "What does success look like?" only has a real answer if someone has already seen it happen at small scale.
Maps to Execution
Maps don't produce documentation. They produce the inputs for plan templates and generators.
| Map | Plan Phase | Generator Input | Trophy Layer | What It Produces |
|---|---|---|---|---|
| Outcome Map | Explore | Domain contracts | L1 | Ports, DTOs, entities, acceptance criteria |
| Value Stream Map | Define Types | Schema definitions | L1 → L2 | Repository interfaces, test expectations |
| Dependency Map | Write Test Specs | Ordering constraints | L2 | Failing tests that define "done" |
| Capability Map | Build | Generator selection | L2 → L3 | Scaffolded code in correct layer order |
| A&ID | Orchestrate | Agent configs | L3 → L4 | Plan templates, feedback loops |
Map the flow → Encode as types → Generate test specs → Scaffold implementation → Validate outcomes
| Step | What It Yields |
|---|---|
| Map the flow | Exploration produces contracts |
| Encode as types | Contracts that the compiler enforces |
| Generate test specs | Failing tests at the correct trophy layer (L1 schema, L2 wiring, L3 only if the browser is the proof) |
| Scaffold implementation | Generators enforce correct layer order automatically |
| Validate outcomes | Did outcomes match what exploration predicted? |
Each map iteration improves the generators. The Capability Map tracks which patterns are codified (generator exists) versus manual (hand-coded). When a manual pattern appears twice, it becomes a generator. When a generator exists, using it is mandatory.
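The codified-versus-manual rule above can be expressed as a small query over the Capability Map. A minimal sketch, assuming a hypothetical `CapabilityEntry` shape (the field names and the function are illustrative):

```typescript
// Hypothetical Capability Map entry.
interface CapabilityEntry {
  pattern: string;
  status: "codified" | "manual"; // codified = a generator exists
  occurrences: number;           // how many times the pattern has been built
}

// A manual pattern seen twice or more is due to become a generator.
function generatorCandidates(entries: CapabilityEntry[]): string[] {
  return entries
    .filter((e) => e.status === "manual" && e.occurrences >= 2)
    .map((e) => e.pattern);
}
```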
Plan templates compose from multiple sources — a single feature plan might derive tasks from entity commissioning, UI component, and e2e testing templates simultaneously. Each template contributes its gates (TDD enforcement, CDD file limits, security triads, proof commands). The plan inherits all gates from all templates.
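Gate inheritance as described above amounts to a de-duplicated union across the contributing templates. A minimal sketch, assuming a hypothetical `PlanTemplate` shape; the gate identifiers used in any call are placeholders:

```typescript
// Hypothetical template shape.
interface PlanTemplate {
  name: string;
  gates: string[];
}

// A plan composed from several templates inherits every gate from every
// template, de-duplicated and kept in first-seen order.
function inheritGates(templates: PlanTemplate[]): string[] {
  const seen = new Set<string>();
  const gates: string[] = [];
  for (const template of templates) {
    for (const gate of template.gates) {
      if (!seen.has(gate)) {
        seen.add(gate);
        gates.push(gate);
      }
    }
  }
  return gates;
}
```

Because the result is a union, adding a template to a plan can only add gates; it can never silently drop one that another template contributed.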
Failure Anatomy
`plan-cli.ts` accepted phase data from stdin. The input was missing `phaseSlug`. Nothing checked it until PostgreSQL rejected the insert with error 23502 (NOT NULL violation). Debugging took 10+ minutes.
MAP Boundary validation designed? NO — skipped
|
v
TYPE Types used at boundary? NO — `as` cast
|
v
TEST Test for invalid input? NO — none existed
|
v
IMPLEMENT Code trusts stdin blindly? YES
|
v
ERROR Where did it surface? PostgreSQL (most expensive)
Three stages skipped. The error fell through to the most expensive layer — and the one where agents have the least signal to self-correct.
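// Before: trusts stdin. The `as` cast silences the compiler, so a missing phaseSlug reaches the database.
const phases = JSON.parse(stdin) as Record<string, unknown>[];
await db.insert(planningPhase).values(phases.map((p) => ({ ...p, planId })));
// After: validates at the boundary. .parse() throws on invalid input before any insert runs.
const phases = phasesInputSchema.parse(JSON.parse(stdin));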
This fixes one function. The structural fix prevents the class:
| Level | Fix | Mechanism | Scope |
|---|---|---|---|
| Instance | Add Zod to `plan-cli.ts` | Import schema, call `.parse()` | This function |
| Rule | `boundary-validation.md` in `.claude/rules/` | Auto-loaded every session | Every developer, every session |
| Hook | Post-edit hook detecting `as` casts on `JSON.parse` | Fires on every `.ts` edit | Every edit, zero effort |
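The instance-level fix in the table above is boundary validation. What that does can be sketched without any library. This is a dependency-free stand-in for a schema `.parse()` call, not the project's actual schema: `PhaseInput`, its `title` field, and the helper names are assumptions made for illustration.

```typescript
// Hypothetical input shape. Only phaseSlug comes from the incident;
// the title field is assumed for illustration.
interface PhaseInput {
  phaseSlug: string;
  title: string;
}

// Type guard: narrows unknown to a plain object without an `as` cast.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Read a required non-empty string field or throw with the failing path.
function requireString(record: Record<string, unknown>, key: string, path: string): string {
  const value = record[key];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`${path}.${key}: expected a non-empty string`);
  }
  return value;
}

// Boundary parser: untrusted text in, typed data out, or an error naming the field.
function parsePhasesInput(raw: string): PhaseInput[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data)) {
    throw new Error("phases: expected an array");
  }
  return data.map((item: unknown, index: number): PhaseInput => {
    const path = `phases[${index}]`;
    if (!isRecord(item)) {
      throw new Error(`${path}: expected an object`);
    }
    return {
      phaseSlug: requireString(item, "phaseSlug", path),
      title: requireString(item, "title", path),
    };
  });
}
```

The error now names the missing field at the point of entry, which is the signal an agent needs to self-correct, instead of surfacing later as a database constraint code.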
Enforcement Hierarchy
Three levels of response to any failure:
| Level | Response | Scope |
|---|---|---|
| Instance | Fix the bug | One function, one file |
| Class | Prevent the category | Every file handling external data, every session |
| Structure | Engineer it away | Every edit, zero effort, zero memory required |
Most teams stop at level 1. The question after every incident: at what level did we fix it?
| Tier | Mechanism | Effort | Failure Mode |
|---|---|---|---|
| Generator | Code IS correct by construction | None | Cannot produce wrong pattern |
| Template | Phase ordering prevents skipping | Follow the template | Skip a phase |
| Hook | Auto-fires on edit | None | Developer ignores warning |
| Rule | Auto-loaded context | Read and follow | Developer skims |
| Skill | On-trigger procedure | Invoke the skill | Developer forgets to invoke |
| Expertise | Developer memory | Remember and apply | Developer forgets |
Push enforcement UP. A hook that detects `JSON.parse(x) as ...` at edit time prevents the entire class. A memory of "validate stdin" prevents one instance, and only if you remember.
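The detection step of such a hook can be approximated with a line-based scan. A minimal sketch under stated assumptions: `findParseCasts` and `CastFinding` are hypothetical names, the regular expression is a heuristic that can produce false positives, and a production hook would more likely inspect the TypeScript AST. How the function is wired into an editor or agent hook is left out.

```typescript
interface CastFinding {
  line: number; // 1-based line number
  text: string; // the offending line, trimmed
}

// Heuristic: a JSON.parse(...) call followed on the same line by an `as` cast.
const PARSE_CAST = /JSON\.parse\s*\(.*\)\s+as\s+/;

// Scan source text and report every line that casts the result of JSON.parse.
function findParseCasts(source: string): CastFinding[] {
  const findings: CastFinding[] = [];
  source.split("\n").forEach((text, index) => {
    if (PARSE_CAST.test(text)) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}
```

Run against the before and after snippets from the failure anatomy, the scan flags the cast line and passes the validated one.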
Template Health
Templates sit at Tier 2 — they frame plan creation but don't finish plans. A healthy template reduces decisions. A broken template produces the same gap type repeatedly.
Three health signals:
| Signal | Healthy | Broken |
|---|---|---|
| Completion rate | Plans using this template reach end-jtbd-validation without rework | Plans stall or get restructured mid-build |
| Retrospective recurrence | Findings addressed — same gap type does not appear in next plan | Same gap type appears 3+ times across plans using this template |
| Generator coverage | Mechanical tasks in the template have generators; plan slots reserved for decisions | Template contains manual steps that a generator could produce |
The feedback loop:
Plan runs → Retrospective → INSIGHTS file → Template updated → Next plan benefits
Each template accumulates an INSIGHTS file from retrospectives. When a finding maps to a template phase (wrong ordering, missing gate, unclear framing), the template is updated — not just the INSIGHTS file. The INSIGHTS file is the evidence trail; the template is the actuator.
Kill signal: If a template produces the same gap type in 3+ consecutive plans, the template is broken. Fix or retire.
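The kill signal can be computed from retrospective history. A minimal sketch, assuming a hypothetical `PlanRetro` record listed in chronological order for a single template; the record shape and function name are illustrative:

```typescript
// Hypothetical record: the gap types found in one plan's retrospective.
interface PlanRetro {
  planId: string;
  gapTypes: string[];
}

// Return every gap type that appeared in `threshold` or more consecutive plans.
function killSignals(retros: PlanRetro[], threshold: number = 3): string[] {
  const streak = new Map<string, number>();
  const broken = new Set<string>();
  for (const retro of retros) {
    const present = new Set<string>(retro.gapTypes);
    // A plan without the gap type breaks that gap type's streak.
    streak.forEach((_count, gap) => {
      if (!present.has(gap)) streak.set(gap, 0);
    });
    present.forEach((gap) => {
      const next = (streak.get(gap) ?? 0) + 1;
      streak.set(gap, next);
      if (next >= threshold) broken.add(gap);
    });
  }
  return Array.from(broken);
}
```

A non-empty result means the template itself needs fixing or retiring, not just another entry in its INSIGHTS file.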
Framing principle: Templates provide framing, not finished plans. A template that tries to finish the plan produces cargo-cult phases where the agent fills slots mechanically instead of reasoning about the work.
Generator Output Gate
The Generator tier claims "code IS correct by construction." That claim requires proof. Generated code must pass the same gates as hand-written code — lint, type checks, pre-commit hooks. If generated code fails a hook, the generator template is broken. Fix the template, not the hook.
Five failure classes from production retros:
| Class | Symptom | Template Fix |
|---|---|---|
| Missing lint suppression | Pre-commit rejects generated file | Add `eslint-disable` header to template |
| Aggregation type cast | `as` cast on group-by result | Emit typed query result from generator |
| Non-FK UUID test value | String where UUID expected in test | Handle standalone UUIDs in test scaffolding |
| Barrel re-export style | `export *` rejected by lint | Use named re-exports in template |
| JSDoc runtime code | `console.log` in JSDoc example | Strip executable patterns from template docs |
Each failure looks like a code bug. Each is a template bug. The instance fix (edit the generated file) recurs on every scaffold. The class fix (edit the template) prevents recurrence permanently.
Validation rule: Run the generated output through the pre-commit pipeline before declaring the generator working.
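The shape of that validation step can be sketched as a list of gates applied to every generated file. This is a simplified stand-in for the real pipeline (lint, type checks, pre-commit hooks), not a replacement for it: `GeneratedFile`, `Gate`, and the two regex-based gates are hypothetical, and they only approximate two of the failure classes above.

```typescript
// Hypothetical representation of one scaffolded file.
interface GeneratedFile {
  path: string;
  content: string;
}

// A gate returns a failure message, or null when the file passes.
type Gate = (file: GeneratedFile) => string | null;

// Approximates the "barrel re-export style" failure class.
const noStarReExport: Gate = (file) =>
  /^\s*export \* from /m.test(file.content)
    ? `${file.path}: use named re-exports instead of export *`
    : null;

// Approximates the "JSDoc runtime code" failure class.
const noConsoleInJsDoc: Gate = (file) =>
  /^\s*\*.*console\.log/m.test(file.content)
    ? `${file.path}: console.log inside a JSDoc comment`
    : null;

// The generator counts as working only when this returns an empty list.
function runOutputGate(files: GeneratedFile[], gates: Gate[]): string[] {
  const failures: string[] = [];
  for (const file of files) {
    for (const gate of gates) {
      const failure = gate(file);
      if (failure !== null) failures.push(failure);
    }
  }
  return failures;
}
```

Every failure reported here points at the template that produced the file, so the fix belongs in the template rather than in the generated output.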
INCIDENT
|
v
Fix the instance (necessary, not sufficient)
|
v
What CLASS of error? (specific → general)
|
v
Prevent the class (rule: advisory)
|
v
Can this be STRUCTURAL? (advisory → enforcement)
|
v
Engineer the structure (hook/generator: automatic)
|
v
CANNOT RECUR
Cost of Quality
The enforcement hierarchy describes six tiers. Cost tracking measures whether they work.
Every incident produces a cost annotation:
| Field | What It Records |
|---|---|
| Where caught | Which tier actually caught it (generator / template / hook / rule / skill / expertise) |
| Where it should have been caught | Which tier SHOULD have caught it |
| Time to resolve | Clock time from detection to fix merged |
| Layer | TypeScript / Zod / PostgreSQL / Production |
Three metrics compound from these annotations:
| Metric | What It Measures | Signal |
|---|---|---|
| Catch rate by tier | % of incidents caught at each enforcement level | Hooks catching most = healthy. Expertise catching most = fragile. |
| Escalation rate | % of incidents that fell past their intended tier | Rising = enforcement gaps. Falling = tiers are wired correctly. |
| Cost per miss | Time-to-resolve when an incident escapes its tier | Validates the 10x multiplier from cost escalation |
| Tier | What To Track | Healthy State |
|---|---|---|
| Generator | Incidents in generated code | Zero — if a generator produces bugs, fix the generator |
| Template | Phases skipped or reordered | Zero — template gates should prevent this |
| Hook | Hook fire count vs violations shipped | High fire count, zero violations in commit |
| Rule | Incidents in rule-covered areas | Low — rules without hooks are suggestions under load |
| Skill | Incidents in skill-covered areas where skill wasn't invoked | Decreasing — skill invocation should become habit |
| Expertise | Incidents with no structural prevention | Decreasing — every expertise-caught incident should produce a hook or generator |
The cost tracking loop: incident → annotate → identify tier gap → push enforcement up → measure whether that class recurs.
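The three metrics follow directly from the annotation fields. A minimal sketch, assuming a hypothetical `CostAnnotation` record and one interpretive choice that the text does not spell out: an incident counts as escaped when the tier that caught it sits lower in the enforcement hierarchy than the tier that should have.

```typescript
// Tiers from the enforcement hierarchy, strongest first.
type Tier = "generator" | "template" | "hook" | "rule" | "skill" | "expertise";
const TIER_ORDER: Tier[] = ["generator", "template", "hook", "rule", "skill", "expertise"];

// Hypothetical annotation record; field names are illustrative.
interface CostAnnotation {
  caughtAt: Tier;
  shouldHaveBeenCaughtAt: Tier;
  minutesToResolve: number;
}

// Assumption: escaped = caught by a weaker tier than the one intended.
function escaped(a: CostAnnotation): boolean {
  return TIER_ORDER.indexOf(a.caughtAt) > TIER_ORDER.indexOf(a.shouldHaveBeenCaughtAt);
}

// Catch rate by tier: fraction of incidents caught at each enforcement level.
function catchRateByTier(annotations: CostAnnotation[]): Record<Tier, number> {
  const rates: Record<Tier, number> = {
    generator: 0, template: 0, hook: 0, rule: 0, skill: 0, expertise: 0,
  };
  for (const a of annotations) rates[a.caughtAt] += 1;
  for (const tier of TIER_ORDER) {
    rates[tier] = annotations.length === 0 ? 0 : rates[tier] / annotations.length;
  }
  return rates;
}

// Escalation rate: fraction of incidents that fell past their intended tier.
function escalationRate(annotations: CostAnnotation[]): number {
  if (annotations.length === 0) return 0;
  return annotations.filter(escaped).length / annotations.length;
}

// Cost per miss: mean time-to-resolve across escaped incidents, in minutes.
function costPerMiss(annotations: CostAnnotation[]): number {
  const misses = annotations.filter(escaped);
  if (misses.length === 0) return 0;
  const total = misses.reduce((sum, a) => sum + a.minutesToResolve, 0);
  return total / misses.length;
}
```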
Retrospective Protocol
Cost of quality measures whether enforcement works. This section bridges measurement to structural prevention — turning engineering lessons into artifacts that prevent recurrence.
Six Gap Types
Every engineering failure maps to one of six gap types. Each type has a broken VVFL station and a target enforcement tier.
| Gap Type | Symptom | Broken Station | Target Tier | Structural Fix |
|---|---|---|---|---|
| Gate bypass | Required fields empty, bookends skipped | Standards (gauge reads zero) | Template | CLI validates required fields before plan creation |
| Template bloat | Mechanical tasks consume plan slots | Attention (wasted on boilerplate) | Generator | Generator produces boilerplate, plan tracks decisions |
| Sequence violation | E2E before UI, retrofitted testids | Systems (order dependency ignored) | Generator | Generator pre-populates testids at scaffold time |
| Interface drift | pgEnum 18, TS union 21, seed 16 | Standards (single source violated) | Generator | Single-source type definition generates all variants |
| Demand absence | Work started without Tight Five or prdRef | Priorities (no demand validation) | Rule | prdRef required at plan creation |
| Logic absence | Decision tree encoded without small-scale human validation | Standards (logic gate skipped) | DoR gate | Evidence of validated decision tree required before project starts |
Retrospective Template
Every retrospective produces four sections. No variation.
| Section | What It Contains |
|---|---|
| What happened | Expected vs actual, with file paths and evidence |
| Which gap type | One of: gate bypass, template bloat, sequence violation, interface drift, demand absence, logic absence |
| Enforcement response | Instance fix (this bug) + class prevention (which tier absorbs it) |
| Artifacts updated | Exact file paths: template, hook, generator, or rule that changed |
Routing Logic
| Finding Type | Target Artifact | Owner |
|---|---|---|
| Missing field | Plan template (template.json) | Template maintainer |
| Boilerplate task | Generator | Platform engineer |
| Wrong ordering | Generator + template phase gates | Platform engineer |
| Type mismatch | Single-source type definition | Schema owner |
| No demand signal | Context rule + plan creation guard | Rule maintainer |
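The routing table above can itself be encoded so that the compiler enforces completeness. A minimal sketch: the finding types, artifacts, and owners are copied from the table, while the identifier names (`FindingType`, `Route`, `ROUTES`, `routeFinding`) are hypothetical.

```typescript
type FindingType =
  | "missing field"
  | "boilerplate task"
  | "wrong ordering"
  | "type mismatch"
  | "no demand signal";

interface Route {
  artifact: string;
  owner: string;
}

// Record<FindingType, Route> will not compile if a finding type has no route,
// so adding a new finding type forces a routing decision.
const ROUTES: Record<FindingType, Route> = {
  "missing field": { artifact: "Plan template (template.json)", owner: "Template maintainer" },
  "boilerplate task": { artifact: "Generator", owner: "Platform engineer" },
  "wrong ordering": { artifact: "Generator + template phase gates", owner: "Platform engineer" },
  "type mismatch": { artifact: "Single-source type definition", owner: "Schema owner" },
  "no demand signal": { artifact: "Context rule + plan creation guard", owner: "Rule maintainer" },
};

function routeFinding(finding: FindingType): Route {
  return ROUTES[finding];
}
```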
Single Source Rule
Every enumerable set has ONE definition. Two files defining the same set = guaranteed drift.
- Enum in schema → generated into TypeScript union, seed file, and validation
- If you find two files defining the same set, that IS the bug — fix the duplication before fixing the symptom
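One way to approximate the single source rule in plain TypeScript is to derive everything from one `as const` array. A minimal sketch; the set name and its values are placeholders, and the document's own direction (schema enum generating the TypeScript union, seed file, and validation) may be implemented differently:

```typescript
// Hypothetical enumerable set; the values are placeholders.
// This array is the only place the set is written down.
const PHASE_STATUSES = ["draft", "active", "done"] as const;

// The union type is derived from the array, never typed by hand.
type PhaseStatus = (typeof PHASE_STATUSES)[number];

// Runtime validation reads the same array.
// The cast only widens the element type so indexOf accepts any string.
function isPhaseStatus(value: unknown): value is PhaseStatus {
  return (
    typeof value === "string" &&
    (PHASE_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}

// Seed rows are produced from the same array.
const phaseStatusSeed = PHASE_STATUSES.map((status, index) => ({
  id: index + 1,
  status,
}));
```

Adding a fourth value to the array updates the union, the validator, and the seed rows together, so the three counts cannot drift apart.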
VVFL Connection
The gap types map to broken stations in the VVFL: Standards (gauge), Attention (focus), Systems (sequence), Priorities (demand). The retrospective template maps to the Reflect station in the 9-station model, the controller that converts measurement into structural change. Cost of quality (above) is the sensor. This protocol is the actuator.
Two Dimensions
Every map has two layers, plus the gap between them:
| Layer | What It Captures |
|---|---|
| Dream | Future state — what we're building |
| Engineering | Current state — what exists |
| Gap | What we must build to close the distance |
Fill maps with REALITY (evidence, not hopes). Keep them FRESH (stale maps are worse than no maps).
PLANS ARE WORTHLESS, PLANNING IS ESSENTIAL.
GOOD PLANNING ALWAYS STARTS WITH MAPPING REALITY.
Picture the dream. Map reality. Close the gap.
Context
- Align Intent — Step 1: desired vs possible. Spec contract, data flow, entities, schema, meta
- Type-First Development — Step 2: encode flow as types; compiler is the methodology
- Code That Lasts — Step 3: checks and balances; maintainability as the key driver
- Data Flow Map — Before types, map the flow: hop count, 4-verb lifecycle, SSOT audit
- Agent & Instrument Diagram — The capstone: how agents and instruments orchestrate the flow
- VVFL — The operating system this flow implements — each engineering station maps to a loop station
- Pictures — The tools that make thinking visible
- Testing Strategy — Proving each layer honors the contracts
- Create PRD Stories — F/F/O tables that feed the Outcome Map
- Products — Great products deliver great outcomes
- Process Optimisation — Improve the flow
- Control System — Enforcement hierarchy maps to PID mechanics
- Predictions — Each outcome in the Outcome Map becomes a prediction to track
- Agency — Cost of Quality tiers map to agency: expertise is character, generators are capability
- Outer-Loop Validation — After build: instruments read production reality
- Hexagonal Architecture — The layer model architecture review validates against
- Commissioning Protocol — L0-L4 maturity levels
- Priorities — Active PRDs and build queue
Links
- Flow Engineering (Steve Pereira) — Origin methodology
- Flow Collective — Community of practice
Questions
If the enforcement hierarchy claims generators produce correct code by construction, what validates that claim — and who notices when it stops being true?
- When generated code passes runtime tests but fails pre-commit hooks, which enforcement tier actually caught the bug?
- What is the cost multiplier when a template bug ships to 73 repositories before anyone runs lint-staged on the output?
- If the Generator Output Gate had existed before the Data Foundation plan, which of the five failure classes would it have prevented?
- When should architecture review block work versus flag and proceed?
- If value stories are predictions, what happens when the prediction is wrong but the code is correct?