Development Journeys
How does software go from someone's pain to proven value?
PAIN → DEMAND → SPEC → RANK → MAPS → TYPES → CODE → COMMISSION
 │      │        │      │      │      │       │      │
 │      │        │      │      │      │       │      └─ Independent verification (L0-L4)
 │      │        │      │      │      │       └─ Type-first implementation
 │      │        │      │      │      └─ Domain contracts from maps
 │      │        │      │      └─ Flow engineering: stories → maps
 │      │        │      └─ 5P scoring, sorted build order
 │      │        └─ Intent + Story + Build contracts
 │      └─ Evidence card: who hurts, how much
 └─ Observed friction, not assumed need
The pipeline has one rule: each step's output is the next step's input. Skip a step and the chain breaks. The Dream Team owns both ends — defining value (steps 1-3) and verifying delivery (step 8). Engineering owns the build (steps 5-7). Step 4 is the handoff — and it includes an architecture fitness gate that can bounce a spec back before building starts.
Two Flows
Two views of the same system. Both are necessary. Neither is sufficient alone.
| Flow | Direction | Question | Artifacts |
|---|---|---|---|
| User flow | Outside-in | What progress is someone trying to make? | Pain → Story Contract → Screen Contracts |
| Type flow | Inside-out | What contracts must the system honour? | Domain types → Infrastructure → Application → Presentation |
User flow starts with a person's pain. Type flow starts with the domain's truth. The Story Contract bridges them: each story row names a user's pain AND the data source, field, and threshold that proves it's fixed. Stories become Outcome Maps. Outcome Maps become domain types. Types become code.
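One Story Contract row, sketched as a type. This is illustrative rather than a canonical schema; every field name here is an assumption. The point it makes: the pain and the proof live in the same record.

```typescript
// A minimal sketch of one Story Contract row (field names are assumptions,
// not the canonical schema). Pain and proof live in the same record.
type StoryContractRow = {
  id: string;            // e.g. "S1"
  pain: string;          // the user's pain, in their words
  when: string;          // trigger: "operator queries verdict"
  then: string;          // observable outcome: "API returns verdict"
  proof: {
    source: string;      // data source that measures the outcome
    field: string;       // the specific field read from that source
    threshold: string;   // e.g. "≤2s response, ≤5% divergence"
  };
};
```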
Two Loops
Every verification happens in one of two loops. Know which one you're in.
| Loop | Speed | Cost | Catches | Tools |
|---|---|---|---|---|
| Inner | Fast (seconds) | Cheap | Contract violations, type errors, logic bugs | Compiler, unit tests, integration tests |
| Outer | Slow (minutes) | Expensive | Reality gaps, UX failures, production drift | E2E tests, browser verification, monitoring |
The inner loop runs on every keystroke. The outer loop runs before shipping. Type-first development maximises inner-loop catches — a type error caught at compile time costs nothing. A type error caught in production costs everything.
Outer-loop validation reads reality through instruments: browser tools, performance gauges, error dashboards. The builder can't commission their own work — the commissioner reads the same instruments independently.
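A minimal sketch of an inner-loop catch, using the VerdictResult contract that appears in the traceability table below. The render helper is hypothetical:

```typescript
// Inner loop: the compiler rejects contract violations before any test runs.
type Source = { name: string; url: string };

type VerdictResult = {
  verdict: string;
  coverage: number;   // percentage, 0-100
  sources: Source[];
};

function renderCoverage(result: VerdictResult): string {
  return `${result.coverage.toFixed(1)}% coverage`;
}

// Compile-time catch, costing nothing:
//   renderCoverage({ verdict: "pass", coverage: "97%", sources: [] });
//   error TS2322: Type 'string' is not assignable to type 'number'.
// The same payload arriving as untyped JSON surfaces only in the outer loop,
// as a production rendering bug.
```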
Traceability
One requirement traced through every stage. The chain must be unbroken.
| Stage | Artifact | Example |
|---|---|---|
| Pain | Observed friction | "Operator spends 30min manually auditing tool stack per category" |
| Demand | Evidence card | Interview transcript, frequency: weekly, 12 operators affected |
| Story | Story Contract row | S1: WHEN operator queries verdict THEN API returns in ≤2s with ≤5% divergence |
| Score | 5P frontmatter | Pain: 4, Demand: 3, Edge: 4, Trend: 3, Conversion: 3 → composite 432 |
| Map | Outcome Map node | queryVerdict() → VerdictResult with coverage percentage |
| Type | Domain contract | type VerdictResult = { verdict: string; coverage: number; sources: Source[] } |
| Test | Test file | apps/crm/tests/story-s1-verdict.spec.ts — RED before implementation |
| Code | Implementation | queryVerdict() function satisfying the type contract |
| Commission | L4 evidence | Commissioner screenshot: verdict returned in 1.8s, 97% match |
If you can't trace a requirement from pain to commission, the chain is broken. Find the break.
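The S1 row implies a test that exists, and fails, before the implementation does. A sketch of what apps/crm/tests/story-s1-verdict.spec.ts might assert, assuming a Playwright-style runner; the endpoint and response field are assumptions:

```typescript
// Sketch of story-s1-verdict.spec.ts: one assertion per Story Contract threshold.
// Written RED: it fails until the implementation satisfies the contract.
import { test, expect } from "@playwright/test";

test("S1: verdict query returns in ≤2s with ≤5% divergence", async ({ request }) => {
  const started = Date.now();
  const response = await request.get("/api/verdict?category=crm-tools");
  const elapsedMs = Date.now() - started;
  const body = await response.json();

  // Thresholds copied verbatim from Story Contract row S1.
  expect(elapsedMs).toBeLessThanOrEqual(2000);        // ≤2s response
  expect(body.divergence).toBeLessThanOrEqual(0.05);  // ≤5% divergence
});
```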
The SPEC-MAP
The traceability table above shows the chain conceptually. The SPEC-MAP makes it concrete — one row per Story Contract row, both sides writing to it. Dream fills the story. Engineering fills the test. Dream fills the commissioning result. Any empty cell is a broken link in the chain.
Without the SPEC-MAP, three gaps hide:
- Feature works, no test — Commissioning scores L4, CI has no regression guard. Next deploy breaks it silently
- Engineering changes behaviour — Screen Contract says "skeleton loads" but the component renders immediately. Spec describes yesterday's product
- Story is untestable — Engineering skips it. Nobody notices until the feature regresses and nobody knows the expected behaviour
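One way to make "any empty cell is a broken link" mechanically checkable, sketched with assumed column names:

```typescript
// Sketch of one SPEC-MAP row (column names are assumptions, not the canonical
// schema). null means "not yet filled", which is a broken link in the chain.
type SpecMapRow = {
  storyId: string;                    // shared key, e.g. "S1"
  story: string | null;               // Dream: the WHEN/THEN statement
  testPath: string | null;            // Engineering: the regression guard
  commissioningResult: string | null; // Dream: evidence at L0-L4
};

// Any empty cell breaks the chain; a CI step could fail on this predicate.
const brokenLink = (row: SpecMapRow): boolean =>
  row.story === null || row.testPath === null || row.commissioningResult === null;
```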
Two VVFL Loops
The pipeline runs two VVFL loops. They share an error signal but measure different things.
DREAM TEAM LOOP                              ENGINEERING LOOP
─────────────────                            ─────────────────
Setpoint: "What value should exist?"         Setpoint: "What code should exist?"
Gauge: 5P scores, commissioning evidence     Gauge: Types, tests, enforcement tiers
Control: Return signals → evolve specs       Control: Retrospectives → evolve templates

┌─── PAIN ──→ DEMAND ──→ SPEC ──→ RANK ───┐
│                                         │
│   ERROR SIGNAL                          ▼
│ ◄─────────────────────                  MAPS ──→ TYPES ──→ CODE
│   (predicted vs actual)                                      │
│                                                              │
└──── COMMISSION ◄── VERIFY ◄── DEPLOY ◄───────────────────────┘
| Dimension | Dream Team Loop | Engineering Loop |
|---|---|---|
| Setpoint | Story Contract rows — predicted thresholds | Domain types — compiler-enforced contracts |
| Gauge | Commissioning evidence (L0-L4) | Cost of quality metrics |
| Controller | Return signals → evolve Story Contract | Retrospective protocol → evolve templates |
| Actuator | Updated PRD, rescored priorities | Updated generator, hook, or rule |
| Virtuous when | Spec gap signal improves future specs | Each bug prevents its class forever |
| Broken when | Return signals don't flow back — specs never evolve | Fixes stay at instance level — same class recurs |
The Dream Team loop is slow (weeks) and measures value — did we spec the right thing? The Engineering loop is fast (hours) and measures correctness — did we build the thing right? The error signal between them — predicted vs actual outcomes at commissioning — is where the loops connect.
The danger: Each loop can appear healthy in isolation. Specs keep getting written (Dream looks healthy). Code keeps shipping (Engineering looks healthy). But if commissioning evidence never flows back to evolve the specs, the Dream loop runs open. And if retrospective findings never inform the next PRD's Story Contract, the Engineering loop's lessons stay local.
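The error signal is computable wherever a Story Contract threshold and a commissioning measurement share units. A sketch, with hypothetical field names:

```typescript
// Error signal between the loops: predicted threshold vs commissioned actual.
// Field names are hypothetical; the numbers come from the S1 row above.
type PredictionCheck = {
  storyId: string;
  metric: string;
  predicted: number;  // threshold from the Story Contract
  actual: number;     // measurement from commissioning evidence
};

const errorSignal = (check: PredictionCheck): number =>
  check.actual - check.predicted;

// S1 predicted ≤2s; commissioning measured 1.8s. Negative error: the spec held.
errorSignal({ storyId: "S1", metric: "responseSeconds", predicted: 2.0, actual: 1.8 });
// → -0.2 (within floating-point rounding)
```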
Three Credibility Loops
The two VVFL loops above are both internal. They answer "did we build the right thing correctly?" That's necessary but not sufficient. Credibility requires three loops, each harder to fake than the last.
LOOP 1: INNER    "Does it work?"            Tests pass, benchmarks met
        │
        ▼
LOOP 2: STORY    "Does our story match?"    Predictions align with results
        │
        ▼
LOOP 3: MARKET   "Do others agree?"         External validation with behaviour
| Loop | Question | Evidence | Commissioning | Conviction |
|---|---|---|---|---|
| Inner | Does it pass our own standards? | Types compile, tests green, benchmarks met | L1-L3 | LOW — "it works" |
| Story | Do our claims match our results? | Predictions scored, kill criteria honoured, receipts filed | L3 with scored predictions | MEDIUM — "it matters" |
| Market | Do others validate with their behaviour? | Revenue, adoption, referrals, independent verification | L4 | HIGH — "others agree" |
Loop 1 is the engineering loop above — compiler, tests, commissioning evidence. You control it entirely.
Loop 2 is the bridge. Every PRD makes predictions: "this feature reduces X by Y." Every venture states kill criteria. The credibility score is correct predictions divided by total predictions, weighted by conviction. This is where the Dream Team loop becomes honest — not "did we build it" but "did our story about why it matters hold up?"
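The same arithmetic, sketched in code. The conviction weights are assumptions; the source specifies only "weighted by conviction":

```typescript
// Credibility score: correct predictions / total predictions, weighted by
// conviction. The weight values are assumptions for illustration.
type Prediction = { correct: boolean; conviction: "low" | "medium" | "high" };

const WEIGHT = { low: 1, medium: 2, high: 3 } as const;

function credibilityScore(predictions: Prediction[]): number {
  const total = predictions.reduce((sum, p) => sum + WEIGHT[p.conviction], 0);
  const correct = predictions.reduce(
    (sum, p) => sum + (p.correct ? WEIGHT[p.conviction] : 0), 0);
  return total === 0 ? 0 : correct / total;
}

// Two high-conviction hits and one medium-conviction miss:
// (3 + 3) / (3 + 3 + 2) = 0.75
credibilityScore([
  { correct: true, conviction: "high" },
  { correct: true, conviction: "high" },
  { correct: false, conviction: "medium" },
]); // → 0.75
```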
Loop 3 is what makes L4 real. Someone outside the system validates with their attention, their money, or their referral. No amount of internal testing gets you there. This is the software promise fulfilled: coordination with minimal need for trust — because the evidence is verifiable, not narrated.
The traceability chain extended:
| Stage | Loop | Artifact |
|---|---|---|
| Pain observed | — | Friction documented |
| Spec written | — | Story Contract with predictions |
| Code shipped | 1 | Tests green, benchmarks pass |
| Prediction scored | 2 | "We said 2s response — actual is 1.8s" |
| Kill criteria checked | 2 | "We said kill if fewer than 10 users/month — actual is 47" |
| External adoption | 3 | Someone chose this over their alternative |
Most teams stop at Loop 1. They build, test, ship — and call that credibility. But the credibility page is clear: capability (Loop 1) is expected. Integrity — doing what you said, measured across time (Loop 2) — compounds. And the graph — others vouching with their behaviour (Loop 3) — is what makes the system trustworthy without requiring trust.
Market credibility is the greatest force. Loop 1 and Loop 2 are internal — you control the standards and you score your own predictions. Loop 3 is external — someone chose your product over their alternative, paid with attention or money, and came back. No amount of passing tests or scoring predictions substitutes for that signal. The SPEC-MAP tightens Loops 1 and 2 so that Loop 3 evidence — when it arrives — lands on a foundation that doesn't crack.
Dig Deeper
- Jobs To Be Done — Discover demand, spec stories, rank priorities. The user-flow pipeline (steps 1-3)
- Eng Dev Workflow — Flow engineering, type-first development, outer-loop validation. The type-flow pipeline (steps 5-7)
- Validate Outcomes — Independent commissioning: did the build match the spec? (step 8)
Context
- Flow Engineering — Stories become maps, maps become types, types become code
- Type-First Development — Domain contracts pull implementation through the inner loop
- Outer-Loop Validation — Instruments that read production reality
- Feature Matrix — Live commissioning status for every capability
- PRD Handoff Protocol — The interface between Dream Team and Engineering
- VVFL Evolution — The feedback loop model behind the pipeline
- Retrospective Protocol — Engineering loop: five gap types, enforcement hierarchy
- Process Optimisation — Pit of success patterns, improvement loop
- Credibility — Three layers: identity, capability, integrity. The prediction ledger that scores Loop 2
- Software — The promise: coordination with minimal need for trust. Loop 3 is where that promise gets tested
Questions
- Where in the pipeline does the signal from the original pain get lost — and what does the artifact look like at the break point?
- If the inner loop catches 90% of defects, what's the remaining 10% that only the outer loop reveals — and is that acceptable risk?
- When a Story Contract row can't be traced to a domain type, is the story wrong or the type model incomplete?
- When commissioning evidence shows the threshold was wrong (not the build), how long does it take for that signal to reach the Story Contract — and how many specs ship with the same bad threshold in the gap?
- What's the artifact that proves Loop 2 is running — that predictions are being scored, not just made?
- If no feature has reached L4, is Loop 3 broken or just not yet started — and how do you tell the difference?