
Development Journeys

How does software go from someone's pain to proven value?

PAIN → DEMAND → SPEC → RANK → MAPS → TYPES → CODE → COMMISSION
 │      │        │      │      │       │      │         │
 │      │        │      │      │       │      │         └─ Independent verification (L0-L4)
 │      │        │      │      │       │      └─────────── Type-first implementation
 │      │        │      │      │       └────────────────── Domain contracts from maps
 │      │        │      │      └────────────────────────── Flow engineering: stories → maps
 │      │        │      └───────────────────────────────── 5P scoring, sorted build order
 │      │        └──────────────────────────────────────── Intent + Story + Build contracts
 │      └───────────────────────────────────────────────── Evidence card: who hurts, how much
 └──────────────────────────────────────────────────────── Observed friction, not assumed need

The pipeline has one rule: each step's output is the next step's input. Skip a step and the chain breaks. The Dream Team owns both ends — defining value (steps 1-3) and verifying delivery (step 8). Engineering owns the build (steps 5-7). Step 4 is the handoff — and it includes an architecture fitness gate that can bounce a spec back before building starts.

Two Flows

Two views of the same system. Both are necessary. Neither is sufficient alone.

| Flow      | Direction  | Question                                 | Artifacts                                                  |
| --------- | ---------- | ---------------------------------------- | ---------------------------------------------------------- |
| User flow | Outside-in | What progress is someone trying to make? | Pain → Story Contract → Screen Contracts                   |
| Type flow | Inside-out | What contracts must the system honour?   | Domain types → Infrastructure → Application → Presentation |

User flow starts with a person's pain. Type flow starts with the domain's truth. The Story Contract bridges them: each story row names a user's pain AND the data source, field, and threshold that proves it's fixed. Stories become Outcome Maps. Outcome Maps become domain types. Types become code.
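The bridging row can be sketched as a data shape. This is a minimal sketch with illustrative field names; none of these names come from the spec itself, only the idea that each row pairs a pain with the data source, field, and threshold that proves it's fixed.

```typescript
// Sketch of a Story Contract row: each row names the user's pain AND
// the measurable evidence that proves it is fixed.
// All field names here are illustrative assumptions.
type StoryContractRow = {
  id: string;        // e.g. "S1"
  pain: string;      // the user's observed friction
  when: string;      // trigger: WHEN operator queries verdict...
  then: string;      // outcome: THEN API returns...
  dataSource: string; // where the proving metric lives
  field: string;      // which field is measured
  threshold: { metric: string; operator: "<=" | ">="; value: number };
};

const s1: StoryContractRow = {
  id: "S1",
  pain: "Manual verdict audit takes 30min per category",
  when: "operator queries verdict",
  then: "API returns in ≤2s with ≤5% divergence",
  dataSource: "api-latency-log",
  field: "response_ms",
  threshold: { metric: "response_ms", operator: "<=", value: 2000 },
};
```

The point of the shape: the same row is legible to both flows. The user flow reads `pain`/`when`/`then`; the type flow reads `dataSource`/`field`/`threshold` and turns them into domain contracts.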

Two Loops

Every verification happens in one of two loops. Know which one you're in.

| Loop  | Speed          | Cost      | Catches                                      | Tools                                       |
| ----- | -------------- | --------- | -------------------------------------------- | ------------------------------------------- |
| Inner | Fast (seconds) | Cheap     | Contract violations, type errors, logic bugs | Compiler, unit tests, integration tests     |
| Outer | Slow (minutes) | Expensive | Reality gaps, UX failures, production drift  | E2E tests, browser verification, monitoring |

The inner loop runs every keystroke. The outer loop runs before shipping. Type-first development maximises inner-loop catches — a type error caught at compile time costs nothing. A type error caught in production costs everything.
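A minimal TypeScript sketch of the idea: push a contract into the type system so the compiler, not production, catches the violation. The `Coverage` brand here is an illustrative device, not part of the pipeline's spec.

```typescript
// A "coverage" value must be a percentage in [0, 100]. The branded type
// makes it impossible to pass a raw, unchecked number where coverage is
// expected — the inner loop (compiler) rejects it at zero cost.
type Coverage = number & { readonly __brand: "Coverage" };

function makeCoverage(n: number): Coverage {
  if (n < 0 || n > 100) throw new RangeError(`coverage out of range: ${n}`);
  return n as Coverage;
}

function reportCoverage(c: Coverage): string {
  return `${c}% match`;
}

// Inner-loop catch: reportCoverage(97) would be a compile error, because
// a raw number is not a Coverage. The only path in is the checked
// constructor:
const ok = reportCoverage(makeCoverage(97));
```

The runtime check in `makeCoverage` guards the boundary once; everywhere downstream, the compiler enforces the contract for free.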

Outer-loop validation reads reality through instruments: browser tools, performance gauges, error dashboards. The builder can't commission their own work — the commissioner reads the same instruments independently.

Traceability

One requirement traced through every stage. The chain must be unbroken.

| Stage      | Artifact           | Example                                                                       |
| ---------- | ------------------ | ----------------------------------------------------------------------------- |
| Pain       | Observed friction  | "Operator spends 30min manually auditing tool stack per category"             |
| Demand     | Evidence card      | Interview transcript, frequency: weekly, 12 operators affected                |
| Story      | Story Contract row | S1: WHEN operator queries verdict THEN API returns in ≤2s with ≤5% divergence |
| Score      | 5P frontmatter     | Pain: 4, Demand: 3, Edge: 4, Trend: 3, Conversion: 3 → composite 432          |
| Map        | Outcome Map node   | `queryVerdict()` → `VerdictResult` with coverage percentage                   |
| Type       | Domain contract    | `type VerdictResult = { verdict: string; coverage: number; sources: Source[] }` |
| Test       | Test file          | `apps/crm/tests/story-s1-verdict.spec.ts` — RED before implementation         |
| Code       | Implementation     | `queryVerdict()` function satisfying the type contract                        |
| Commission | L4 evidence        | Commissioner screenshot: verdict returned in 1.8s, 97% match                  |
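The Map, Type, and Code rows can be sketched together in TypeScript. Only the `VerdictResult` contract comes from the table; the shape of `Source` and the stand-in function body are assumptions for illustration.

```typescript
// `Source` is not defined in the table; a minimal assumed shape is used.
type Source = { name: string; url?: string };

// The domain contract from the Type row, verbatim.
type VerdictResult = {
  verdict: string;
  coverage: number;   // percentage match against sources
  sources: Source[];
};

// A stand-in implementation satisfying the type contract. The real
// queryVerdict() would read live data; this sketch just returns the
// values seen in the commissioning evidence (97% match).
function queryVerdict(query: string): VerdictResult {
  return {
    verdict: `verdict for ${query}`,
    coverage: 97,
    sources: [{ name: "audit-log" }],
  };
}
```

The test file named in the Test row would assert against exactly this contract, RED before the implementation exists.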

If you can't trace a requirement from pain to commission, the chain is broken. Find the break.
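The Score row's worked example (4 · 3 · 4 · 3 · 3 = 432) is consistent with the composite being the product of the five scores. Assuming that rule holds (it is an inference from the example, not a stated formula):

```typescript
// Assumed scoring rule: the 5P composite is the product of the five
// scores. The worked example (4, 3, 4, 3, 3 → 432) matches
// multiplication, but treat this as an inference, not the spec.
type FivePScore = {
  pain: number;
  demand: number;
  edge: number;
  trend: number;
  conversion: number;
};

function composite(s: FivePScore): number {
  return s.pain * s.demand * s.edge * s.trend * s.conversion;
}

// Sorting by composite descending yields the build order used at RANK.
function rank<T extends { score: FivePScore }>(specs: T[]): T[] {
  return [...specs].sort((a, b) => composite(b.score) - composite(a.score));
}
```

A multiplicative composite punishes any single weak dimension hard: one score of 1 caps the whole result, which is a plausible reason to prefer it over a sum.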

The SPEC-MAP

The traceability table above shows the chain conceptually. The SPEC-MAP makes it concrete — one row per Story Contract row, both sides writing to it. Dream fills the story. Engineering fills the test. Dream fills the commissioning result. Any empty cell is a broken link in the chain.

Without the SPEC-MAP, three gaps hide:

  1. Feature works, no test — Commissioning scores L4, CI has no regression guard. The next deploy breaks it silently.
  2. Engineering changes behaviour — Screen Contract says "skeleton loads" but the component renders immediately. The spec describes yesterday's product.
  3. Story is untestable — Engineering skips it. Nobody notices until the feature regresses and nobody knows the expected behaviour.
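A SPEC-MAP row and the empty-cell check can be sketched as follows. The column names are assumptions, not the actual schema; the invariant is the source's: any empty cell is a broken link.

```typescript
// Sketch of a SPEC-MAP row: Dream fills the story and the commissioning
// result, Engineering fills the test. Column names are assumed.
type SpecMapRow = {
  storyId: string;
  story: string | null;          // Dream: the Story Contract row
  testPath: string | null;       // Engineering: the guarding test file
  commissioning: string | null;  // Dream: commissioning evidence reference
};

// Any empty cell is a broken link in the chain — surface every one.
function findGaps(rows: SpecMapRow[]): string[] {
  const gaps: string[] = [];
  for (const r of rows) {
    if (!r.story) gaps.push(`${r.storyId}: no story — untestable by definition`);
    if (!r.testPath) gaps.push(`${r.storyId}: works but unguarded — next deploy can break it silently`);
    if (!r.commissioning) gaps.push(`${r.storyId}: never commissioned — no proof it delivers value`);
  }
  return gaps;
}
```

Run against the real table, each gap message corresponds to one of the three hidden failure modes listed above.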

Two VVFL Loops

The pipeline runs two VVFL loops. They share an error signal but measure different things.

DREAM TEAM LOOP                              ENGINEERING LOOP
─────────────────                            ─────────────────
Setpoint: "What value should exist?"         Setpoint: "What code should exist?"
Gauge: 5P scores, commissioning evidence     Gauge: Types, tests, enforcement tiers
Control: Return signals → evolve specs       Control: Retrospectives → evolve templates

┌─── PAIN ──→ DEMAND ──→ SPEC ──→ RANK ───┐
│                                         │
│   ERROR SIGNAL                          ▼
│ ◄──────────────────── MAPS → TYPES → CODE
│   (predicted vs actual)                 │
│                                         │
└──── COMMISSION ◄── VERIFY ◄── DEPLOY ◄──┘
| Dimension     | Dream Team Loop                                    | Engineering Loop                               |
| ------------- | -------------------------------------------------- | ---------------------------------------------- |
| Setpoint      | Story Contract rows — predicted thresholds         | Domain types — compiler-enforced contracts     |
| Gauge         | Commissioning evidence (L0-L4)                     | Cost of quality metrics                        |
| Controller    | Return signals → evolve Story Contract             | Retrospective protocol → evolve templates      |
| Actuator      | Updated PRD, rescored priorities                   | Updated generator, hook, or rule               |
| Virtuous when | Spec gap signal improves future specs              | Each bug prevents its class forever            |
| Broken when   | Return signals don't flow back — specs never evolve | Fixes stay at instance level — same class recurs |

The Dream Team loop is slow (weeks) and measures value — did we spec the right thing? The Engineering loop is fast (hours) and measures correctness — did we build the thing right? The error signal between them — predicted vs actual outcomes at commissioning — is where the loops connect.
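The error signal can be sketched as a predicted-vs-actual comparison per threshold. The shape and names are illustrative; the S1 numbers come from the traceability table (predicted ≤2s, measured 1.8s).

```typescript
// The error signal connecting the two loops: predicted threshold vs
// measured outcome at commissioning. Positive error means reality beat
// the prediction; negative means the spec's threshold was missed.
type Prediction = { metric: string; predicted: number; actual: number };

function errorSignal(p: Prediction): number {
  // Relative error: (predicted - actual) / predicted.
  return (p.predicted - p.actual) / p.predicted;
}

// S1's latency prediction: said ≤2000ms, measured 1800ms.
const e = errorSignal({ metric: "response_ms", predicted: 2000, actual: 1800 });
// A persistent positive e says thresholds are too loose; feed it back
// so the next Story Contract's predictions tighten.
```

This is the quantity that must cross the boundary: if it is computed but never reaches the Story Contract, the Dream loop runs open.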

The danger: Each loop can appear healthy in isolation. Specs keep getting written (Dream looks healthy). Code keeps shipping (Engineering looks healthy). But if commissioning evidence never flows back to evolve the specs, the Dream loop runs open. And if retrospective findings never inform the next PRD's Story Contract, the Engineering loop's lessons stay local.

Three Credibility Loops

The two VVFL loops above are both internal. They answer "did we build the right thing correctly?" That's necessary but not sufficient. Credibility requires three loops, each harder to fake than the last.

LOOP 1: INNER       "Does it work?"          Tests pass, benchmarks met
LOOP 2: STORY       "Does our story match?"  Predictions align with results
LOOP 3: MARKET      "Do others agree?"       External validation with behaviour
| Loop   | Question                                 | Evidence                                                    | Commissioning              | Conviction             |
| ------ | ---------------------------------------- | ----------------------------------------------------------- | -------------------------- | ---------------------- |
| Inner  | Does it pass our own standards?          | Types compile, tests green, benchmarks met                  | L1-L3                      | LOW — "it works"       |
| Story  | Do our claims match our results?         | Predictions scored, kill criteria honoured, receipts filed  | L3 with scored predictions | MEDIUM — "it matters"  |
| Market | Do others validate with their behaviour? | Revenue, adoption, referrals, independent verification      | L4                         | HIGH — "others agree"  |

Loop 1 is the engineering loop above — compiler, tests, commissioning evidence. You control it entirely.

Loop 2 is the bridge. Every PRD makes predictions: "this feature reduces X by Y." Every venture states kill criteria. The credibility score is correct predictions divided by total predictions, weighted by conviction. This is where the Dream Team loop becomes honest — not "did we build it" but "did our story about why it matters hold up?"
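That formula can be sketched directly. The text gives the formula's shape (correct over total, weighted by conviction); the specific conviction levels and weights below are assumptions.

```typescript
// Credibility score: correct predictions / total predictions, weighted
// by conviction. The three levels and their weights are assumed values,
// not from the spec — only the weighted-ratio shape is.
type Conviction = "low" | "medium" | "high";
type ScoredPrediction = { claim: string; conviction: Conviction; correct: boolean };

const WEIGHT: Record<Conviction, number> = { low: 1, medium: 2, high: 3 };

function credibilityScore(preds: ScoredPrediction[]): number {
  const total = preds.reduce((s, p) => s + WEIGHT[p.conviction], 0);
  if (total === 0) return 0;
  const correct = preds.reduce(
    (s, p) => s + (p.correct ? WEIGHT[p.conviction] : 0), 0);
  return correct / total;
}
```

Weighting by conviction makes confident wrong claims expensive: a missed high-conviction prediction costs three low-conviction hits to recover, which is what keeps the score honest.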

Loop 3 is what makes L4 real. Someone outside the system validates with their attention, their money, or their referral. No amount of internal testing gets you there. This is the software promise fulfilled: coordination with minimal need for trust — because the evidence is verifiable, not narrated.

The traceability chain extended:

| Stage                | Loop | Artifact                                                   |
| -------------------- | ---- | ---------------------------------------------------------- |
| Pain observed        | —    | Friction documented                                        |
| Spec written         | —    | Story Contract with predictions                            |
| Code shipped         | 1    | Tests green, benchmarks pass                               |
| Prediction scored    | 2    | "We said 2s response — actual is 1.8s"                     |
| Kill criteria checked | 2   | "We said kill if fewer than 10 users/month — actual is 47" |
| External adoption    | 3    | Someone chose this over their alternative                  |

Most teams stop at Loop 1. They build, test, ship — and call that credibility. But the credibility page is clear: capability (Loop 1) is expected. Integrity — doing what you said, measured across time (Loop 2) — compounds. And the graph — others vouching with their behaviour (Loop 3) — is what makes the system trustworthy without requiring trust.

Market credibility is the greatest force. Loop 1 and Loop 2 are internal — you control the standards and you score your own predictions. Loop 3 is external — someone chose your product over their alternative, paid with attention or money, and came back. No amount of passing tests or scoring predictions substitutes for that signal. The SPEC-MAP tightens Loops 1 and 2 so that Loop 3 evidence — when it arrives — lands on a foundation that doesn't crack.

Dig Deeper

  • Jobs To Be Done — Discover demand, spec stories, rank priorities. The user-flow pipeline (steps 1-3)
  • Eng Dev Workflow — Flow engineering, type-first development, outer-loop validation. The type-flow pipeline (steps 5-7)
  • Validate Outcomes — Independent commissioning: did the build match the spec? (step 8)

Questions

  • Where in the pipeline does the signal from the original pain get lost — and what does the artifact look like at the break point?

  • If the inner loop catches 90% of defects, what's the remaining 10% that only the outer loop reveals — and is that acceptable risk?
  • When a Story Contract row can't be traced to a domain type, is the story wrong or the type model incomplete?
  • When commissioning evidence shows the threshold was wrong (not the build), how long does it take for that signal to reach the Story Contract — and how many specs ship with the same bad threshold in the gap?
  • What's the artifact that proves Loop 2 is running — that predictions are being scored, not just made?
  • If no feature has reached L4, is Loop 3 broken or just not yet started — and how do you tell the difference?