Development Journeys
How does software go from someone's pain to proven value?
PAIN → DEMAND → SPEC → RANK → MAPS → TYPES → CODE → COMMISSION
 │      │        │      │      │      │       │      │
 │      │        │      │      │      │       │      └─ Independent verification (L0-L4)
 │      │        │      │      │      │       └─ Type-first implementation
 │      │        │      │      │      └─ Domain contracts from maps
 │      │        │      │      └─ Flow engineering: stories → maps
 │      │        │      └─ 5P scoring, sorted build order
 │      │        └─ Intent + Story + Build contracts
 │      └─ Evidence card: who hurts, how much
 └─ Observed friction, not assumed need
The pipeline has one rule: each step's output is the next step's input. Skip a step and the chain breaks. The Dream Team owns both ends — defining value (steps 1-3) and verifying delivery (step 8). Engineering owns the build (steps 5-7). Step 4 is the handoff — and it includes an architecture fitness gate that can bounce a spec back before building starts.
Two Flows
Two views of the same system. Both are necessary. Neither is sufficient alone.
| Flow | Direction | Question | Artifacts |
|---|---|---|---|
| User flow | Outside-in | What progress is someone trying to make? | Pain → Story Contract → Screen Contracts |
| Type flow | Inside-out | What contracts must the system honour? | Domain types → Infrastructure → Application → Presentation |
User flow starts with a person's pain. Type flow starts with the domain's truth. The Story Contract bridges them: each story row names a user's pain AND the data source, field, and threshold that proves it's fixed. Stories become Outcome Maps. Outcome Maps become domain types. Types become code.
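One Story Contract row, sketched as a type. This is illustrative rather than a canonical schema; every field name here is an assumption. The point it makes: the pain and the proof live in the same record.

```typescript
// A minimal sketch of one Story Contract row (field names are assumptions,
// not the canonical schema). Pain and proof live in the same record.
type StoryContractRow = {
  id: string;            // e.g. "S1"
  pain: string;          // the user's pain, in their words
  when: string;          // trigger: "operator queries verdict"
  then: string;          // observable outcome: "API returns verdict"
  proof: {
    source: string;      // data source that measures the outcome
    field: string;       // the specific field read from that source
    threshold: string;   // e.g. "≤2s response, ≤5% divergence"
  };
};
```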
Two Loops
Every verification happens in one of two loops. Know which one you're in.
| Loop | Speed | Cost | Catches | Tools |
|---|---|---|---|---|
| Inner | Fast (seconds) | Cheap | Contract violations, type errors, logic bugs | Compiler, unit tests, integration tests |
| Outer | Slow (minutes) | Expensive | Reality gaps, UX failures, production drift | E2E tests, browser verification, monitoring |
The inner loop runs on every keystroke. The outer loop runs before shipping. Type-first development maximises inner-loop catches — a type error caught at compile time costs nothing. A type error caught in production costs everything.
Outer-loop validation reads reality through instruments: browser tools, performance gauges, error dashboards. The builder can't commission their own work — the commissioner reads the same instruments independently.
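A minimal sketch of an inner-loop catch, using the VerdictResult contract that appears in the traceability table below. The render helper is hypothetical:

```typescript
// Inner loop: the compiler rejects contract violations before any test runs.
type Source = { name: string; url: string };

type VerdictResult = {
  verdict: string;
  coverage: number;   // percentage, 0-100
  sources: Source[];
};

function renderCoverage(result: VerdictResult): string {
  return `${result.coverage.toFixed(1)}% coverage`;
}

// Compile-time catch, costing nothing:
//   renderCoverage({ verdict: "pass", coverage: "97%", sources: [] });
//   error TS2322: Type 'string' is not assignable to type 'number'.
// The same payload arriving as untyped JSON surfaces only in the outer loop,
// as a production rendering bug.
```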
Traceability
One requirement traced through every stage. The chain must be unbroken.
| Stage | Artifact | Example |
|---|---|---|
| Pain | Observed friction | "Operator spends 30min manually auditing tool stack per category" |
| Demand | Evidence card | Interview transcript, frequency: weekly, 12 operators affected |
| Story | Story Contract row | S1: WHEN operator queries verdict THEN API returns in ≤2s with ≤5% divergence |
| Score | 5P frontmatter | Pain: 4, Demand: 3, Edge: 4, Trend: 3, Conversion: 3 → composite 432 |
| Map | Outcome Map node | queryVerdict() → VerdictResult with coverage percentage |
| Type | Domain contract | type VerdictResult = { verdict: string; coverage: number; sources: Source[] } |
| Test | Test file | apps/crm/tests/story-s1-verdict.spec.ts — RED before implementation |
| Code | Implementation | queryVerdict() function satisfying the type contract |
| Commission | L4 evidence | Commissioner screenshot: verdict returned in 1.8s, 97% match |
If you can't trace a requirement from pain to commission, the chain is broken. Find the break.
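The S1 row implies a test that exists, and fails, before the implementation does. A sketch of what apps/crm/tests/story-s1-verdict.spec.ts might assert, assuming a Playwright-style runner; the endpoint and response field are assumptions:

```typescript
// Sketch of story-s1-verdict.spec.ts: one assertion per Story Contract threshold.
// Written RED: it fails until the implementation satisfies the contract.
import { test, expect } from "@playwright/test";

test("S1: verdict query returns in ≤2s with ≤5% divergence", async ({ request }) => {
  const started = Date.now();
  const response = await request.get("/api/verdict?category=crm-tools");
  const elapsedMs = Date.now() - started;
  const body = await response.json();

  // Thresholds copied verbatim from Story Contract row S1.
  expect(elapsedMs).toBeLessThanOrEqual(2000);        // ≤2s response
  expect(body.divergence).toBeLessThanOrEqual(0.05);  // ≤5% divergence
});
```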
The SPEC-MAP
The traceability table above shows the chain conceptually. The SPEC-MAP makes it concrete — one row per Story Contract row, both sides writing to it. Dream fills the story. Engineering fills the test. Dream fills the commissioning result. Any empty cell is a broken link in the chain.
Without the SPEC-MAP, three gaps hide:
- Feature works, no test — Commissioning scores L4, CI has no regression guard. Next deploy breaks it silently
- Engineering changes behaviour — Screen Contract says "skeleton loads" but the component renders immediately. Spec describes yesterday's product
- Story is untestable — Engineering skips it. Nobody notices until the feature regresses and nobody knows the expected behaviour
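One way to make "any empty cell is a broken link" mechanically checkable, sketched with assumed column names:

```typescript
// Sketch of one SPEC-MAP row (column names are assumptions, not the canonical
// schema). null means "not yet filled", which is a broken link in the chain.
type SpecMapRow = {
  storyId: string;                    // shared key, e.g. "S1"
  story: string | null;               // Dream: the WHEN/THEN statement
  testPath: string | null;            // Engineering: the regression guard
  commissioningResult: string | null; // Dream: evidence at L0-L4
};

// Any empty cell breaks the chain; a CI step could fail on this predicate.
const brokenLink = (row: SpecMapRow): boolean =>
  row.story === null || row.testPath === null || row.commissioningResult === null;
```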
Two VVFL Loops
The pipeline runs two VVFL loops. They share an error signal but measure different things.
DREAM TEAM LOOP                              ENGINEERING LOOP
─────────────────                            ─────────────────
Setpoint: "What value should exist?"         Setpoint: "What code should exist?"
Gauge: 5P scores, commissioning evidence     Gauge: Types, tests, enforcement tiers
Control: Return signals → evolve specs       Control: Retrospectives → evolve templates

┌─── PAIN ──→ DEMAND ──→ SPEC ──→ RANK ───┐
│                                         │
│   ERROR SIGNAL                          ▼
│ ◄─────────────────────                  MAPS ──→ TYPES ──→ CODE
│   (predicted vs actual)                                      │
│                                                              │
└──── COMMISSION ◄── VERIFY ◄── DEPLOY ◄───────────────────────┘
| Dimension | Dream Team Loop | Engineering Loop |
|---|---|---|
| Setpoint | Story Contract rows — predicted thresholds | Domain types — compiler-enforced contracts |
| Gauge | Commissioning evidence (L0-L4) | Cost of quality metrics |
| Controller | Return signals → evolve Story Contract | Retrospective protocol → evolve templates |
| Actuator | Updated PRD, rescored priorities | Updated generator, hook, or rule |
| Virtuous when | Spec gap signal improves future specs | Each bug prevents its class forever |
| Broken when | Return signals don't flow back — specs never evolve | Fixes stay at instance level — same class recurs |
The Dream Team loop is slow (weeks) and measures value — did we spec the right thing? The Engineering loop is fast (hours) and measures correctness — did we build the thing right? The error signal between them — predicted vs actual outcomes at commissioning — is where the loops connect.
The danger: Each loop can appear healthy in isolation. Specs keep getting written (Dream looks healthy). Code keeps shipping (Engineering looks healthy). But if commissioning evidence never flows back to evolve the specs, the Dream loop runs open. And if retrospective findings never inform the next PRD's Story Contract, the Engineering loop's lessons stay local.
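The error signal is computable wherever a Story Contract threshold and a commissioning measurement share units. A sketch, with hypothetical field names:

```typescript
// Error signal between the loops: predicted threshold vs commissioned actual.
// Field names are hypothetical; the numbers come from the S1 row above.
type PredictionCheck = {
  storyId: string;
  metric: string;
  predicted: number;  // threshold from the Story Contract
  actual: number;     // measurement from commissioning evidence
};

const errorSignal = (check: PredictionCheck): number =>
  check.actual - check.predicted;

// S1 predicted ≤2s; commissioning measured 1.8s. Negative error: the spec held.
errorSignal({ storyId: "S1", metric: "responseSeconds", predicted: 2.0, actual: 1.8 });
// → -0.2 (within floating-point rounding)
```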
Three Credibility Loops
The two VVFL loops above are both internal. They answer "did we build the right thing correctly?" That's necessary but not sufficient. Credibility requires three loops, each harder to fake than the last.
LOOP 1: INNER    "Does it work?"            Tests pass, benchmarks met
        │
        ▼
LOOP 2: STORY    "Does our story match?"    Predictions align with results
        │
        ▼
LOOP 3: MARKET   "Do others agree?"         External validation with behaviour
| Loop | Question | Evidence | Commissioning | Conviction |
|---|---|---|---|---|
| Inner | Does it pass our own standards? | Types compile, tests green, benchmarks met | L1-L3 | LOW — "it works" |
| Story | Do our claims match our results? | Predictions scored, kill criteria honoured, receipts filed | L3 with scored predictions | MEDIUM — "it matters" |
| Market | Do others validate with their behaviour? | Revenue, adoption, referrals, independent verification | L4 | HIGH — "others agree" |
Loop 1 is the engineering loop above — compiler, tests, commissioning evidence. You control it entirely.
Loop 2 is the bridge. Every PRD makes predictions: "this feature reduces X by Y." Every venture states kill criteria. The credibility score is correct predictions divided by total predictions, weighted by conviction. This is where the Dream Team loop becomes honest — not "did we build it" but "did our story about why it matters hold up?"
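The same arithmetic, sketched in code. The conviction weights are assumptions; the source specifies only "weighted by conviction":

```typescript
// Credibility score: correct predictions / total predictions, weighted by
// conviction. The weight values are assumptions for illustration.
type Prediction = { correct: boolean; conviction: "low" | "medium" | "high" };

const WEIGHT = { low: 1, medium: 2, high: 3 } as const;

function credibilityScore(predictions: Prediction[]): number {
  const total = predictions.reduce((sum, p) => sum + WEIGHT[p.conviction], 0);
  const correct = predictions.reduce(
    (sum, p) => sum + (p.correct ? WEIGHT[p.conviction] : 0), 0);
  return total === 0 ? 0 : correct / total;
}

// Two high-conviction hits and one medium-conviction miss:
// (3 + 3) / (3 + 3 + 2) = 0.75
credibilityScore([
  { correct: true, conviction: "high" },
  { correct: true, conviction: "high" },
  { correct: false, conviction: "medium" },
]); // → 0.75
```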
Loop 3 is what makes L4 real. Someone outside the system validates with their attention, their money, or their referral. No amount of internal testing gets you there. This is the software promise fulfilled: coordination with minimal need for trust — because the evidence is verifiable, not narrated.
The traceability chain extended:
| Stage | Loop | Artifact |
|---|---|---|
| Pain observed | — | Friction documented |
| Spec written | — | Story Contract with predictions |
| Code shipped | 1 | Tests green, benchmarks pass |
| Prediction scored | 2 | "We said 2s response — actual is 1.8s" |
| Kill criteria checked | 2 | "We said kill if fewer than 10 users/month — actual is 47" |
| External adoption | 3 | Someone chose this over their alternative |
Most teams stop at Loop 1. They build, test, ship — and call that credibility. But the credibility page is clear: capability (Loop 1) is expected. Integrity — doing what you said, measured across time (Loop 2) — compounds. And the graph — others vouching with their behaviour (Loop 3) — is what makes the system trustworthy without requiring trust.
Market credibility is the greatest force. Loop 1 and Loop 2 are internal — you control the standards and you score your own predictions. Loop 3 is external — someone chose your product over their alternative, paid with attention or money, and came back. No amount of passing tests or scoring predictions substitutes for that signal. The SPEC-MAP tightens Loops 1 and 2 so that Loop 3 evidence — when it arrives — lands on a foundation that doesn't crack.
Dig Deeper
- Jobs To Be Done — Discover demand, spec stories, rank priorities. The user-flow pipeline (steps 1-3)
- Eng Dev Workflow — Flow engineering, type-first development, outer-loop validation. The type-flow pipeline (steps 5-7)
- Validate Outcomes — Independent commissioning: did the build match the spec? (step 8)
Context
- Flow Engineering — Stories become maps, maps become types, types become code
- Type-First Development — Domain contracts pull implementation through the inner loop
- Outer-Loop Validation — Instruments that read production reality
- Feature Matrix — Live commissioning status for every capability
- PRD Handoff Protocol — The interface between Dream Team and Engineering
- VVFL Evolution — The feedback loop model behind the pipeline
- Retrospective Protocol — Engineering loop: five gap types, enforcement hierarchy
- Process Optimisation — Pit of success patterns, improvement loop
- Credibility — Three layers: identity, capability, integrity. The prediction ledger that scores Loop 2
- Software — The promise: coordination with minimal need for trust. Loop 3 is where that promise gets tested
Questions
- Where in the pipeline does the signal from the original pain get lost — and what does the artifact look like at the break point?
- If the inner loop catches 90% of defects, what's the remaining 10% that only the outer loop reveals — and is that acceptable risk?
- When a Story Contract row can't be traced to a domain type, is the story wrong or the type model incomplete?
- When commissioning evidence shows the threshold was wrong (not the build), how long does it take for that signal to reach the Story Contract — and how many specs ship with the same bad threshold in the gap?
- What's the artifact that proves Loop 2 is running — that predictions are being scored, not just made?
- If no feature has reached L4, is Loop 3 broken or just not yet started — and how do you tell the difference?