# Create PRD Stories
How do you spec a product that never gives the same answer twice?
Traditional PRDs define behavior: "When user clicks X, system does Y." AI PRDs define boundaries and quality distributions: "Given input type X, output should score above Y on dimension Z at least N% of the time."
| Traditional PRD | AI PRD |
|---|---|
| Acceptance criteria: pass/fail | Quality targets: percentage above threshold |
| Edge cases are bugs | Edge cases are statistical certainties |
| Test before ship | Evaluate continuously |
| User stories (documentation) | Story contracts — each row = ≥1 test file, RED before impl, GREEN = value |
| Spec is for humans | Spec is for humans AND agents |
| Behavior instructions | Intent contracts — govern autonomy when instructions run out |
## P2P Stories
Protocol Contract Format: Explain the journey from Pain to Performance with Validated Outcomes.
The arc: who hurts → how much → what number proves it's fixed → which Tight Five dimension this serves. Every requirement in a PRD uses this shape — including the PRD creation process itself.
### Story: <2–6 word verb-noun title>
FeatureID: <id or N/A>
Owner: <Dream Team | Engineering | Agent | Shared>
Tight Five: <Principles | Performance | Platform | Protocols | Players>
Pain: <who hurts, what they do today, measurable cost>
Performance Target: <the number that proves this works — unit, threshold, comparison>
When <specific trigger — one observable condition>
Then <specific system action — one required response>
Artifact created/impacted:
- <named object, file, record, table, or state>
Success is:
- <externally observable result matching Performance Target>
Failure if:
- <missing artifact, invalid state, or wrong value>
## Writing Rules
- One trigger, one primary action
- Every success statement externally observable
- No adjectives without thresholds
- No "should" (use "must"), no "etc."
- Every dependency named explicitly
Performance Target rules:
- Must include a unit (seconds, dollars, rows, percentage)
- Must include a threshold (≤2s, ≥75%, ±$5)
- Must include a comparison point (vs manual, vs current, vs baseline)
- If you can't state the number, you don't understand the problem yet
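The three Performance Target rules are mechanical enough to lint. A minimal sketch in TypeScript; the regexes are illustrative heuristics, not an official checker, and the unit list is an assumption to extend:

```typescript
// Sketch: a heuristic lint for Performance Target strings. The regexes are
// illustrative, not an official checker; extend the unit list as needed.
interface TargetCheck {
  hasUnit: boolean;
  hasThreshold: boolean;
  hasComparison: boolean;
  ok: boolean;
}

function checkPerformanceTarget(target: string): TargetCheck {
  // Unit: seconds, dollars, rows, percentage (and common abbreviations).
  const hasUnit = /(seconds?|\ds\b|minutes?|min\b|dollars?|\$|rows?|%|percent)/i.test(target);
  // Threshold: a comparator attached to a number (≤2s, ≥75%, ±$5).
  const hasThreshold = /(≤|≥|<=|>=|<|>|±)\s*\$?\d/.test(target);
  // Comparison point: "vs manual", "vs current", "vs baseline".
  const hasComparison = /\bvs\.?\s+\w/i.test(target);
  return { hasUnit, hasThreshold, hasComparison, ok: hasUnit && hasThreshold && hasComparison };
}
```

A target that fails any of the three checks is a target you cannot yet state, which per the rule above means the problem is not yet understood.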
Tight Five rules:
- Every story declares which P it primarily serves
- After all stories are written, check coverage — a PRD touching Platform but with zero Platform stories has a blind spot
- The mapping becomes the readiness evidence for scoring
## Good vs Bad
| Quality | Story | Why |
|---|---|---|
| Bad | "When ETL runs, Then data loads successfully" | No pain. No number. No unit. "Successfully" is an adjective without threshold. |
| Bad | "When user queries tool stack, Then system returns correct verdict" | Whose pain? What's correct? How fast? Passes with a stub. |
| Good | "When operator queries build-or-buy verdict for a category, Then API returns verdict + coverage percentage in ≤2s with ≤5% divergence from manually audited truth" | Names the actor (operator). Names the pain (manual audit). Names the number (≤2s, ≤5%). Names the comparison (manually audited truth). |
## The Pipeline
Ten phases from observed pain to registered PRD. Each produces a named artifact. Full operational detail in the PRD creation work chart.
COLLECT → CLASSIFY → DEFINE → HIERARCHY → MAP → [PITCH + SCORE] → SCAFFOLD → COMPRESS → REGISTER → VERIFY
| Phase | Trigger | Artifact | Gate |
|---|---|---|---|
| COLLECT | Pain observed | spec/story.md — 4 inputs present | Actor, workaround, evidence, impact all named |
| CLASSIFY | Pain captured | Type + written rationale | One type, rejection rationale for adjacent types |
| DEFINE | Type assigned | spec/index.md — Intent Contract draft | SIO complete, 2+ outcomes, 1+ hard constraint |
| HIERARCHY | Intent defined | Overlap check result | No >50% overlap without declared relationship |
| MAP | Hierarchy clear | 5 picture files | All 5 written, none placeholder |
| PITCH + SCORE | Maps complete | Frontmatter scores | All 5P scores evidenced, composite calculated |
| SCAFFOLD | Scores set | 4 directories with files | Correct variant, no empty files |
| COMPRESS | Scaffold done | Edited draft | No line removable without information loss |
| REGISTER | Compressed | Priority table row | Row at correct position, table renders |
| VERIFY | Registered | prd-maintenance output | All 7 checks pass |
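The Gate column is the point: producing a phase's artifact is not enough, its gate must pass before the next phase starts. A minimal sketch of that sequencing (phase names from the table above; evaluating each gate is assumed to happen elsewhere):

```typescript
// Sketch: the pipeline as an ordered gate sequence. Phase names come from the
// table above; evaluating each gate is assumed to happen elsewhere.
const PHASES = [
  "COLLECT", "CLASSIFY", "DEFINE", "HIERARCHY", "MAP",
  "PITCH+SCORE", "SCAFFOLD", "COMPRESS", "REGISTER", "VERIFY",
] as const;

type Phase = (typeof PHASES)[number];

// Advance one phase only when the current gate passed; VERIFY is terminal.
function advance(current: Phase, gatePassed: boolean): Phase {
  if (!gatePassed) return current; // gate failed: the PRD does not move
  const i = PHASES.indexOf(current);
  return i < PHASES.length - 1 ? PHASES[i + 1] : current;
}
```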
## Story Contracts: PRD Creation
### Story: Capture Observed Pain
FeatureID: N/A
Owner: Dream Team
When a pain point is nominated for a PRD
Then the author must record actor, observed pain, current workaround, and evidence source
Artifact created/impacted:
- spec/story.md — Observed Pain section
Success is:
- All four inputs present and named
- A reader can identify who is struggling and why without additional context
Failure if:
- Any input is missing
- Pain statement describes a solution instead of a problem
### Story: Assign PRD Type
FeatureID: N/A
Owner: Dream Team
When observed pain is captured
Then the PRD must be classified as Platform, Product, Agent, or Instrument with written rationale
Artifact created/impacted:
- spec/story.md — Classification field
Success is:
- Exactly one type assigned
- Rationale explains why adjacent types were rejected
Failure if:
- More than one type selected
- Type assigned without rationale
### Story: Pass Strategic Gate
FeatureID: N/A
Owner: Dream Team
When classification is complete
Then the PRD must score ≥50 composite on Pain × Demand × Edge × Trend × Conversion, each dimension evidenced
Artifact created/impacted:
- index.md frontmatter: priority_pain, priority_demand, priority_edge, priority_trend, priority_conversion, priority_score
Success is:
- All 5 scores present with evidence note
- Composite ≥50 (bands: ≥500 build now, 200–499 strong, 50–199 promising, <50 park)
Failure if:
- Any score lacks evidence
- Composite not calculated
- PRD advances to SCAFFOLD with composite <50
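The gate in this story can be sketched as a score multiplier plus band classifier. The five keys mirror the frontmatter fields; the per-dimension scale is an assumption, since only the composite bands are given here:

```typescript
// Sketch of the strategic gate. The five keys mirror the frontmatter fields;
// the per-dimension scale is an assumption (only the composite bands are given).
interface PriorityScores {
  pain: number;
  demand: number;
  edge: number;
  trend: number;
  conversion: number;
}

// Composite is the product of the five dimensions.
function composite(s: PriorityScores): number {
  return s.pain * s.demand * s.edge * s.trend * s.conversion;
}

// Bands from the success criteria: ≥500 build now, 200–499 strong,
// 50–199 promising, <50 park.
function band(score: number): "build now" | "strong" | "promising" | "park" {
  if (score >= 500) return "build now";
  if (score >= 200) return "strong";
  if (score >= 50) return "promising";
  return "park";
}
```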
### Story: Define Intent Contract
FeatureID: N/A
Owner: Dream Team
When strategic gate passes
Then spec/index.md must contain a complete Intent Contract with all 9 dimensions
Artifact created/impacted:
- spec/index.md — Intent Contract section
Success is:
- 2–4 outcomes, each observable without trusting agent self-report
- Hard constraints enforceable outside prompts
- Decision autonomy matrix has Allowed / Escalate / Never columns
- Every outcome has an artifact and a verification method
Failure if:
- Any outcome relies on agent self-assessment ("agent handles it well")
- Hard constraints exist only in prompts
- Stop rules absent
### Story: Write Story Contract
FeatureID: N/A
Owner: Dream Team
When intent contract is complete
Then spec/index.md must contain a Story Contract table with ≥1 story per user role per critical flow
Artifact created/impacted:
- spec/index.md — Story Contract table
Success is:
- Every story has all 7 columns populated: #, WHEN, THEN, ARTIFACT, Test Type, FORBIDDEN, OUTCOME
- THEN names data source, field, and threshold — rejects "outputs valid JSON" or "command exits 0"
- ARTIFACT has exact file path — no TBD
- Every story has ≥1 Forbidden Outcome naming the counterfeit success
Failure if:
- ARTIFACT column blank or "TBD" — emit BLOCKER
- THEN does not name data source and threshold
- FORBIDDEN is generic ("should not fail")
- A user role has no story in a critical flow
### Story: Build Feature Table
FeatureID: N/A
Owner: Dream Team
When story contract is written
Then spec/index.md must contain a Build Contract (FAVV v2.1) with ≤5 distinct FeatureIDs, all from the RaaS catalog or Platform Proprietary register
Artifact created/impacted:
- spec/index.md — Build Contract table
- feature-matrix.md — FeatureID rows (updated post-commissioning)
Success is:
- Every row has every column populated: #, FeatureID, Feature, Function, Artifact, Success Test, Safety Test, Regression Test, Value, State
- Function column uses verbs — "Browse, search, filter contacts" not "Contacts"
- Success Test references Story IDs (S1, S2…)
- Safety Test references Forbidden Outcomes from Story Contract
- State matches exact enum
Failure if:
- More than 5 distinct FeatureIDs — split into multiple PRDs
- Function column contains nouns only — parser cannot generate tests
- State value not from enum
- FeatureID absent from RaaS catalog or proprietary register
### Story: Register PRD
FeatureID: N/A
Owner: Dream Team
When build contract is finalized
Then the PRD must appear in the priority table at the correct position and all prd-maintenance checks must pass
Artifact created/impacted:
- src/pages/priorities/index.md — Active/Pipeline/Backlog row
- src/pages/feature-matrix.md — FeatureID status rows
Success is:
- Row at correct priority order (score descending)
- All 7 prd-maintenance checks pass
- Every FeatureID in build contract has a feature-matrix row
Failure if:
- PRD missing from priority table
- Table order does not match priority score
- Any prd-maintenance check fails
## Intent Contract
The governing document. When an agent runs out of instructions, intent governs. Write this before the feature table.
| Dimension | Question | Example |
|---|---|---|
| Objective | What problem, and why now? | "Month-end reporting takes 3 days of manual copy-paste across 6 spreadsheets" |
| Desired Outcomes | Observable state changes proving success (2–4) | "No follow-up ticket within 24h" — not "agent asked good questions" |
| Health Metrics | What must not degrade? | "Existing deal close rate stays above 28%" |
| Hard Constraints | What the system blocks regardless of agent intent | Enforceable outside prompts — or they're wishes |
| Steering | How it should think and trade off | Guidance for judgment calls |
| Autonomy | What it may do alone / must escalate / must never do | Allowed / Escalate / Never |
| Stop Rules | Complete when... Halt & escalate when... | "Complete when 3 consecutive evals pass. Halt when error rate exceeds 5%" |
| Evidence | For each outcome, what artifact proves it happened? | Log entry, scored eval, refusal record, drift report |
| Verification | How each artifact is checked | Rubric, automated check, human audit |
Outcomes rule: "Report generated in <10 min, no manual correction needed" passes. "Agent generates reports quickly" fails — no threshold, no artifact, no time bound.
AI quality contracts belong in Intent Contract outcomes:
For [feature], outputs must score:
- [Dimension A]: ≥ [score] for [percentage]% of requests
- NEVER: [unacceptable outcome] for more than [N]% of requests
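A contract in that shape can be checked mechanically over a batch of scored outputs. A hedged sketch, with illustrative field names, assuming a non-empty batch and one rubric score per request:

```typescript
// Sketch: checking an AI quality contract against a batch of scored requests.
// Field names are illustrative; assumes a non-empty batch.
interface QualityContract {
  minScore: number;     // dimension floor per request, e.g. 4 on a 1-5 rubric
  minPassRate: number;  // fraction of requests that must meet the floor, e.g. 0.9
  maxNeverRate: number; // ceiling on the NEVER outcome, e.g. 0.01
}

function contractHolds(
  scores: number[],      // one rubric score per request
  neverFlags: boolean[], // true where the unacceptable outcome occurred
  c: QualityContract,
): boolean {
  const passRate = scores.filter((s) => s >= c.minScore).length / scores.length;
  const neverRate = neverFlags.filter(Boolean).length / neverFlags.length;
  return passRate >= c.minPassRate && neverRate <= c.maxNeverRate;
}
```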
Constraint types: Hard constraints (code, middleware, policy engine — outside the prompt) vs Steering constraints (prompt, context, principles — inside the prompt). Hard constraints that live only in prompts are wishes.
## Story Contract Schema
Stories are test contracts, not documentation. Each row becomes ≥1 test file. Tests RED = not started. Tests GREEN = value delivered. N stories → ≥N test files.
| Column | Rule |
|---|---|
| # | Story ID (S1, S2…) — referenced in Build Contract Success Test and Safety Test columns |
| WHEN | Trigger + precondition. Names the data state that must exist first. |
| THEN | Exact assertion naming data source, field, threshold. Reject: "outputs valid JSON", "command exits 0", "test passes" |
| ARTIFACT | Exact path to test file engineering must write. Emit BLOCKER if unknown. Never blank. |
| Test Type | unit / integration / e2e / a2a — most stories need 2 rows (unit + e2e or a2a) |
| FORBIDDEN | What must NOT be true when story passes. Names the counterfeit success — the stub that satisfies a weak THEN. |
| OUTCOME | Business value proven when THEN passes. Must include a performance number — not "works", not "loads correctly". Example: "Operator gets build/buy verdict in ≤2s instead of 30min manual audit" |
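The reject-rules in THEN, ARTIFACT, and FORBIDDEN are string-checkable. A minimal row validator sketch; field names mirror the columns, and the weak-assertion patterns are the ones the schema names:

```typescript
// Sketch: the schema's reject-rules as a row validator. Field names mirror the
// columns; the weak-assertion patterns are the ones named in the table.
interface StoryRow {
  id: string; // S1, S2...
  when: string;
  then: string;
  artifact: string; // exact test-file path
  testType: "unit" | "integration" | "e2e" | "a2a";
  forbidden: string;
  outcome: string;
}

function validateRow(row: StoryRow): string[] {
  const problems: string[] = [];
  if (!row.artifact.trim() || row.artifact.trim().toUpperCase() === "TBD") {
    problems.push("BLOCKER: ARTIFACT blank or TBD");
  }
  if (/outputs valid JSON|exits 0|test passes/i.test(row.then)) {
    problems.push("THEN is a weak assertion: name data source, field, threshold");
  }
  if (/should not fail/i.test(row.forbidden)) {
    problems.push("FORBIDDEN is generic: name the counterfeit success");
  }
  return problems;
}
```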
## Tight Five Coverage Check
After writing all stories, map them to the Tight Five:
| P | Story Coverage Question | Red Flag if Missing |
|---|---|---|
| Principles | Does a story prove value transformation is real? | Building without knowing if it matters |
| Performance | Does a story carry a measurable target with unit + threshold? | Activity without outcomes — the #1 failure mode |
| Platform | Does a story prove it ships on existing stack? | Dependent on things you don't control |
| Protocols | Does a story prove the output compounds (reusable, queryable, standard)? | Success that doesn't compound |
| Players | Does a story prove someone real will use it? | Isolated agency — built for nobody |
Not every PRD covers all five. But every gap should be a conscious decision, not an oversight. If a PRD touches Platform but has zero Platform stories, the readiness score for Platform cannot exceed 2.
Example:
| # | WHEN | THEN | ARTIFACT | Test Type | FORBIDDEN | OUTCOME |
|---|---|---|---|---|---|---|
| S1 | User searches "security compliance" AND ≥1 approved answer exists with matching category | searchAnswers() returns array where length >= 1 AND every result.score >= 0.8 AND result.category === "security" | apps/crm/tests/story-s1-answer-library.spec.ts | integration | Empty array when seeded answers exist — query not wired to DB | Answer Library surfaces past bid answers with confidence scores |
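For illustration, S1's THEN clause reduces to an assertion like the following. The `AnswerHit` shape and the idea that `searchAnswers()` returns such an array are assumptions, not the real CRM API:

```typescript
// Illustrative only: the AnswerHit shape and the idea that searchAnswers()
// returns such an array are assumptions, not the real CRM API.
interface AnswerHit {
  score: number;
  category: string;
}

// S1's THEN: at least one result, every score >= 0.8, every category "security".
function s1Holds(results: AnswerHit[]): boolean {
  return (
    results.length >= 1 && // guards the FORBIDDEN case: the empty-array counterfeit
    results.every((r) => r.score >= 0.8 && r.category === "security")
  );
}
```

Note how the `length >= 1` check exists only because the FORBIDDEN column named the counterfeit: without it, an unwired query returning `[]` would satisfy a weaker THEN.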
## Build Contract Schema
FAVV v2.1 — the parser reads this table. Column names are the interface contract. Change them and the parser breaks.
| Column | Engineering Reads As |
|---|---|
| FeatureID | RaaS catalog ID — links to feature-matrix row |
| Feature | Name of the capability |
| Function | Verb-led behavior. "Browse, search, filter contacts" = 3 tests. "Contacts" = BLOCKER |
| Artifact | Concrete deliverable: TypeScript client, PostgreSQL migration, React component |
| Success Test | Happy-path criteria referencing Story IDs. This IS the e2e test spec. |
| Safety Test | What must NOT happen — from FORBIDDEN in Story Contract |
| Regression Test | What existing capability must NOT degrade — names capability + threshold |
| Value | Business outcome in one sentence |
| State | Enum only: Live · Built · Dormant · Partial · Not verified · Gap · Stub · Broken |
Scope rule: Max 5 distinct FeatureIDs per PRD. Scope is frozen at REGISTER time. Adding a new FeatureID after registration requires a new PRD.
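Two of these rules are machine-checkable without reading prose: the State enum and the five-FeatureID scope cap. A sketch:

```typescript
// Sketch: two Build Contract rules that need no human reading.
const STATES = [
  "Live", "Built", "Dormant", "Partial", "Not verified", "Gap", "Stub", "Broken",
] as const;
type State = (typeof STATES)[number];

// State must come from the exact enum.
function isValidState(s: string): s is State {
  return (STATES as readonly string[]).indexOf(s) !== -1;
}

// Scope rule: at most 5 distinct FeatureIDs per PRD.
function withinScope(featureIds: string[]): boolean {
  return new Set(featureIds).size <= 5;
}
```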
## Navigation Journey Checklist
Every Screen Contract must map the user's journey from pain to effortless performance. The UI team needs this to build information architecture that carries the user forward — not just pages that contain features.
### The Six Stages
| Stage | User State | Navigation Question | Screen Contract Must Answer |
|---|---|---|---|
| Pain | Friction is the norm | What does the user do today that hurts? | Document the current navigation path and its cost (clicks, time, context switches) |
| Awareness | There's a faster way | How does the user discover the new capability? | Entry points: where in existing navigation does the new surface appear? First-time indicators (badges, pulses, onboarding tooltips) |
| First Value | That was faster | What's the smallest interaction that proves value? | Empty state design: suggested prompts, pre-loaded context, zero-config first use. Time-to-value target |
| Habit | I go here first now | How does the new surface become the default path? | Navigation hierarchy: does this become the first item? Does the dashboard change? What signals reinforce the new behavior? |
| Mastery | I do everything from here | How does the power user stay in flow? | Cross-cutting navigation: deep-links between the new surface and existing sections. Contextual triggers from existing pages back to the new surface |
| Effortless | It anticipates what I need | How does the system reduce even the need to navigate? | Proactive suggestions, context-aware defaults, ambient awareness of user intent |
### Applying to Screen Contracts
For every Screen Contract in a PRD, add a Navigation Architecture section that answers:
- Current nav path — What sidebar/menu items exist today? What is the user's current click-path to accomplish this job?
- Target nav path — Where does the new screen appear in the hierarchy? Above or below existing items? Why?
- Feature flag states — What does navigation look like with the flag OFF vs ON? What changes at each build sprint (T0, T1, T2...)?
- Cross-cutting routes — A table mapping: From (existing page) → To (new surface) → Trigger (what the user clicks) → What Happens (navigation behavior, context carried forward)
- Wiring coordinates — Exact file paths for the sidebar component, layout wrapper, floating widget, and any page that gains a contextual trigger
### Why This Matters
The multimodal agent PRD had a complete Screen Contract (route, states, selectors, elements) but zero navigation architecture. The UI team knew WHAT to build but not WHERE it goes, HOW users find it, or HOW it integrates with 6 existing sidebar sections. The Screen Contract described a destination without a map.
A screen without navigation architecture is a room without a door.
## Checklist
Before engineering receives the spec:
### Intent contract
- 2–4 outcomes, each observable without trusting agent self-report
- Hard constraints enforceable outside prompts (code, middleware, policy engine)
- Autonomy matrix: Allowed / Escalate / Never
- Stop rules: Complete when... / Halt & escalate when...
- Every outcome has artifact + verification method
### Story contract
- WHEN names trigger AND precondition data state
- THEN names data source, field, and threshold — not "command exits 0"
- ARTIFACT is exact file path — no TBD, emit BLOCKER if unknown
- FORBIDDEN names the counterfeit success, not just "should not fail"
- OUTCOME includes performance number (unit + threshold + comparison) — not "works"
- Each story declares Tight Five dimension (Principles/Performance/Platform/Protocols/Players)
- Tight Five coverage check: every P the PRD touches has ≥1 story
- ≥1 story per user role per critical flow
### Build contract
- ≤5 distinct FeatureIDs — all from RaaS catalog or proprietary register
- Function column uses verbs — parser cannot derive tests from nouns
- Success Test references Story IDs
- Safety Test references FORBIDDEN outcomes from Story Contract
- All State values from exact enum
### Navigation architecture (for any PRD with UI screens)
- Pain-to-perform journey: all 6 stages documented with navigation implications
- Current nav path documented (existing sidebar/menu structure)
- Target nav path documented (where new screen appears, why that position)
- Feature flag states: nav with flag OFF vs ON at each sprint
- Cross-cutting routes table: From → To → Trigger → What Happens
- Wiring coordinates: exact file paths for sidebar, layout, widget, contextual triggers
### Registration
- Priority score in frontmatter with evidence per dimension
- Row in priority table at correct position
- All 7 prd-maintenance checks pass
- No in-flight engineering plan covers the same FeatureIDs (query active projects before handoff)
- No sibling PRD in priority table shares >50% of FeatureIDs without a declared relationship
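The sibling-overlap rule needs a number. One plausible reading, sketched here as an assumption: the fraction of this PRD's FeatureIDs that also appear in the sibling:

```typescript
// Sketch of the sibling-overlap check. One plausible reading of ">50% shared":
// the fraction of this PRD's FeatureIDs that also appear in the sibling.
function featureOverlap(prd: string[], sibling: string[]): number {
  if (prd.length === 0) return 0;
  const siblingIds = new Set(sibling);
  const shared = prd.filter((id) => siblingIds.has(id)).length;
  return shared / prd.length;
}

// >0.5 without a declared relationship is the registration red flag.
function needsDeclaredRelationship(prd: string[], sibling: string[]): boolean {
  return featureOverlap(prd, sibling) > 0.5;
}
```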
## Return Signals
The feedback loop closer. Without return signals, the pipeline runs open — specs produce builds that never correct the spec.
Trigger: Engineering completes a capability and the commissioner runs the L0-L4 protocol. The gap between predicted scores and actual outcomes IS the return signal.
Format: The SPEC-MAP is the shared artifact. For each Story Contract row, the commissioner records: Pass/Fail, actual measurement vs predicted threshold, and evidence (screenshot, GIF, console output). This populates the L-Level and Last Verified columns — one row per story, predicted vs actual side by side.
Receiver: The Dream Team author who wrote the original PRD.
Three responses:
- Story passed, threshold met — capability promoted, SPEC-MAP row complete, no spec change
- Story failed, build gap — engineering issue logged, fix stream picks it up, SPEC-MAP Test Status stays RED
- Story passed but threshold was wrong — spec gap. Update the Story Contract with the real threshold AND update the SPEC-MAP WHEN/THEN columns to match reality. This is the most valuable signal — it improves future specs
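The three responses reduce to a two-question classifier per SPEC-MAP row. A sketch; the `thresholdWasRight` judgment stays with the commissioner, and the code only routes it:

```typescript
// Sketch: the three return-signal responses as a classifier over one SPEC-MAP
// row. thresholdWasRight is the commissioner's judgment; code only routes it.
type ReturnSignal = "promote" | "build-gap" | "spec-gap";

function classifyReturnSignal(storyPassed: boolean, thresholdWasRight: boolean): ReturnSignal {
  if (!storyPassed) return "build-gap";      // engineering issue, test stays RED
  if (!thresholdWasRight) return "spec-gap"; // passed, but the number was wrong
  return "promote";                          // passed at the predicted threshold
}
```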
The reverse signal: Engineering also triggers return signals. When a build changes behavior that affects a Story Contract row (removing a skeleton gate, changing a redirect, altering a loading state), engineering updates the SPEC-MAP and posts a spec-delta to #meta. Dream Team updates spec/index.md to match. Without this, specs fossilize — they describe yesterday's product while engineering ships tomorrow's.
The return signal feeds back into 5P scoring. A capability that consistently misses Performance targets should have its readiness scores adjusted. See PRD Handoff Protocol for parser detection, bookend gates, plan templates, and the SPEC-MAP contract.
## Context
- PRD Handoff Protocol — Engineering interface: parser detection, bookend gates, plan templates, return signals
- Validate Demand — Demand evidence before PRD work begins
- Prioritisation Algorithm — Scoring rubric and build order
- Commissioning Protocol — L0–L4 maturity model
- Feature Matrix — Commissioning status for all platform features
- RaaS Catalog — FeatureID source of truth
- Pictures Templates — Pre-flight maps that feed the PRD
- Flow Engineering — Stories become maps, maps become types, types become code
## Links
- Lenny Rachitsky — PRD Templates — Problem before solution, one page, non-goals section. No AI awareness, no build contract.
- Miqdad Jaffer (OpenAI) — AI PRD Template — Battle-tested AI PRD from someone who shipped at Shopify. Human-only — no agent-facing spec or commissioning model.
- Addy Osmani (Google) — Specs for AI Agents — Names the concept: "Agent Experience (AX)." Six core areas, modular context, boundaries.
- GitHub Spec Kit — Spec-Driven Development — Four-phase gated workflow. Specs as executable artifacts.
- Mountain Goat Software — User Story Template — Why the three-part story template works: trigger, behavior, proof.
- ONES — Given/When/Then Format — Given/When/Then improves clarity and testability across product and QA.
- Intent Engineering — Fusing Intent with Dreamineering — Intent as the top-level spec artifact governing autonomy when instructions run out.
## Questions
- When an agent runs out of instructions, what governs its next decision — and is that written down?
- What's the difference between a constraint enforced structurally and one enforced with words — and which holds when it matters?
- If every outcome needs an artifact that proves it happened, which current PRDs have outcomes with no proof mechanism?
- When a story's FORBIDDEN column is blank, is that a time shortcut or a belief that the happy path is the only path?
- What breaks first when engineering receives a spec where the Function column contains nouns only?