Create PRD Stories
How do you spec a product that never gives the same answer twice?
Traditional PRDs define behavior: "When user clicks X, system does Y." AI PRDs define boundaries and quality distributions: "Given input type X, output should score above Y on dimension Z at least N% of the time."
| Traditional PRD | AI PRD |
|---|---|
| Acceptance criteria: pass/fail | Quality targets: percentage above threshold |
| Edge cases are bugs | Edge cases are statistical certainties |
| Test before ship | Evaluate continuously |
| User stories (documentation) | Story contracts — each row = ≥1 test file, RED before impl, GREEN = value |
| Spec is for humans | Spec is for humans AND agents |
| Behavior instructions | Intent contracts — govern autonomy when instructions run out |
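The distributional quality targets above can be sketched in code. A minimal, hypothetical check (the names and score scale are assumptions, not from this spec): given pre-scored eval results, does dimension Z score above Y for at least N% of outputs?

```typescript
// Hypothetical sketch: "given input type X, output should score above Y
// on dimension Z at least N% of the time".
type EvalResult = { dimension: string; score: number };

function meetsQualityTarget(
  results: EvalResult[],
  dimension: string,   // Z
  minScore: number,    // Y: per-output score threshold
  minPassRate: number, // N, as a fraction of outputs
): boolean {
  const relevant = results.filter((r) => r.dimension === dimension);
  if (relevant.length === 0) return false; // no evidence means no pass
  const passing = relevant.filter((r) => r.score >= minScore).length;
  return passing / relevant.length >= minPassRate;
}
```

The point of the shape: the acceptance criterion is a statistic over a sample, not a single pass/fail run.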
P2P Stories
Protocol contract format: every story explains the journey from Pain to Performance with validated outcomes.
The arc: who hurts → how much → what number proves it's fixed → which Tight Five dimension this serves. Every requirement in a PRD uses this shape — including the PRD creation process itself.
### Story: <2–6 word verb-noun title>
FeatureID: <id or N/A>
Owner: <Dream Team | Engineering | Agent | Shared>
Tight Five: <Principles | Performance | Platform | Protocols | Players>
Pain: <who hurts, what they do today, measurable cost>
Performance Target: <the number that proves this works — unit, threshold, comparison>
When <specific trigger — one observable condition>
Then <specific system action — one required response>
Artifact created/impacted:
- <named object, file, record, table, or state>
Success is:
- <externally observable result matching Performance Target>
Failure if:
- <missing artifact, invalid state, or wrong value>
Writing Rules
- One trigger, one primary action
- Every success statement externally observable
- No adjectives without thresholds
- No "should" (use "must"), no "etc."
- Every dependency named explicitly
Performance Target rules:
- Must include a unit (seconds, dollars, rows, percentage)
- Must include a threshold (≤2s, ≥75%, ±$5)
- Must include a comparison point (vs manual, vs current, vs baseline)
- If you can't state the number, you don't understand the problem yet
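These rules lend themselves to a mechanical lint. A rough, hypothetical sketch; the regexes are crude illustrative heuristics, not a real grammar for target strings:

```typescript
// Hypothetical lint for the Performance Target rules above.
function lintPerformanceTarget(target: string): string[] {
  const problems: string[] = [];
  // unit: a number attached to s/ms/min/rows/%, or a dollar amount
  if (!/(\d\s*(s|ms|sec|min|h|rows?|%)|\$\s*\d)/i.test(target)) problems.push("no unit");
  // threshold: an operator followed by a number, e.g. ≤2s, ≥75%, ±$5
  if (!/(≤|≥|<=|>=|<|>|±)\s*\$?\d/.test(target)) problems.push("no threshold");
  // comparison point: "vs manual", "vs current", "vs baseline"
  if (!/\bvs\b/i.test(target)) problems.push("no comparison point");
  return problems; // empty array: the target is well-formed
}
```

"≤2s vs 30min manual audit" passes all three checks; "data loads successfully" fails all three.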
Tight Five rules:
- Every story declares which P it primarily serves
- After all stories are written, check coverage — a PRD touching Platform but with zero Platform stories has a blind spot
- The mapping becomes the readiness evidence for scoring
Good vs Bad
| Quality | Story | Why |
|---|---|---|
| Bad | "When ETL runs, Then data loads successfully" | No pain. No number. No unit. "Successfully" is an adjective without threshold. |
| Bad | "When user queries tool stack, Then system returns correct verdict" | Whose pain? What's correct? How fast? Passes with a stub. |
| Good | "When operator queries build-or-buy verdict for a category, Then API returns verdict + coverage percentage in ≤2s with ≤5% divergence from manually audited truth" | Names the actor (operator). Names the pain (manual audit). Names the number (≤2s, ≤5%). Names the comparison (manually audited truth). |
The Pipeline
10 phases from observed pain to registered PRD. Each produces a named artifact. Full operational detail: .ai/plans/create-prd/CLAUDE.md.
COLLECT → CLASSIFY → DEFINE → HIERARCHY → MAP → [PITCH + SCORE] → SCAFFOLD → COMPRESS → REGISTER → VERIFY
| Phase | Trigger | Artifact | Gate |
|---|---|---|---|
| COLLECT | Pain observed | spec/story.md — 4 inputs present | Actor, workaround, evidence, impact all named |
| CLASSIFY | Pain captured | Type + written rationale | One type, rejection rationale for adjacent types |
| DEFINE | Type assigned | spec/index.md — Intent Contract draft | SIO complete, 2+ outcomes, 1+ hard constraint |
| HIERARCHY | Intent defined | Overlap check result | No >50% overlap without declared relationship |
| MAP | Hierarchy clear | 5 picture files | All 5 written, none placeholder |
| PITCH + SCORE | Maps complete | Frontmatter scores | All 5P scores evidenced, composite calculated |
| SCAFFOLD | Scores set | 4 directories with files | Correct variant, no empty files |
| COMPRESS | Scaffold done | Edited draft | No line removable without information loss |
| REGISTER | Compressed | Priority table row | Row at correct position, table renders |
| VERIFY | Registered | prd-maintenance output | All 7 checks pass |
Story Contracts: PRD Creation
Story: Capture Observed Pain
FeatureID: N/A Owner: Dream Team
When a pain point is nominated for a PRD
Then the author must record actor, observed pain, current workaround, and evidence source
Artifact created/impacted:
- spec/story.md — Observed Pain section
Success is:
- All four inputs present and named
- A reader can identify who is struggling and why without additional context
Failure if:
- Any input is missing
- Pain statement describes a solution instead of a problem
Story: Assign PRD Type
FeatureID: N/A Owner: Dream Team
When observed pain is captured
Then the PRD must be classified as Platform, Product, Agent, or Instrument with written rationale
Artifact created/impacted:
- spec/story.md — Classification field
Success is:
- Exactly one type assigned
- Rationale explains why adjacent types were rejected
Failure if:
- More than one type selected
- Type assigned without rationale
Story: Pass Strategic Gate
FeatureID: N/A Owner: Dream Team
When classification is complete
Then the PRD must score ≥50 composite on Pain × Demand × Edge × Trend × Conversion, each dimension evidenced
Artifact created/impacted:
- index.md frontmatter: priority_pain, priority_demand, priority_edge, priority_trend, priority_conversion, priority_score
Success is:
- All 5 scores present with evidence note
- Composite ≥50 (bands: ≥500 build now, 200–499 strong, 50–199 promising, <50 park)
Failure if:
- Any score lacks evidence
- Composite not calculated
- PRD advances to SCAFFOLD with composite <50
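As a sketch of the gate arithmetic, assuming the composite is the plain product of the five dimension scores (the per-dimension scale comes from the Prioritisation Algorithm rubric, not this page):

```typescript
// Hypothetical sketch: composite = Pain × Demand × Edge × Trend × Conversion,
// banded per the thresholds above.
type GateScores = {
  pain: number;
  demand: number;
  edge: number;
  trend: number;
  conversion: number;
};

function compositeScore(s: GateScores): number {
  return s.pain * s.demand * s.edge * s.trend * s.conversion;
}

function band(composite: number): "build now" | "strong" | "promising" | "park" {
  if (composite >= 500) return "build now";
  if (composite >= 200) return "strong";
  if (composite >= 50) return "promising";
  return "park"; // below the ≥50 gate: PRD does not advance to SCAFFOLD
}
```

Because the dimensions multiply, a single weak dimension drags the composite down sharply; that is the gate working as intended.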
Story: Define Intent Contract
FeatureID: N/A Owner: Dream Team
When strategic gate passes
Then spec/index.md must contain a complete Intent Contract with all 9 dimensions
Artifact created/impacted:
- spec/index.md — Intent Contract section
Success is:
- 2–4 outcomes, each observable without trusting agent self-report
- Hard constraints enforceable outside prompts
- Decision autonomy matrix has Allowed / Escalate / Never columns
- Every outcome has an artifact and a verification method
Failure if:
- Any outcome relies on agent self-assessment ("agent handles it well")
- Hard constraints exist only in prompts
- Stop rules absent
Story: Write Story Contract
FeatureID: N/A Owner: Dream Team
When intent contract is complete
Then spec/index.md must contain a Story Contract table with ≥1 story per user role per critical flow
Artifact created/impacted:
- spec/index.md — Story Contract table
Success is:
- Every story has all 7 columns populated: #, WHEN, THEN, ARTIFACT, Test Type, FORBIDDEN, OUTCOME
- THEN names data source, field, and threshold — rejects "outputs valid JSON" or "command exits 0"
- ARTIFACT has exact file path — no TBD
- Every story has ≥1 Forbidden Outcome naming the counterfeit success
Failure if:
- ARTIFACT column blank or "TBD" — emit BLOCKER
- THEN does not name data source and threshold
- FORBIDDEN is generic ("should not fail")
- A user role has no story in a critical flow
Story: Build Feature Table
FeatureID: N/A Owner: Dream Team
When story contract is written
Then spec/index.md must contain a Build Contract (FAVV v2.1) with ≤5 distinct FeatureIDs, all from the RaaS catalog or Platform Proprietary register
Artifact created/impacted:
- spec/index.md — Build Contract table
- feature-matrix.md — FeatureID rows (updated post-commissioning)
Success is:
- Every row has all 9 schema columns: FeatureID, Feature, Function, Artifact, Success Test, Safety Test, Regression Test, Value, State
- Function column uses verbs — "Browse, search, filter contacts" not "Contacts"
- Success Test references Story IDs (S1, S2…)
- Safety Test references Forbidden Outcomes from Story Contract
- State matches exact enum
Failure if:
- More than 5 distinct FeatureIDs — split into multiple PRDs
- Function column contains nouns only — parser cannot generate tests
- State value not from enum
- FeatureID absent from RaaS catalog or proprietary register
Story: Register PRD
FeatureID: N/A Owner: Dream Team
When build contract is finalized
Then the PRD must appear in the priority table at the correct position and all prd-maintenance checks must pass
Artifact created/impacted:
- src/pages/priorities/index.md — Active/Pipeline/Backlog row
- src/pages/feature-matrix.md — FeatureID status rows
Success is:
- Row at correct priority order (score descending)
- All 7 prd-maintenance checks pass
- Every FeatureID in build contract has a feature-matrix row
Failure if:
- PRD missing from priority table
- Table order does not match priority score
- Any prd-maintenance check fails
Intent Contract
The governing document. When an agent runs out of instructions, intent governs. Write this before the feature table.
| Dimension | Question | Example |
|---|---|---|
| Objective | What problem, and why now? | "Month-end reporting takes 3 days of manual copy-paste across 6 spreadsheets" |
| Desired Outcomes | Observable state changes proving success (2–4) | "No follow-up ticket within 24h" — not "agent asked good questions" |
| Health Metrics | What must not degrade? | "Existing deal close rate stays above 28%" |
| Hard Constraints | What the system blocks regardless of agent intent | Enforceable outside prompts — or they're wishes |
| Steering | How it should think and trade off | Guidance for judgment calls |
| Autonomy | What it may do alone / must escalate / must never do | Allowed / Escalate / Never |
| Stop Rules | Complete when... Halt & escalate when... | "Complete when 3 consecutive evals pass. Halt when error rate exceeds 5%" |
| Evidence | For each outcome, what artifact proves it happened? | Log entry, scored eval, refusal record, drift report |
| Verification | How each artifact is checked | Rubric, automated check, human audit |
Outcomes rule: "Report generated in <10 min, no manual correction needed" passes. "Agent generates reports quickly" fails — no threshold, no artifact, no time bound.
AI quality contracts belong in Intent Contract outcomes:
For [feature], outputs must score:
- [Dimension A]: ≥ [score] for [percentage]% of requests
- NEVER: [unacceptable outcome] for more than [N]% of requests
Constraint types: Hard constraints (code, middleware, policy engine — outside the prompt) vs Steering constraints (prompt, context, principles — inside the prompt). Hard constraints that live only in prompts are wishes.
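A minimal sketch of the distinction, assuming a dispatcher that runs a guard before every agent tool call; the tool names and tier list are hypothetical:

```typescript
// Hypothetical sketch: a hard constraint enforced in code, outside the prompt.
// The dispatcher runs this guard before any agent tool call, so the block
// holds regardless of what the prompt says.
type AgentAction = { tool: string; payload: Record<string, unknown> };

const NEVER_TIER = new Set(["delete_production_data", "send_external_email"]);

function enforceHardConstraints(action: AgentAction): AgentAction {
  if (NEVER_TIER.has(action.tool)) {
    // a refusal record here would feed the Evidence dimension
    throw new Error(`hard constraint: ${action.tool} is Never-tier`);
  }
  return action; // steering constraints, by contrast, live in the prompt
}
```

The same rule written only as prompt text could be talked around; this one cannot.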
Story Contract Schema
Stories are test contracts, not documentation. Each row becomes ≥1 test file. Tests RED = not started. Tests GREEN = value delivered. N stories → ≥N test files.
| Column | Rule |
|---|---|
| # | Story ID (S1, S2…) — referenced in Build Contract Success Test and Safety Test columns |
| WHEN | Trigger + precondition. Names the data state that must exist first. |
| THEN | Exact assertion naming data source, field, threshold. Reject: "outputs valid JSON", "command exits 0", "test passes" |
| ARTIFACT | Exact path to test file engineering must write. Emit BLOCKER if unknown. Never blank. |
| Test Type | unit / integration / e2e / a2a — most stories need 2 rows (unit + e2e or a2a) |
| FORBIDDEN | What must NOT be true when story passes. Names the counterfeit success — the stub that satisfies a weak THEN. |
| OUTCOME | Business value proven when THEN passes. Must include a performance number — not "works", not "loads correctly". Example: "Operator gets build/buy verdict in ≤2s instead of 30min manual audit" |
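The schema above can be expressed as a type, with the two BLOCKER conditions from the ARTIFACT and THEN rules as a hypothetical lint (field names are illustrative):

```typescript
// Hypothetical sketch: the seven-column story row as a type,
// plus the BLOCKER checks named in the schema rules.
type TestType = "unit" | "integration" | "e2e" | "a2a";

interface StoryRow {
  id: string;        // S1, S2…
  when: string;      // trigger + precondition data state
  then: string;      // assertion naming data source, field, threshold
  artifact: string;  // exact test-file path
  testType: TestType;
  forbidden: string; // the counterfeit success
  outcome: string;   // business value with a performance number
}

function storyBlockers(row: StoryRow): string[] {
  const blockers: string[] = [];
  if (!row.artifact.trim() || /\bTBD\b/i.test(row.artifact))
    blockers.push(`${row.id}: ARTIFACT blank or TBD`);
  if (/^(outputs valid JSON|command exits 0|test passes)$/i.test(row.then.trim()))
    blockers.push(`${row.id}: THEN is a known weak assertion`);
  return blockers;
}
```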
Tight Five Coverage Check
After writing all stories, map them to the Tight Five:
| P | Story Coverage Question | Red Flag if Missing |
|---|---|---|
| Principles | Does a story prove value transformation is real? | Building without knowing if it matters |
| Performance | Does a story carry a measurable target with unit + threshold? | Activity without outcomes — the #1 failure mode |
| Platform | Does a story prove it ships on existing stack? | Dependent on things you don't control |
| Protocols | Does a story prove the output compounds (reusable, queryable, standard)? | Success that doesn't compound |
| Players | Does a story prove someone real will use it? | Isolated agency — built for nobody |
Not every PRD covers all five. But every gap should be a conscious decision, not an oversight. If a PRD touches Platform but has zero Platform stories, the readiness score for Platform cannot exceed 2.
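The coverage check itself is mechanical. A sketch, assuming the set of Ps a PRD touches is declared upstream (function names are hypothetical):

```typescript
// Hypothetical sketch: flag every Tight Five P the PRD touches
// that has no story mapped to it.
type P = "Principles" | "Performance" | "Platform" | "Protocols" | "Players";

function coverageGaps(prdTouches: P[], storyDeclarations: P[]): P[] {
  const covered = new Set(storyDeclarations);
  return prdTouches.filter((p) => !covered.has(p));
}
```

Any non-empty result is either a conscious, documented gap or a blind spot.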
Example:
| # | WHEN | THEN | ARTIFACT | Test Type | FORBIDDEN | OUTCOME |
|---|---|---|---|---|---|---|
| S1 | User searches "security compliance" AND ≥1 approved answer exists with matching category | searchAnswers() returns array where length >= 1 AND every result.score >= 0.8 AND result.category === "security" | apps/crm/tests/story-s1-answer-library.spec.ts | integration | Empty array when seeded answers exist — query not wired to DB | Answer Library surfaces past bid answers with confidence scores |
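A hypothetical sketch of the test file S1's ARTIFACT column demands. The real integration spec would hit the database; here searchAnswers is stubbed over an in-memory seed so the assertion shape, including the FORBIDDEN check, is visible:

```typescript
// Hypothetical sketch of story-s1-answer-library.spec.ts.
type Answer = { text: string; score: number; category: string; approved: boolean };

const seeded: Answer[] = [
  { text: "SOC2 controls answer", score: 0.91, category: "security", approved: true },
  { text: "Unapproved draft", score: 0.95, category: "security", approved: false },
];

// stand-in for the real DB-backed query; ignores full-text matching
function searchAnswers(_query: string, category: string): Answer[] {
  return seeded.filter((a) => a.approved && a.category === category);
}

function storyS1(): void {
  const results = searchAnswers("security compliance", "security");
  // FORBIDDEN: empty array when seeded answers exist (query not wired to DB)
  if (results.length < 1) throw new Error("FORBIDDEN outcome: empty result over seeded data");
  for (const r of results) {
    if (r.score < 0.8) throw new Error(`score ${r.score} below the 0.8 threshold`);
    if (r.category !== "security") throw new Error("category mismatch");
  }
}
```

Note the FORBIDDEN assertion runs against seeded data: a stub returning an empty array would fail here, which is exactly the counterfeit success the column exists to catch.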
Build Contract Schema
FAVV v2.1 — the parser reads this table. Column names are the interface contract. Change them and the parser breaks.
| Column | Engineering Reads As |
|---|---|
| FeatureID | RaaS catalog ID — links to feature-matrix row |
| Feature | Name of the capability |
| Function | Verb-led behavior. "Browse, search, filter contacts" = 3 tests. "Contacts" = BLOCKER |
| Artifact | Concrete deliverable: TypeScript client, PostgreSQL migration, React component |
| Success Test | Happy-path criteria referencing Story IDs. This IS the e2e test spec. |
| Safety Test | What must NOT happen — from FORBIDDEN in Story Contract |
| Regression Test | What existing capability must NOT degrade — names capability + threshold |
| Value | Business outcome in one sentence |
| State | Enum only: Live · Built · Dormant · Partial · Not verified · Gap · Stub · Broken |
Scope rule: Max 5 distinct FeatureIDs per PRD. Scope is frozen at REGISTER time. Adding a new FeatureID after registration requires a new PRD.
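One way the parser-facing Function rule could work, as a hypothetical sketch (the verb list and ID format are illustrative): a verb-led cell yields one test stub per phrase; a noun-only cell is a BLOCKER.

```typescript
// Hypothetical sketch: derive test stubs from a verb-led Function cell.
function functionToTests(featureId: string, fn: string): string[] {
  const phrases = fn.split(",").map((p) => p.trim()).filter(Boolean);
  // crude leading-verb check, standing in for real parsing
  const verbs = /^(browse|search|filter|create|update|delete|export|query|return)/i;
  if (!phrases.every((p) => verbs.test(p)))
    throw new Error(`BLOCKER: ${featureId} Function is not verb-led: "${fn}"`);
  return phrases.map((p, i) => `${featureId}-t${i + 1}: ${p}`);
}
```

"Browse, search, filter contacts" yields three stubs; "Contacts" raises a BLOCKER because no behavior can be derived from a noun.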
Checklist
Before engineering receives the spec:
Intent contract
- 2–4 outcomes, each observable without trusting agent self-report
- Hard constraints enforceable outside prompts (code, middleware, policy engine)
- Autonomy matrix: Allowed / Escalate / Never
- Stop rules: Complete when... / Halt & escalate when...
- Every outcome has artifact + verification method
Story contract
- WHEN names trigger AND precondition data state
- THEN names data source, field, and threshold — not "command exits 0"
- ARTIFACT is exact file path — no TBD, emit BLOCKER if unknown
- FORBIDDEN names the counterfeit success, not just "should not fail"
- OUTCOME includes performance number (unit + threshold + comparison) — not "works"
- Each story declares Tight Five dimension (Principles/Performance/Platform/Protocols/Players)
- Tight Five coverage check: every P the PRD touches has ≥1 story
- ≥1 story per user role per critical flow
Build contract
- ≤5 distinct FeatureIDs — all from RaaS catalog or proprietary register
- Function column uses verbs — parser cannot derive tests from nouns
- Success Test references Story IDs
- Safety Test references FORBIDDEN outcomes from Story Contract
- All State values from exact enum
Registration
- Priority score in frontmatter with evidence per dimension
- Row in priority table at correct position
- All 7 prd-maintenance checks pass
Return signals
After build, compare predicted scores to actual outcomes. See PRD Handoff Protocol for parser detection, bookend gates, plan templates, and return signal definitions.
Context
- PRD Handoff Protocol — Engineering interface: parser detection, bookend gates, plan templates, return signals
- Validate Demand — Demand evidence before PRD work begins
- Prioritisation Algorithm — Scoring rubric and build order
- Commissioning Protocol — L0–L4 maturity model
- Feature Matrix — Commissioning status for all platform features
- RaaS Catalog — FeatureID source of truth
- Pictures Templates — Pre-flight maps that feed the PRD
- Flow Engineering — Stories become maps, maps become types, types become code
Links
- Lenny Rachitsky — PRD Templates — Problem before solution, one page, non-goals section. No AI awareness, no build contract.
- Miqdad Jaffer (OpenAI) — AI PRD Template — Battle-tested AI PRD from someone who shipped at Shopify. Human-only — no agent-facing spec or commissioning model.
- Addy Osmani (Google) — Specs for AI Agents — Names the concept: "Agent Experience (AX)." Six core areas, modular context, boundaries.
- GitHub Spec Kit — Spec-Driven Development — Four-phase gated workflow. Specs as executable artifacts.
- Mountain Goat Software — User Story Template — Why the three-part story template works: trigger, behavior, proof.
- ONES — Given/When/Then Format — Given/When/Then improves clarity and testability across product and QA.
- Intent Engineering — Fusing Intent with Dreamineering — Intent as the top-level spec artifact governing autonomy when instructions run out.
Questions
- When an agent runs out of instructions, what governs its next decision — and is that written down?
- What's the difference between a constraint enforced structurally and one enforced with words — and which holds when it matters?
- If every outcome needs an artifact that proves it happened, which current PRDs have outcomes with no proof mechanism?
- When a story's FORBIDDEN column is blank, is that a time shortcut or a belief that the happy path is the only path?
- What breaks first when engineering receives a spec where the Function column contains nouns only?