# Create PRD Stories
How do you spec a product that never gives the same answer twice?
Traditional PRDs define behavior: "When user clicks X, system does Y." AI PRDs define boundaries and quality distributions: "Given input type X, output should score above Y on dimension Z at least N% of the time."
| Traditional PRD | AI PRD |
|---|---|
| Acceptance criteria: pass/fail | Quality targets: percentage above threshold |
| Edge cases are bugs | Edge cases are statistical certainties |
| Test before ship | Evaluate continuously |
| User stories (documentation) | Story contracts — each row = ≥1 test file, RED before impl, GREEN = value |
| Spec is for humans | Spec is for humans AND agents |
| Behavior instructions | Intent contracts — govern autonomy when instructions run out |
## P2P Stories
Protocol Contract Format: Explain the journey from Pain to Performance with Validated Outcomes.
The arc: who hurts → how much → what number proves it's fixed → which Tight Five dimension this serves. Every requirement in a PRD uses this shape — including the PRD creation process itself.
### Story: <2–6 word verb-noun title>
FeatureID: <id or N/A>
Owner: <Dream Team | Engineering | Agent | Shared>
Tight Five: <Principles | Performance | Platform | Protocols | Players>
Pain: <who hurts, what they do today, measurable cost>
Performance Target: <the number that proves this works — unit, threshold, comparison>
When <specific trigger — one observable condition>
Then <specific system action — one required response>
Artifact created/impacted:
- <named object, file, record, table, or state>
Success is:
- <externally observable result matching Performance Target>
Failure if:
- <missing artifact, invalid state, or wrong value>
## Writing Rules
- One trigger, one primary action
- Every success statement externally observable
- No adjectives without thresholds
- No "should" (use "must"), no "etc."
- Every dependency named explicitly
Performance Target rules:
- Must include a unit (seconds, dollars, rows, percentage)
- Must include a threshold (≤2s, ≥75%, ±$5)
- Must include a comparison point (vs manual, vs current, vs baseline)
- If you can't state the number, you don't understand the problem yet
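The three Performance Target rules are mechanical enough to lint. A minimal sketch in TypeScript; the regexes are illustrative heuristics, not an official checker, and the unit list is an assumption to extend:

```typescript
// Sketch: a heuristic lint for Performance Target strings. The regexes are
// illustrative, not an official checker; extend the unit list as needed.
interface TargetCheck {
  hasUnit: boolean;
  hasThreshold: boolean;
  hasComparison: boolean;
  ok: boolean;
}

function checkPerformanceTarget(target: string): TargetCheck {
  // Unit: seconds, dollars, rows, percentage (and common abbreviations).
  const hasUnit = /(seconds?|\ds\b|minutes?|min\b|dollars?|\$|rows?|%|percent)/i.test(target);
  // Threshold: a comparator attached to a number (≤2s, ≥75%, ±$5).
  const hasThreshold = /(≤|≥|<=|>=|<|>|±)\s*\$?\d/.test(target);
  // Comparison point: "vs manual", "vs current", "vs baseline".
  const hasComparison = /\bvs\.?\s+\w/i.test(target);
  return { hasUnit, hasThreshold, hasComparison, ok: hasUnit && hasThreshold && hasComparison };
}
```

A target that fails any of the three checks is a target you cannot yet state, which per the rule above means the problem is not yet understood.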
Tight Five rules:
- Every story declares which P it primarily serves
- After all stories are written, check coverage — a PRD touching Platform but with zero Platform stories has a blind spot
- The mapping becomes the readiness evidence for scoring
## Good vs Bad
| Quality | Story | Why |
|---|---|---|
| Bad | "When ETL runs, Then data loads successfully" | No pain. No number. No unit. "Successfully" is an adjective without threshold. |
| Bad | "When user queries tool stack, Then system returns correct verdict" | Whose pain? What's correct? How fast? Passes with a stub. |
| Good | "When operator queries build-or-buy verdict for a category, Then API returns verdict + coverage percentage in ≤2s with ≤5% divergence from manually audited truth" | Names the actor (operator). Names the pain (manual audit). Names the number (≤2s, ≤5%). Names the comparison (manually audited truth). |
## The Pipeline
Ten phases from observed pain to registered PRD. Each produces a named artifact. Full operational detail in the PRD creation work chart.
COLLECT → CLASSIFY → DEFINE → HIERARCHY → MAP → [PITCH + SCORE] → SCAFFOLD → COMPRESS → REGISTER → VERIFY
| Phase | Trigger | Artifact | Gate |
|---|---|---|---|
| COLLECT | Pain observed | spec/story.md — 4 inputs present | Actor, workaround, evidence, impact all named |
| CLASSIFY | Pain captured | Type + written rationale | One type, rejection rationale for adjacent types |
| DEFINE | Type assigned | spec/index.md — Intent Contract draft | SIO complete, 2+ outcomes, 1+ hard constraint |
| HIERARCHY | Intent defined | Overlap check result | No >50% overlap without declared relationship |
| MAP | Hierarchy clear | 5 picture files | All 5 written, none placeholder |
| PITCH + SCORE | Maps complete | Frontmatter scores | All 5P scores evidenced, composite calculated |
| SCAFFOLD | Scores set | 4 directories with files | Correct variant, no empty files |
| COMPRESS | Scaffold done | Edited draft | No line removable without information loss |
| REGISTER | Compressed | Priority table row | Row at correct position, table renders |
| VERIFY | Registered | prd-maintenance output | All 7 checks pass |
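The Gate column is the point: producing a phase's artifact is not enough, its gate must pass before the next phase starts. A minimal sketch of that sequencing (phase names from the table above; evaluating each gate is assumed to happen elsewhere):

```typescript
// Sketch: the pipeline as an ordered gate sequence. Phase names come from the
// table above; evaluating each gate is assumed to happen elsewhere.
const PHASES = [
  "COLLECT", "CLASSIFY", "DEFINE", "HIERARCHY", "MAP",
  "PITCH+SCORE", "SCAFFOLD", "COMPRESS", "REGISTER", "VERIFY",
] as const;

type Phase = (typeof PHASES)[number];

// Advance one phase only when the current gate passed; VERIFY is terminal.
function advance(current: Phase, gatePassed: boolean): Phase {
  if (!gatePassed) return current; // gate failed: the PRD does not move
  const i = PHASES.indexOf(current);
  return i < PHASES.length - 1 ? PHASES[i + 1] : current;
}
```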
## Story Contracts: PRD Creation
### Story: Capture Observed Pain
FeatureID: N/A
Owner: Dream Team
When a pain point is nominated for a PRD
Then the author must record actor, observed pain, current workaround, and evidence source
Artifact created/impacted:
- spec/story.md — Observed Pain section
Success is:
- All four inputs present and named
- A reader can identify who is struggling and why without additional context
Failure if:
- Any input is missing
- Pain statement describes a solution instead of a problem
### Story: Assign PRD Type
FeatureID: N/A
Owner: Dream Team
When observed pain is captured
Then the PRD must be classified as Platform, Product, Agent, or Instrument with written rationale
Artifact created/impacted:
- spec/story.md — Classification field
Success is:
- Exactly one type assigned
- Rationale explains why adjacent types were rejected
Failure if:
- More than one type selected
- Type assigned without rationale
### Story: Pass Strategic Gate
FeatureID: N/A
Owner: Dream Team
When classification is complete
Then the PRD must score ≥50 composite on Pain × Demand × Edge × Trend × Conversion, each dimension evidenced
Artifact created/impacted:
- index.md frontmatter: priority_pain, priority_demand, priority_edge, priority_trend, priority_conversion, priority_score
Success is:
- All 5 scores present with evidence note
- Composite ≥50 (bands: ≥500 build now, 200–499 strong, 50–199 promising, <50 park)
Failure if:
- Any score lacks evidence
- Composite not calculated
- PRD advances to SCAFFOLD with composite <50
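The gate in this story can be sketched as a score multiplier plus band classifier. The five keys mirror the frontmatter fields; the per-dimension scale is an assumption, since only the composite bands are given here:

```typescript
// Sketch of the strategic gate. The five keys mirror the frontmatter fields;
// the per-dimension scale is an assumption (only the composite bands are given).
interface PriorityScores {
  pain: number;
  demand: number;
  edge: number;
  trend: number;
  conversion: number;
}

// Composite is the product of the five dimensions.
function composite(s: PriorityScores): number {
  return s.pain * s.demand * s.edge * s.trend * s.conversion;
}

// Bands from the success criteria: ≥500 build now, 200–499 strong,
// 50–199 promising, <50 park.
function band(score: number): "build now" | "strong" | "promising" | "park" {
  if (score >= 500) return "build now";
  if (score >= 200) return "strong";
  if (score >= 50) return "promising";
  return "park";
}
```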
### Story: Define Intent Contract
FeatureID: N/A
Owner: Dream Team
When strategic gate passes
Then spec/index.md must contain a complete Intent Contract with all 9 dimensions
Artifact created/impacted:
- spec/index.md — Intent Contract section
Success is:
- 2–4 outcomes, each observable without trusting agent self-report
- Hard constraints enforceable outside prompts
- Decision autonomy matrix has Allowed / Escalate / Never columns
- Every outcome has an artifact and a verification method
Failure if:
- Any outcome relies on agent self-assessment ("agent handles it well")
- Hard constraints exist only in prompts
- Stop rules absent
### Story: Write Story Contract
FeatureID: N/A
Owner: Dream Team
When intent contract is complete
Then spec/index.md must contain a Story Contract table with ≥1 story per user role per critical flow
Artifact created/impacted:
- spec/index.md — Story Contract table
Success is:
- Every story has all 7 columns populated: #, WHEN, THEN, ARTIFACT, Test Type, FORBIDDEN, OUTCOME
- THEN names data source, field, and threshold — rejects "outputs valid JSON" or "command exits 0"
- ARTIFACT has exact file path — no TBD
- Every story has ≥1 Forbidden Outcome naming the counterfeit success
Failure if:
- ARTIFACT column blank or "TBD" — emit BLOCKER
- THEN does not name data source and threshold
- FORBIDDEN is generic ("should not fail")
- A user role has no story in a critical flow
### Story: Build Feature Table
FeatureID: N/A
Owner: Dream Team
When story contract is written
Then spec/index.md must contain a Build Contract (FAVV v2.1) with ≤5 distinct FeatureIDs, all from the RaaS catalog or Platform Proprietary register
Artifact created/impacted:
- spec/index.md — Build Contract table
- feature-matrix.md — FeatureID rows (updated post-commissioning)
Success is:
- Every row has every column populated: #, FeatureID, Feature, Function, Artifact, Success Test, Safety Test, Regression Test, Value, State
- Function column uses verbs — "Browse, search, filter contacts" not "Contacts"
- Success Test references Story IDs (S1, S2…)
- Safety Test references Forbidden Outcomes from Story Contract
- State matches exact enum
Failure if:
- More than 5 distinct FeatureIDs — split into multiple PRDs
- Function column contains nouns only — parser cannot generate tests
- State value not from enum
- FeatureID absent from RaaS catalog or proprietary register
### Story: Register PRD
FeatureID: N/A
Owner: Dream Team
When build contract is finalized
Then the PRD must appear in the priority table at the correct position and all prd-maintenance checks must pass
Artifact created/impacted:
- src/pages/priorities/index.md — Active/Pipeline/Backlog row
- src/pages/feature-matrix.md — FeatureID status rows
Success is:
- Row at correct priority order (score descending)
- All 7 prd-maintenance checks pass
- Every FeatureID in build contract has a feature-matrix row
Failure if:
- PRD missing from priority table
- Table order does not match priority score
- Any prd-maintenance check fails
## Intent Contract
The governing document. When an agent runs out of instructions, intent governs. Write this before the feature table.
| Dimension | Question | Example |
|---|---|---|
| Objective | What problem, and why now? | "Month-end reporting takes 3 days of manual copy-paste across 6 spreadsheets" |
| Desired Outcomes | Observable state changes proving success (2–4) | "No follow-up ticket within 24h" — not "agent asked good questions" |
| Health Metrics | What must not degrade? | "Existing deal close rate stays above 28%" |
| Hard Constraints | What the system blocks regardless of agent intent | Enforceable outside prompts — or they're wishes |
| Steering | How it should think and trade off | Guidance for judgment calls |
| Autonomy | What it may do alone / must escalate / must never do | Allowed / Escalate / Never |
| Stop Rules | Complete when... Halt & escalate when... | "Complete when 3 consecutive evals pass. Halt when error rate exceeds 5%" |
| Evidence | For each outcome, what artifact proves it happened? | Log entry, scored eval, refusal record, drift report |
| Verification | How each artifact is checked | Rubric, automated check, human audit |
Outcomes rule: "Report generated in <10 min, no manual correction needed" passes. "Agent generates reports quickly" fails — no threshold, no artifact, no time bound.
AI quality contracts belong in Intent Contract outcomes:
For [feature], outputs must score:
- [Dimension A]: ≥ [score] for [percentage]% of requests
- NEVER: [unacceptable outcome] for more than [N]% of requests
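A contract in that shape can be checked mechanically over a batch of scored outputs. A hedged sketch, with illustrative field names, assuming a non-empty batch and one rubric score per request:

```typescript
// Sketch: checking an AI quality contract against a batch of scored requests.
// Field names are illustrative; assumes a non-empty batch.
interface QualityContract {
  minScore: number;     // dimension floor per request, e.g. 4 on a 1-5 rubric
  minPassRate: number;  // fraction of requests that must meet the floor, e.g. 0.9
  maxNeverRate: number; // ceiling on the NEVER outcome, e.g. 0.01
}

function contractHolds(
  scores: number[],      // one rubric score per request
  neverFlags: boolean[], // true where the unacceptable outcome occurred
  c: QualityContract,
): boolean {
  const passRate = scores.filter((s) => s >= c.minScore).length / scores.length;
  const neverRate = neverFlags.filter(Boolean).length / neverFlags.length;
  return passRate >= c.minPassRate && neverRate <= c.maxNeverRate;
}
```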
Constraint types: Hard constraints (code, middleware, policy engine — outside the prompt) vs Steering constraints (prompt, context, principles — inside the prompt). Hard constraints that live only in prompts are wishes.
## Story Contract Schema
Stories are test contracts, not documentation. Each row becomes ≥1 test file. Tests RED = not started. Tests GREEN = value delivered. N stories → ≥N test files.
| Column | Rule |
|---|---|
| # | Story ID (S1, S2…) — referenced in Build Contract Success Test and Safety Test columns |
| WHEN | Trigger + precondition. Names the data state that must exist first. |
| THEN | Exact assertion naming data source, field, threshold. Reject: "outputs valid JSON", "command exits 0", "test passes" |
| ARTIFACT | Exact path to test file engineering must write. Emit BLOCKER if unknown. Never blank. |
| Test Type | unit / integration / e2e / a2a — most stories need 2 rows (unit + e2e or a2a) |
| FORBIDDEN | What must NOT be true when story passes. Names the counterfeit success — the stub that satisfies a weak THEN. |
| OUTCOME | Business value proven when THEN passes. Must include a performance number — not "works", not "loads correctly". Example: "Operator gets build/buy verdict in ≤2s instead of 30min manual audit" |
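The reject-rules in THEN, ARTIFACT, and FORBIDDEN are string-checkable. A minimal row validator sketch; field names mirror the columns, and the weak-assertion patterns are the ones the schema names:

```typescript
// Sketch: the schema's reject-rules as a row validator. Field names mirror the
// columns; the weak-assertion patterns are the ones named in the table.
interface StoryRow {
  id: string; // S1, S2...
  when: string;
  then: string;
  artifact: string; // exact test-file path
  testType: "unit" | "integration" | "e2e" | "a2a";
  forbidden: string;
  outcome: string;
}

function validateRow(row: StoryRow): string[] {
  const problems: string[] = [];
  if (!row.artifact.trim() || row.artifact.trim().toUpperCase() === "TBD") {
    problems.push("BLOCKER: ARTIFACT blank or TBD");
  }
  if (/outputs valid JSON|exits 0|test passes/i.test(row.then)) {
    problems.push("THEN is a weak assertion: name data source, field, threshold");
  }
  if (/should not fail/i.test(row.forbidden)) {
    problems.push("FORBIDDEN is generic: name the counterfeit success");
  }
  return problems;
}
```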
## Tight Five Coverage Check
After writing all stories, map them to the Tight Five:
| P | Story Coverage Question | Red Flag if Missing |
|---|---|---|
| Principles | Does a story prove value transformation is real? | Building without knowing if it matters |
| Performance | Does a story carry a measurable target with unit + threshold? | Activity without outcomes — the #1 failure mode |
| Platform | Does a story prove it ships on existing stack? | Dependent on things you don't control |
| Protocols | Does a story prove the output compounds (reusable, queryable, standard)? | Success that doesn't compound |
| Players | Does a story prove someone real will use it? | Isolated agency — built for nobody |
Not every PRD covers all five. But every gap should be a conscious decision, not an oversight. If a PRD touches Platform but has zero Platform stories, the readiness score for Platform cannot exceed 2.
Example:
| # | WHEN | THEN | ARTIFACT | Test Type | FORBIDDEN | OUTCOME |
|---|---|---|---|---|---|---|
| S1 | User searches "security compliance" AND ≥1 approved answer exists with matching category | searchAnswers() returns array where length >= 1 AND every result.score >= 0.8 AND result.category === "security" | apps/crm/tests/story-s1-answer-library.spec.ts | integration | Empty array when seeded answers exist — query not wired to DB | Answer Library surfaces past bid answers with confidence scores |
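For illustration, S1's THEN clause reduces to an assertion like the following. The `AnswerHit` shape and the idea that `searchAnswers()` returns such an array are assumptions, not the real CRM API:

```typescript
// Illustrative only: the AnswerHit shape and the idea that searchAnswers()
// returns such an array are assumptions, not the real CRM API.
interface AnswerHit {
  score: number;
  category: string;
}

// S1's THEN: at least one result, every score >= 0.8, every category "security".
function s1Holds(results: AnswerHit[]): boolean {
  return (
    results.length >= 1 && // guards the FORBIDDEN case: the empty-array counterfeit
    results.every((r) => r.score >= 0.8 && r.category === "security")
  );
}
```

Note how the `length >= 1` check exists only because the FORBIDDEN column named the counterfeit: without it, an unwired query returning `[]` would satisfy a weaker THEN.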
## Build Contract Schema
FAVV v2.1 — the parser reads this table. Column names are the interface contract. Change them and the parser breaks.
| Column | Engineering Reads As |
|---|---|
| FeatureID | RaaS catalog ID — links to feature-matrix row |
| Feature | Name of the capability |
| Function | Verb-led behavior. "Browse, search, filter contacts" = 3 tests. "Contacts" = BLOCKER |
| Artifact | Concrete deliverable: TypeScript client, PostgreSQL migration, React component |
| Success Test | Happy-path criteria referencing Story IDs. This IS the e2e test spec. |
| Safety Test | What must NOT happen — from FORBIDDEN in Story Contract |
| Regression Test | What existing capability must NOT degrade — names capability + threshold |
| Value | Business outcome in one sentence |
| State | Enum only: Live · Built · Dormant · Partial · Not verified · Gap · Stub · Broken |
Scope rule: Max 5 distinct FeatureIDs per PRD. Scope is frozen at REGISTER time. Adding a new FeatureID after registration requires a new PRD.
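Two of these rules are machine-checkable without reading prose: the State enum and the five-FeatureID scope cap. A sketch:

```typescript
// Sketch: two Build Contract rules that need no human reading.
const STATES = [
  "Live", "Built", "Dormant", "Partial", "Not verified", "Gap", "Stub", "Broken",
] as const;
type State = (typeof STATES)[number];

// State must come from the exact enum.
function isValidState(s: string): s is State {
  return (STATES as readonly string[]).indexOf(s) !== -1;
}

// Scope rule: at most 5 distinct FeatureIDs per PRD.
function withinScope(featureIds: string[]): boolean {
  return new Set(featureIds).size <= 5;
}
```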
## Navigation Journey Checklist
Every Screen Contract must map the user's journey from pain to effortless performance. The UI team needs this to build information architecture that carries the user forward — not just pages that contain features.
### The Six Stages
| Stage | User State | Navigation Question | Screen Contract Must Answer |
|---|---|---|---|
| Pain | Friction is the norm | What does the user do today that hurts? | Document the current navigation path and its cost (clicks, time, context switches) |
| Awareness | There's a faster way | How does the user discover the new capability? | Entry points: where in existing navigation does the new surface appear? First-time indicators (badges, pulses, onboarding tooltips) |
| First Value | That was faster | What's the smallest interaction that proves value? | Empty state design: suggested prompts, pre-loaded context, zero-config first use. Time-to-value target |
| Habit | I go here first now | How does the new surface become the default path? | Navigation hierarchy: does this become the first item? Does the dashboard change? What signals reinforce the new behavior? |
| Mastery | I do everything from here | How does the power user stay in flow? | Cross-cutting navigation: deep-links between the new surface and existing sections. Contextual triggers from existing pages back to the new surface |
| Effortless | It anticipates what I need | How does the system reduce even the need to navigate? | Proactive suggestions, context-aware defaults, ambient awareness of user intent |
### Applying to Screen Contracts
For every Screen Contract in a PRD, add a Navigation Architecture section that answers:
- Current nav path — What sidebar/menu items exist today? What is the user's current click-path to accomplish this job?
- Target nav path — Where does the new screen appear in the hierarchy? Above or below existing items? Why?
- Feature flag states — What does navigation look like with the flag OFF vs ON? What changes at each build sprint (T0, T1, T2...)?
- Cross-cutting routes — A table mapping: From (existing page) → To (new surface) → Trigger (what the user clicks) → What Happens (navigation behavior, context carried forward)
- Wiring coordinates — Exact file paths for the sidebar component, layout wrapper, floating widget, and any page that gains a contextual trigger
### Why This Matters
The multimodal agent PRD had a complete Screen Contract (route, states, selectors, elements) but zero navigation architecture. The UI team knew WHAT to build but not WHERE it goes, HOW users find it, or HOW it integrates with 6 existing sidebar sections. The Screen Contract described a destination without a map.
A screen without navigation architecture is a room without a door.
## Checklist
Before engineering receives the spec:
### Intent contract
- 2–4 outcomes, each observable without trusting agent self-report
- Hard constraints enforceable outside prompts (code, middleware, policy engine)
- Autonomy matrix: Allowed / Escalate / Never
- Stop rules: Complete when... / Halt & escalate when...
- Every outcome has artifact + verification method
### Story contract
- WHEN names trigger AND precondition data state
- THEN names data source, field, and threshold — not "command exits 0"
- ARTIFACT is exact file path — no TBD, emit BLOCKER if unknown
- FORBIDDEN names the counterfeit success, not just "should not fail"
- OUTCOME includes performance number (unit + threshold + comparison) — not "works"
- Each story declares Tight Five dimension (Principles/Performance/Platform/Protocols/Players)
- Tight Five coverage check: every P the PRD touches has ≥1 story
- ≥1 story per user role per critical flow
### Build contract
- ≤5 distinct FeatureIDs — all from RaaS catalog or proprietary register
- Function column uses verbs — parser cannot derive tests from nouns
- Success Test references Story IDs
- Safety Test references FORBIDDEN outcomes from Story Contract
- All State values from exact enum
### Navigation architecture (for any PRD with UI screens)
- Pain-to-perform journey: all 6 stages documented with navigation implications
- Current nav path documented (existing sidebar/menu structure)
- Target nav path documented (where new screen appears, why that position)
- Feature flag states: nav with flag OFF vs ON at each sprint
- Cross-cutting routes table: From → To → Trigger → What Happens
- Wiring coordinates: exact file paths for sidebar, layout, widget, contextual triggers
### Registration
- Priority score in frontmatter with evidence per dimension
- Row in priority table at correct position
- All 7 prd-maintenance checks pass
- No in-flight engineering plan covers the same FeatureIDs (query active projects before handoff)
- No sibling PRD in priority table shares >50% of FeatureIDs without a declared relationship
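The sibling-overlap rule needs a number. One plausible reading, sketched here as an assumption: the fraction of this PRD's FeatureIDs that also appear in the sibling:

```typescript
// Sketch of the sibling-overlap check. One plausible reading of ">50% shared":
// the fraction of this PRD's FeatureIDs that also appear in the sibling.
function featureOverlap(prd: string[], sibling: string[]): number {
  if (prd.length === 0) return 0;
  const siblingIds = new Set(sibling);
  const shared = prd.filter((id) => siblingIds.has(id)).length;
  return shared / prd.length;
}

// >0.5 without a declared relationship is the registration red flag.
function needsDeclaredRelationship(prd: string[], sibling: string[]): boolean {
  return featureOverlap(prd, sibling) > 0.5;
}
```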
## Return Signals
The feedback loop closer. Without return signals, the pipeline runs open — specs produce builds that never correct the spec.
Trigger: Engineering completes a capability and the commissioner runs the L0-L4 protocol. The gap between predicted scores and actual outcomes IS the return signal.
Format: The SPEC-MAP is the shared artifact. For each Story Contract row, the commissioner records: Pass/Fail, actual measurement vs predicted threshold, and evidence (screenshot, GIF, console output). This populates the L-Level and Last Verified columns — one row per story, predicted vs actual side by side.
Receiver: The Dream Team author who wrote the original PRD.
Three responses:
- Story passed, threshold met — capability promoted, SPEC-MAP row complete, no spec change
- Story failed, build gap — engineering issue logged, fix stream picks it up, SPEC-MAP Test Status stays RED
- Story passed but threshold was wrong — spec gap. Update the Story Contract with the real threshold AND update the SPEC-MAP WHEN/THEN columns to match reality. This is the most valuable signal — it improves future specs
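The three responses reduce to a two-question classifier per SPEC-MAP row. A sketch; the `thresholdWasRight` judgment stays with the commissioner, and the code only routes it:

```typescript
// Sketch: the three return-signal responses as a classifier over one SPEC-MAP
// row. thresholdWasRight is the commissioner's judgment; code only routes it.
type ReturnSignal = "promote" | "build-gap" | "spec-gap";

function classifyReturnSignal(storyPassed: boolean, thresholdWasRight: boolean): ReturnSignal {
  if (!storyPassed) return "build-gap";      // engineering issue, test stays RED
  if (!thresholdWasRight) return "spec-gap"; // passed, but the number was wrong
  return "promote";                          // passed at the predicted threshold
}
```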
The reverse signal: Engineering also triggers return signals. When a build changes behavior that affects a Story Contract row (removing a skeleton gate, changing a redirect, altering a loading state), engineering updates the SPEC-MAP and posts a spec-delta to #meta. Dream Team updates spec/index.md to match. Without this, specs fossilize — they describe yesterday's product while engineering ships tomorrow's.
The return signal feeds back into 5P scoring. A capability that consistently misses Performance targets should have its readiness scores adjusted. See PRD Handoff Protocol for parser detection, bookend gates, plan templates, and the SPEC-MAP contract.
## Context
- PRD Handoff Protocol — Engineering interface: parser detection, bookend gates, plan templates, return signals
- Validate Demand — Demand evidence before PRD work begins
- Prioritisation Algorithm — Scoring rubric and build order
- Commissioning Protocol — L0–L4 maturity model
- Feature Matrix — Commissioning status for all platform features
- RaaS Catalog — FeatureID source of truth
- Pictures Templates — Pre-flight maps that feed the PRD
- Flow Engineering — Stories become maps, maps become types, types become code
## Links
- Lenny Rachitsky — PRD Templates — Problem before solution, one page, non-goals section. No AI awareness, no build contract.
- Miqdad Jaffer (OpenAI) — AI PRD Template — Battle-tested AI PRD from someone who shipped at Shopify. Human-only — no agent-facing spec or commissioning model.
- Addy Osmani (Google) — Specs for AI Agents — Names the concept: "Agent Experience (AX)." Six core areas, modular context, boundaries.
- GitHub Spec Kit — Spec-Driven Development — Four-phase gated workflow. Specs as executable artifacts.
- Mountain Goat Software — User Story Template — Why the three-part story template works: trigger, behavior, proof.
- ONES — Given/When/Then Format — Given/When/Then improves clarity and testability across product and QA.
- Intent Engineering — Fusing Intent with Dreamineering — Intent as the top-level spec artifact governing autonomy when instructions run out.
## Questions
- When an agent runs out of instructions, what governs its next decision — and is that written down?
- What's the difference between a constraint enforced structurally and one enforced with words — and which holds when it matters?
- If every outcome needs an artifact that proves it happened, which current PRDs have outcomes with no proof mechanism?
- When a story's FORBIDDEN column is blank, is that a time shortcut or a belief that the happy path is the only path?
- What breaks first when engineering receives a spec where the Function column contains nouns only?