
Create PRD Stories

How do you spec a product that never gives the same answer twice?

Traditional PRDs define behavior: "When user clicks X, system does Y." AI PRDs define boundaries and quality distributions: "Given input type X, output should score above Y on dimension Z at least N% of the time."

| Traditional PRD | AI PRD |
| --- | --- |
| Acceptance criteria: pass/fail | Quality targets: percentage above threshold |
| Edge cases are bugs | Edge cases are statistical certainties |
| Test before ship | Evaluate continuously |
| User stories (documentation) | Story contracts — each row = ≥1 test file, RED before impl, GREEN = value |
| Spec is for humans | Spec is for humans AND agents |
| Behavior instructions | Intent contracts — govern autonomy when instructions run out |
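The shift from pass/fail acceptance to quality distributions can be checked mechanically. A minimal sketch, assuming scored eval results are already collected per request (all type and function names here are hypothetical):

```typescript
// Hypothetical types — assumes each output has already been scored
// by an eval rubric on a named quality dimension.
interface ScoredOutput {
  dimension: string;
  score: number; // 0–1 from an eval rubric
}

interface QualityTarget {
  dimension: string;   // dimension Z
  minScore: number;    // threshold Y
  minPassRate: number; // "at least N% of the time", as 0–1
}

// Pass/fail is decided on the DISTRIBUTION, not on any single output.
function meetsTarget(outputs: ScoredOutput[], target: QualityTarget): boolean {
  const relevant = outputs.filter(o => o.dimension === target.dimension);
  if (relevant.length === 0) return false;
  const passing = relevant.filter(o => o.score >= target.minScore).length;
  return passing / relevant.length >= target.minPassRate;
}
```

The same batch of outputs can pass one target and fail a stricter one; the contract is the threshold pair, not a single assertion.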

P2P Stories

Protocol Contract Format: Explain the journey from Pain to Performance with Validated Outcomes.

The arc: who hurts → how much → what number proves it's fixed → which Tight Five dimension this serves. Every requirement in a PRD uses this shape — including the PRD creation process itself.

### Story: <2–6 word verb-noun title>

FeatureID: <id or N/A>
Owner: <Dream Team | Engineering | Agent | Shared>
Tight Five: <Principles | Performance | Platform | Protocols | Players>

Pain: <who hurts, what they do today, measurable cost>
Performance Target: <the number that proves this works: unit, threshold, comparison>

When <specific trigger: one observable condition>
Then <specific system action: one required response>

Artifact created/impacted:

- <named object, file, record, table, or state>

Success is:

- <externally observable result matching Performance Target>

Failure if:

- <missing artifact, invalid state, or wrong value>

Writing Rules

  • One trigger, one primary action
  • Every success statement externally observable
  • No adjectives without thresholds
  • No "should" (use "must"); no "etc."
  • Every dependency named explicitly

Performance Target rules:

  • Must include a unit (seconds, dollars, rows, percentage)
  • Must include a threshold (≤2s, ≥75%, ±$5)
  • Must include a comparison point (vs manual, vs current, vs baseline)
  • If you can't state the number, you don't understand the problem yet
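The three rules above lend themselves to an automated lint. A sketch, with illustrative keyword heuristics (the regexes are assumptions, not a real grammar):

```typescript
// Each regex detects one required part of a Performance Target.
// Keyword lists are illustrative, not exhaustive.
const UNIT = /\d\s*(seconds?\b|s\b|ms\b|min\b|rows?\b|%)|\$\s*\d/i;
const THRESHOLD = /(≤|≥|<|>|±|at least|at most|within)\s*\$?\d/i;
const COMPARISON = /\bvs\.?\s+\w+/i;

// Returns the list of missing parts; an empty array means the target passes.
function lintPerformanceTarget(target: string): string[] {
  const problems: string[] = [];
  if (!UNIT.test(target)) problems.push("no unit");
  if (!THRESHOLD.test(target)) problems.push("no threshold");
  if (!COMPARISON.test(target)) problems.push("no comparison point");
  return problems;
}
```

"≤2s vs manual audit" passes all three checks; "data loads successfully" fails all three — which is exactly the Good vs Bad distinction below.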

Tight Five rules:

  • Every story declares which P it primarily serves
  • After all stories are written, check coverage — a PRD touching Platform but with zero Platform stories has a blind spot
  • The mapping becomes the readiness evidence for scoring

Good vs Bad

| Quality | Story | Why |
| --- | --- | --- |
| Bad | "When ETL runs, Then data loads successfully" | No pain. No number. No unit. "Successfully" is an adjective without threshold. |
| Bad | "When user queries tool stack, Then system returns correct verdict" | Whose pain? What's correct? How fast? Passes with a stub. |
| Good | "When operator queries build-or-buy verdict for a category, Then API returns verdict + coverage percentage in ≤2s with ≤5% divergence from manually audited truth" | Names the actor (operator). Names the pain (manual audit). Names the number (≤2s, ≤5%). Names the comparison (manually audited truth). |

The Pipeline

10 phases from observed pain to registered PRD. Each produces a named artifact. Full operational detail: .ai/plans/create-prd/CLAUDE.md.

COLLECT → CLASSIFY → DEFINE → HIERARCHY → MAP → [PITCH + SCORE] → SCAFFOLD → COMPRESS → REGISTER → VERIFY
| Phase | Trigger | Artifact | Gate |
| --- | --- | --- | --- |
| COLLECT | Pain observed | spec/story.md — 4 inputs present | Actor, workaround, evidence, impact all named |
| CLASSIFY | Pain captured | Type + written rationale | One type, rejection rationale for adjacent types |
| DEFINE | Type assigned | spec/index.md — Intent Contract draft | SIO complete, 2+ outcomes, 1+ hard constraint |
| HIERARCHY | Intent defined | Overlap check result | No >50% overlap without declared relationship |
| MAP | Hierarchy clear | 5 picture files | All 5 written, none placeholder |
| PITCH + SCORE | Maps complete | Frontmatter scores | All 5P scores evidenced, composite calculated |
| SCAFFOLD | Scores set | 4 directories with files | Correct variant, no empty files |
| COMPRESS | Scaffold done | Edited draft | No line removable without information loss |
| REGISTER | Compressed | Priority table row | Row at correct position, table renders |
| VERIFY | Registered | prd-maintenance output | All 7 checks pass |

Story Contracts: PRD Creation

Story: Capture Observed Pain

FeatureID: N/A
Owner: Dream Team

When a pain point is nominated for a PRD Then the author must record actor, observed pain, current workaround, and evidence source

Artifact created/impacted:

  • spec/story.md — Observed Pain section

Success is:

  • All four inputs present and named
  • A reader can identify who is struggling and why without additional context

Failure if:

  • Any input is missing
  • Pain statement describes a solution instead of a problem

Story: Assign PRD Type

FeatureID: N/A
Owner: Dream Team

When observed pain is captured Then the PRD must be classified as Platform, Product, Agent, or Instrument with written rationale

Artifact created/impacted:

  • spec/story.md — Classification field

Success is:

  • Exactly one type assigned
  • Rationale explains why adjacent types were rejected

Failure if:

  • More than one type selected
  • Type assigned without rationale

Story: Pass Strategic Gate

FeatureID: N/A
Owner: Dream Team

When classification is complete Then the PRD must score ≥50 composite on Pain × Demand × Edge × Trend × Conversion, each dimension evidenced

Artifact created/impacted:

  • index.md frontmatter: priority_pain, priority_demand, priority_edge, priority_trend, priority_conversion, priority_score

Success is:

  • All 5 scores present with evidence note
  • Composite ≥50 (bands: ≥500 build now, 200–499 strong, 50–199 promising, <50 park)

Failure if:

  • Any score lacks evidence
  • Composite not calculated
  • PRD advances to SCAFFOLD with composite <50
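The gate arithmetic can be sketched directly from the rule above. A minimal version, assuming each dimension carries one numeric score and the composite is the plain product (per-dimension scales are not specified here):

```typescript
// Assumed shape: one numeric score per priority dimension. The composite
// is multiplicative, so a zero on any dimension parks the PRD outright.
interface PriorityScores {
  pain: number;
  demand: number;
  edge: number;
  trend: number;
  conversion: number;
}

function composite(s: PriorityScores): number {
  return s.pain * s.demand * s.edge * s.trend * s.conversion;
}

// Bands from the gate rule: ≥500 build now, 200–499 strong,
// 50–199 promising, <50 park.
function band(score: number): "build now" | "strong" | "promising" | "park" {
  if (score >= 500) return "build now";
  if (score >= 200) return "strong";
  if (score >= 50) return "promising";
  return "park";
}
```

The multiplicative form is the point: a PRD cannot buy its way past a zero on Pain with a high Trend score.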

Story: Define Intent Contract

FeatureID: N/A
Owner: Dream Team

When strategic gate passes Then spec/index.md must contain a complete Intent Contract with all 9 dimensions

Artifact created/impacted:

  • spec/index.md — Intent Contract section

Success is:

  • 2–4 outcomes, each observable without trusting agent self-report
  • Hard constraints enforceable outside prompts
  • Decision autonomy matrix has Allowed / Escalate / Never columns
  • Every outcome has an artifact and a verification method

Failure if:

  • Any outcome relies on agent self-assessment ("agent handles it well")
  • Hard constraints exist only in prompts
  • Stop rules absent

Story: Write Story Contract

FeatureID: N/A
Owner: Dream Team

When intent contract is complete Then spec/index.md must contain a Story Contract table with ≥1 story per user role per critical flow

Artifact created/impacted:

  • spec/index.md — Story Contract table

Success is:

  • Every story has all 7 columns populated: #, WHEN, THEN, ARTIFACT, Test Type, FORBIDDEN, OUTCOME
  • THEN names data source, field, and threshold — rejects "outputs valid JSON" or "command exits 0"
  • ARTIFACT has exact file path — no TBD
  • Every story has ≥1 Forbidden Outcome naming the counterfeit success

Failure if:

  • ARTIFACT column blank or "TBD" — emit BLOCKER
  • THEN does not name data source and threshold
  • FORBIDDEN is generic ("should not fail")
  • A user role has no story in a critical flow

Story: Build Feature Table

FeatureID: N/A
Owner: Dream Team

When story contract is written Then spec/index.md must contain a Build Contract (FAVV v2.1) with ≤5 distinct FeatureIDs, all from the RaaS catalog or Platform Proprietary register

Artifact created/impacted:

  • spec/index.md — Build Contract table
  • feature-matrix.md — FeatureID rows (updated post-commissioning)

Success is:

  • Every row has all 9 columns: FeatureID, Feature, Function, Artifact, Success Test, Safety Test, Regression Test, Value, State
  • Function column uses verbs — "Browse, search, filter contacts" not "Contacts"
  • Success Test references Story IDs (S1, S2…)
  • Safety Test references Forbidden Outcomes from Story Contract
  • State matches exact enum

Failure if:

  • More than 5 distinct FeatureIDs — split into multiple PRDs
  • Function column contains nouns only — parser cannot generate tests
  • State value not from enum
  • FeatureID absent from RaaS catalog or proprietary register

Story: Register PRD

FeatureID: N/A
Owner: Dream Team

When build contract is finalized Then the PRD must appear in the priority table at the correct position and all prd-maintenance checks must pass

Artifact created/impacted:

  • src/pages/priorities/index.md — Active/Pipeline/Backlog row
  • src/pages/feature-matrix.md — FeatureID status rows

Success is:

  • Row at correct priority order (score descending)
  • All 7 prd-maintenance checks pass
  • Every FeatureID in build contract has a feature-matrix row

Failure if:

  • PRD missing from priority table
  • Table order does not match priority score
  • Any prd-maintenance check fails

Intent Contract

The governing document. When an agent runs out of instructions, intent governs. Write this before the feature table.

| Dimension | Question | Example |
| --- | --- | --- |
| Objective | What problem, and why now? | "Month-end reporting takes 3 days of manual copy-paste across 6 spreadsheets" |
| Desired Outcomes | Observable state changes proving success (2–4) | "No follow-up ticket within 24h" — not "agent asked good questions" |
| Health Metrics | What must not degrade? | "Existing deal close rate stays above 28%" |
| Hard Constraints | What the system blocks regardless of agent intent | Enforceable outside prompts — or they're wishes |
| Steering | How it should think and trade off | Guidance for judgment calls |
| Autonomy | What it may do alone / must escalate / must never do | Allowed / Escalate / Never |
| Stop Rules | Complete when... / Halt & escalate when... | "Complete when 3 consecutive evals pass. Halt when error rate exceeds 5%" |
| Evidence | For each outcome, what artifact proves it happened? | Log entry, scored eval, refusal record, drift report |
| Verification | How each artifact is checked | Rubric, automated check, human audit |

Outcomes rule: "Report generated in <10 min, no manual correction needed" passes. "Agent generates reports quickly" fails — no threshold, no artifact, no time bound.

AI quality contracts belong in Intent Contract outcomes:

```
For [feature], outputs must score:
- [Dimension A]: ≥ [score] for [percentage]% of requests
- NEVER: [unacceptable outcome] for more than [N]% of requests
```
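Evaluating a batch of requests against that contract shape can be sketched as follows, assuming one score and one NEVER-outcome flag per sampled request (types and names are hypothetical):

```typescript
// Floor on quality plus ceiling on unacceptable outcomes — both must hold.
interface QualityContract {
  minScore: number;     // dimension floor
  minPassRate: number;  // required passing fraction, 0–1
  maxNeverRate: number; // tolerated fraction of NEVER outcomes, 0–1
}

function evaluateContract(
  scores: number[],
  unacceptable: boolean[], // per-request flag for the NEVER outcome
  c: QualityContract
): { passRate: number; neverRate: number; pass: boolean } {
  const n = scores.length;
  const passRate = scores.filter(s => s >= c.minScore).length / n;
  const neverRate = unacceptable.filter(Boolean).length / n;
  return {
    passRate,
    neverRate,
    pass: passRate >= c.minPassRate && neverRate <= c.maxNeverRate,
  };
}
```

Note the NEVER clause is a separate axis: a batch can clear the quality floor and still fail because too many requests hit the unacceptable outcome.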

Constraint types: Hard constraints (code, middleware, policy engine — outside the prompt) vs Steering constraints (prompt, context, principles — inside the prompt). Hard constraints that live only in prompts are wishes.


Story Contract Schema

Stories are test contracts, not documentation. Each row becomes ≥1 test file. Tests RED = not started. Tests GREEN = value delivered. N stories → ≥N test files.

| Column | Rule |
| --- | --- |
| # | Story ID (S1, S2…) — referenced in Build Contract Success Test and Safety Test columns |
| WHEN | Trigger + precondition. Names the data state that must exist first. |
| THEN | Exact assertion naming data source, field, threshold. Reject: "outputs valid JSON", "command exits 0", "test passes" |
| ARTIFACT | Exact path to test file engineering must write. Emit BLOCKER if unknown. Never blank. |
| Test Type | unit / integration / e2e / a2a — most stories need 2 rows (unit + e2e or a2a) |
| FORBIDDEN | What must NOT be true when story passes. Names the counterfeit success — the stub that satisfies a weak THEN. |
| OUTCOME | Business value proven when THEN passes. Must include a performance number — not "works", not "loads correctly". Example: "Operator gets build/buy verdict in ≤2s instead of 30min manual audit" |
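The schema can be expressed as a row type plus a lint pass over its failure rules. A sketch; the string heuristics are illustrative assumptions, not the real tooling:

```typescript
// One Story Contract row; field names mirror the columns above.
interface StoryRow {
  id: string; // S1, S2…
  when: string;
  then: string;
  artifact: string;  // exact test-file path
  testType: string;  // unit / integration / e2e / a2a
  forbidden: string;
  outcome: string;
}

// Lint pass derived from the failure rules: TBD artifact is a BLOCKER,
// stub-passable THEN assertions and generic FORBIDDEN entries are flagged.
function lintStoryRow(row: StoryRow): string[] {
  const issues: string[] = [];
  if (!row.artifact.trim() || /\btbd\b/i.test(row.artifact)) {
    issues.push("BLOCKER: ARTIFACT blank or TBD");
  }
  if (/outputs? valid json|exits? 0|test passes/i.test(row.then)) {
    issues.push("THEN is satisfiable by a stub");
  }
  if (/^should not fail\.?$/i.test(row.forbidden.trim())) {
    issues.push("FORBIDDEN is generic");
  }
  return issues;
}
```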

Tight Five Coverage Check

After writing all stories, map them to the Tight Five:

| P | Story Coverage Question | Red Flag if Missing |
| --- | --- | --- |
| Principles | Does a story prove value transformation is real? | Building without knowing if it matters |
| Performance | Does a story carry a measurable target with unit + threshold? | Activity without outcomes — the #1 failure mode |
| Platform | Does a story prove it ships on existing stack? | Dependent on things you don't control |
| Protocols | Does a story prove the output compounds (reusable, queryable, standard)? | Success that doesn't compound |
| Players | Does a story prove someone real will use it? | Isolated agency — built for nobody |

Not every PRD covers all five. But every gap should be a conscious decision, not an oversight. If a PRD touches Platform but has zero Platform stories, the readiness score for Platform cannot exceed 2.
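The coverage check and the readiness cap can be sketched in a few lines, assuming each story declares one P and the PRD declares which Ps it touches (the 5-point readiness ceiling is an assumption of this sketch):

```typescript
type P = "Principles" | "Performance" | "Platform" | "Protocols" | "Players";

// Which Ps the PRD touches vs which Ps its stories actually cover.
function tightFiveGaps(prdTouches: P[], storyDeclarations: P[]): P[] {
  const covered = new Set(storyDeclarations);
  return prdTouches.filter(p => !covered.has(p));
}

// Readiness cap from the rule above: a touched-but-uncovered P cannot
// score above 2. The maximum of 5 is assumed, not stated in the rule.
function readinessCeiling(p: P, gaps: P[]): number {
  return gaps.includes(p) ? 2 : 5;
}
```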

Example:

| # | WHEN | THEN | ARTIFACT | Test Type | FORBIDDEN | OUTCOME |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | User searches "security compliance" AND ≥1 approved answer exists with matching category | `searchAnswers()` returns array where `length >= 1` AND every `result.score >= 0.8` AND `result.category === "security"` | `apps/crm/tests/story-s1-answer-library.spec.ts` | integration | Empty array when seeded answers exist — query not wired to DB | Answer Library surfaces past bid answers with confidence scores |

Build Contract Schema

FAVV v2.1 — the parser reads this table. Column names are the interface contract. Change them and the parser breaks.

| Column | Engineering Reads As |
| --- | --- |
| FeatureID | RaaS catalog ID — links to feature-matrix row |
| Feature | Name of the capability |
| Function | Verb-led behavior. "Browse, search, filter contacts" = 3 tests. "Contacts" = BLOCKER |
| Artifact | Concrete deliverable: TypeScript client, PostgreSQL migration, React component |
| Success Test | Happy-path criteria referencing Story IDs. This IS the e2e test spec. |
| Safety Test | What must NOT happen — from FORBIDDEN in Story Contract |
| Regression Test | What existing capability must NOT degrade — names capability + threshold |
| Value | Business outcome in one sentence |
| State | Enum only: Live · Built · Dormant · Partial · Not verified · Gap · Stub · Broken |

Scope rule: Max 5 distinct FeatureIDs per PRD. Scope is frozen at REGISTER time. Adding a new FeatureID after registration requires a new PRD.
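How a parser might read the Function column can be sketched as follows. The verb list is an illustrative assumption, not the real parser's grammar:

```typescript
// Illustrative verb list — the real parser's grammar is not shown here.
const VERB = /^(browse|search|filter|create|read|update|delete|list|export|import|send|query|score|validate)\b/i;

// "Browse, search, filter contacts" → 3 test stubs.
// "Contacts" → zero verb-led phrases → BLOCKER: no tests can be derived.
function parseFunctionColumn(cell: string): { tests: string[]; blocker: boolean } {
  const phrases = cell.split(",").map(s => s.trim()).filter(Boolean);
  const tests = phrases.filter(p => VERB.test(p));
  const blocker = phrases.length === 0 || tests.length !== phrases.length;
  return { tests, blocker };
}
```

This is why column names and verb-led phrasing are the interface contract: the table is machine input, not just documentation.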


Checklist

Before engineering receives the spec:

Intent contract

  • 2–4 outcomes, each observable without trusting agent self-report
  • Hard constraints enforceable outside prompts (code, middleware, policy engine)
  • Autonomy matrix: Allowed / Escalate / Never
  • Stop rules: Complete when... / Halt & escalate when...
  • Every outcome has artifact + verification method

Story contract

  • WHEN names trigger AND precondition data state
  • THEN names data source, field, and threshold — not "command exits 0"
  • ARTIFACT is exact file path — no TBD, emit BLOCKER if unknown
  • FORBIDDEN names the counterfeit success, not just "should not fail"
  • OUTCOME includes performance number (unit + threshold + comparison) — not "works"
  • Each story declares Tight Five dimension (Principles/Performance/Platform/Protocols/Players)
  • Tight Five coverage check: every P the PRD touches has ≥1 story
  • ≥1 story per user role per critical flow

Build contract

  • ≤5 distinct FeatureIDs — all from RaaS catalog or proprietary register
  • Function column uses verbs — parser cannot derive tests from nouns
  • Success Test references Story IDs
  • Safety Test references FORBIDDEN outcomes from Story Contract
  • All State values from exact enum

Registration

  • Priority score in frontmatter with evidence per dimension
  • Row in priority table at correct position
  • All 7 prd-maintenance checks pass

Return signals

After build, compare predicted scores to actual outcomes. See PRD Handoff Protocol for parser detection, bookend gates, plan templates, and return signal definitions.


Context

Questions

  • When an agent runs out of instructions, what governs its next decision — and is that written down?

  • What's the difference between a constraint enforced structurally and one enforced with words — and which holds when it matters?
  • If every outcome needs an artifact that proves it happened, which current PRDs have outcomes with no proof mechanism?
  • When a story's FORBIDDEN column is blank, is that a time shortcut or a belief that the happy path is the only path?
  • What breaks first when engineering receives a spec where the Function column contains nouns only?