PRD Handoff Protocol

How does a specification become a plan that becomes working software?

Dream Team                                  Engineering
    |                                            |
    | 1. Write spec (Intent+Stories+FAVV)        |
    | 2. Populate SPEC-MAP rows                  |
    | 3. Signal handoff via comms                |
    |                                            | 4. Parse spec → fitness check
    | 4b. BOUNCE (if fitness fails) <------------|
    |                                            | 5. Write failing tests (RED)
    |                                            | 6. Build: RED → GREEN
    |                                            | 7. Validate: all specs GREEN
    | 8. Commission via browser <----------------|
    | 9. Update feature-matrix                   |
    |          --- REVERSE SIGNAL ---            |
    |                                            | 10. Behavior changes → spec-delta
    | 11. Update spec, re-commission <-----------|

The gap between "what to build" and "how to build it" is where most projects fail. This protocol defines the contract: what the dream team writes, what engineering reads, and how the spec parser bridges them.

The Contract

| Team | Owns | Authoritative Source | Shared Format |
| --- | --- | --- | --- |
| Dream Team | What and Why | PRD creation template | spec/index.md (FAVV v2.1) |
| Engineering | How | Plan templates (task sequence + reuse) | Implementation plan with reference back to spec |

Each side has one authoritative source. Dream team follows the PRD creation template to produce specs. Engineering follows plan templates to produce implementations. The spec is the shared contract — engineering never builds from messages, verbal agreements, or tier checklists without stories.

Ownership Boundary

The dream team writes stories that state what must be true — not how to make it true. "Admin can toggle permissions per entity per role, change persists, unauthorized user denied" is a truth test. Engineering decides whether that's a server action, an API endpoint, or a database trigger.

| Dream Team Decides | Engineering Decides |
| --- | --- |
| User stories and acceptance criteria | Implementation architecture |
| Forbidden outcomes (safety tests) | Which layer handles enforcement |
| FeatureIDs and scope | Template selection and plan structure |
| Kill date and priority order | Sprint sequencing and task dependencies |
| What "done" looks like | How to get there |

Non-negotiable from both sides: monorepo structure, hexagonal layering, generators for boilerplate, tests before implementation. These are the substrate that makes continuous improvement possible — not implementation choices.

PRD Structure

prd-{name}/
  index.md        <- Decision surface (10s read)
  pictures/       <- Pre-flight maps (thinking instruments)
  prompt-deck/    <- Sales compression (2min read)
  spec/           <- Engineering spec (30min read)
    index.md      <- Intent Contract + Story Contract + Build Contract

Spec Anatomy

The spec parser reads three contracts from spec/index.md:

Intent Contract

9 dimensions defining agent autonomy boundaries. Not parsed into tasks — used for judgment when instructions run out.

| Dimension | Engineering Gets |
| --- | --- |
| Objective | Problem context for judgment calls |
| Outcomes | Observable state changes to verify |
| Health Metrics | What must NOT degrade (Goodhart guard) |
| Constraints | Hard stops vs steering guidance |
| Autonomy | Allowed / Escalate / Never boundaries |
| Stop Rules | When to stop building, when to halt |
| Counter-metrics | Numbers that must stay stable while optimizing |
| Blast Radius | What systems, data, or users this touches |
| Rollback | How to undo if things go wrong |

Story Contract

Stories convert user intent into testable scenarios. Each becomes a Given/When/Then.

| Column | Engineering Reads As |
| --- | --- |
| Intention | User-facing state change — what the user experiences, not what the code does |
| Trigger | Observable event — becomes the test's "Given/When" |
| Observable Success | Binary or thresholded — verifiable without the builder present |
| Forbidden Outcome | What must NOT happen — feeds Safety Test in Build Contract |
| Evidence Type | unit / integration / e2e / eval / replay / monitor |
| Escalation | When the agent must stop and ask a human |

Every story must have at least one Forbidden Outcome. This is the counterfeit progress detector — it catches work that looks done but isn't safe.
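The Forbidden Outcome rule is easy to check mechanically. The sketch below is illustrative only: the `StoryRow` shape and its field names are assumptions modeled on the Story Contract columns, not the parser's actual types.

```typescript
// Hypothetical shape of one Story Contract row (field names are assumptions).
interface StoryRow {
  id: string;
  intention: string;
  trigger: string;
  observableSuccess: string;
  forbiddenOutcomes: string[];
  evidenceType: "unit" | "integration" | "e2e" | "eval" | "replay" | "monitor";
  escalation: string;
}

// Returns the ids of stories that have no non-blank Forbidden Outcome.
// An empty list also counts as missing, because every() is true for [].
function storiesMissingForbiddenOutcome(stories: StoryRow[]): string[] {
  return stories
    .filter((s) => s.forbiddenOutcomes.every((f) => f.trim() === ""))
    .map((s) => s.id);
}
```

A non-empty result would be grounds for holding the handoff until each listed story names at least one thing that must not happen.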

Story rows map many-to-many to Build Contract rows: one story may need multiple features, and one feature may serve multiple stories.

Build Contract

The deliverable. Every row has an acceptance test. Format: FAVV v2.1.

| Column | Engineering Reads As |
| --- | --- |
| FeatureID | Links to feature-matrix row — the RaaS catalog ID |
| Function | Feature name + behavior. Verbs generate test cases. "Browse, search, filter contacts" = 3 tests |
| Artifact | Concrete deliverable: TypeScript client, PostgreSQL migration, React component |
| Success Test | Happy-path acceptance criteria with thresholds — this IS the e2e test spec |
| Safety Test | What must NOT happen — populated from Story Contract Forbidden Outcomes |
| Regression Test | What existing capability must NOT degrade — names the specific capability + threshold |
| Value | Business outcome in one sentence |
| State | Enum: Live, Built, Dormant, Partial, Not verified, Gap, Stub, Broken |

Function column rule: Use verbs. "Browse, search, filter contacts" generates 3 test cases. "Contacts" generates a BLOCKER — the parser cannot derive tests from nouns.
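As a rough illustration of the verb rule, the sketch below splits a Function cell on commas and treats the first word of each segment as a candidate verb. The `KNOWN_VERBS` allowlist and the `deriveTestCases` name are assumptions made for this example; the real parser may recognize verbs differently.

```typescript
// Assumption: a small allowlist stands in for however the real parser spots verbs.
const KNOWN_VERBS = new Set([
  "browse", "search", "filter", "invite", "toggle", "create", "update", "delete",
]);

type Derivation =
  | { kind: "tests"; cases: string[] }
  | { kind: "blocker"; reason: string };

function deriveTestCases(functionCell: string): Derivation {
  const segments = functionCell
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // First word of each segment is the candidate verb.
  const verbs = segments.map((s) => s.split(/\s+/)[0].toLowerCase());
  if (verbs.length === 0 || !verbs.every((v) => KNOWN_VERBS.has(v))) {
    return { kind: "blocker", reason: `cannot derive tests from "${functionCell}"` };
  }
  // The object trails the last verb: "Browse, search, filter contacts".
  const lastWords = segments[segments.length - 1].split(/\s+/);
  const object = lastWords.slice(1).join(" ");
  return { kind: "tests", cases: verbs.map((v) => `${v} ${object}`.trim()) };
}

console.log(deriveTestCases("Browse, search, filter contacts")); // three cases
console.log(deriveTestCases("Contacts")); // blocker
```

The discriminated union keeps the BLOCKER path explicit, so a caller cannot read test cases off a row that produced none.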

Job groupings: H3 headings above FAVV table sections group rows by user job. Each heading includes the FeatureIDs that job advances.

Frozen Scope

A PRD's scope is the set of FeatureIDs in its Build Contract at registration time. That set is frozen.

  • Max 5 distinct FeatureIDs per PRD
  • Adding a new FeatureID after registration requires a new PRD
  • The Feature Matrix is the source of truth for which PRD advances which feature
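The frozen-scope rules above can be expressed as two small checks. This is a sketch under assumptions: the function names and the FeatureID strings in the usage lines are hypothetical, and only the limit of 5 and the new-PRD rule come from the protocol.

```typescript
const MAX_FEATURE_IDS = 5;

type ScopeCheck = { ok: true } | { ok: false; reason: string };

// Validate the FeatureID set at registration time (duplicates count once).
function checkRegistration(featureIds: string[]): ScopeCheck {
  const distinct = new Set(featureIds);
  if (distinct.size <= MAX_FEATURE_IDS) return { ok: true };
  return {
    ok: false,
    reason: `${distinct.size} distinct FeatureIDs exceeds the limit of ${MAX_FEATURE_IDS}`,
  };
}

// After registration, any FeatureID outside the frozen set needs a new PRD.
function idsRequiringNewPrd(frozen: ReadonlySet<string>, proposed: string[]): string[] {
  return proposed.filter((id) => !frozen.has(id));
}

// Hypothetical FeatureIDs, for illustration only.
const frozenScope = new Set(["FEAT-A", "FEAT-B"]);
console.log(idsRequiringNewPrd(frozenScope, ["FEAT-B", "FEAT-C"])); // only FEAT-C is outside the frozen set
```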

Parser Detection

| Header contains... | Format | Job source |
| --- | --- | --- |
| Safety Test | FAVV v2.1 | H3 heading above table |
| Verification | FAVV v2.0 | H3 heading above table |
| Feature | FFO (legacy) | Job column in table |

Search order: spec/index.md first, then spec/protocols/index.md fallback. New PRDs use FAVV v2.1. Old PRDs migrate when touched.
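A minimal sketch of the detection table, assuming the parser matches whole header cells rather than substrings. That assumption matters: a FAVV header includes a `FeatureID` column, so a substring match on "Feature" would misclassify it as legacy FFO. The `detectFormat` name and the `"unknown"` fallback are illustrative.

```typescript
type SpecFormat = "FAVV v2.1" | "FAVV v2.0" | "FFO (legacy)" | "unknown";

// Checks run newest format first; cells are compared whole, not as substrings.
function detectFormat(headerCells: string[]): SpecFormat {
  const cells = new Set(headerCells.map((c) => c.trim()));
  if (cells.has("Safety Test")) return "FAVV v2.1";
  if (cells.has("Verification")) return "FAVV v2.0";
  if (cells.has("Feature")) return "FFO (legacy)";
  return "unknown";
}

// The v2.1 header uses the Build Contract columns listed earlier.
console.log(
  detectFormat([
    "FeatureID", "Function", "Artifact", "Success Test",
    "Safety Test", "Regression Test", "Value", "State",
  ]),
);
```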

Bookend Gates

Every plan has two bookends. They are not optional.

| Bookend | When | Outcome |
| --- | --- | --- |
| Start: spec-to-tests | Before phase-1 | Every FAVV row has a test that fails |
| End: JTBD validation | After last implementation phase | Every spec passes |

The bookends ARE the definition of done. The start bookend converts stories into machine-verifiable tests (RED). The end bookend proves they pass (GREEN). Everything between is engineering's domain.
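Both bookends reduce to a predicate over per-row test status. The sketch below assumes each FAVV row reports one of three statuses; the `MISSING` value and the function names are assumptions added for illustration.

```typescript
// MISSING is an assumed status for a FAVV row that has no test yet.
type TestStatus = "RED" | "GREEN" | "MISSING";

// Start bookend: every FAVV row has a test, and that test currently fails.
function startBookendMet(statuses: TestStatus[]): boolean {
  return statuses.length > 0 && statuses.every((s) => s === "RED");
}

// End bookend: every spec passes.
function endBookendMet(statuses: TestStatus[]): boolean {
  return statuses.length > 0 && statuses.every((s) => s === "GREEN");
}
```

Treating an empty status list as "not met" is a deliberate choice in this sketch: a plan with no FAVV rows has nothing to prove, so it should not pass either gate by default.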

Plans reference the spec via a path pointer to spec/index.md. The plan task generator converts Build Contract rows into plan tasks; merge these with the template tasks so PRD-specific work is explicit.

Fitness Gate

Between spec parsing (step 4) and test writing (step 5), engineering runs an architecture fitness check. This gate can bounce a spec back to the Dream Team before any building starts.

Gate Checks

| Check | Question | Bounce if... |
| --- | --- | --- |
| Capability overlap | Does an existing module already deliver this? | Feature-matrix shows another PRD at L2+ for the same FeatureIDs |
| Hex boundary | Does the spec cross hexagonal layer boundaries? | A single FAVV row mixes domain logic with infrastructure or presentation |
| Generator coverage | Does a generator exist for the pattern this spec describes? | Spec asks for hand-coded CRUD when a generator produces it correctly |
| Sibling collision | Does an in-flight plan already scaffold the same artifact? | Active project list shows overlapping components, routes, or server actions |

Dream Pre-Check

Before signaling handoff, verify:

  • Read Feature Matrix — confirm no other PRD at L2+ advances the same FeatureIDs
  • Read the Build Contract Function column — confirm each row stays within one hexagonal layer
  • Check if the pattern has an existing generator (entity CRUD, e2e test, UI component) — if so, the spec should reference it, not describe the pattern from scratch

Bounce Protocol

A bounce is not a rejection. It is a return signal. Three pre-build gates can trigger one:

| Gate | Fires when... | Signal |
| --- | --- | --- |
| Fitness check | Spec overlaps existing capability or crosses hex boundaries | spec-bounce — which check failed and what to change |
| Missing stories | No Story Contract → no Safety Tests → negative testing is guesswork | spec-incomplete — which contracts are missing |
| Noun-only functions | Function column has nouns ("Contacts") instead of verbs ("Browse, search, filter contacts") | spec-incomplete — parser cannot derive tests |
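The three gates can be read as a function from pre-build findings to return signals. This is a sketch, not the protocol's implementation: the `PreBuildFindings` shape and the detail strings are assumptions, and only the signal names `spec-bounce` and `spec-incomplete` come from the table.

```typescript
interface BounceSignal {
  signal: "spec-bounce" | "spec-incomplete";
  detail: string;
}

// Hypothetical summary of what engineering found before building.
interface PreBuildFindings {
  failedFitnessChecks: string[]; // e.g. "Hex boundary"
  hasStoryContract: boolean;
  nounOnlyFunctions: string[]; // Function cells with no verbs
}

// An empty result means no gate fired and building can start.
function preBuildSignals(f: PreBuildFindings): BounceSignal[] {
  const signals: BounceSignal[] = [];
  if (f.failedFitnessChecks.length > 0) {
    signals.push({
      signal: "spec-bounce",
      detail: `failed checks: ${f.failedFitnessChecks.join(", ")}`,
    });
  }
  if (!f.hasStoryContract) {
    signals.push({ signal: "spec-incomplete", detail: "missing contract: Story Contract" });
  }
  if (f.nounOnlyFunctions.length > 0) {
    signals.push({
      signal: "spec-incomplete",
      detail: `parser cannot derive tests from: ${f.nounOnlyFunctions.join(", ")}`,
    });
  }
  return signals;
}
```

Each signal names the specific failure, which is what the bounce response relies on.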

Bounce Response

  1. Read the bounce message — it names the specific failure
  2. Update spec/index.md to address the gap
  3. Re-run the pre-handoff checklist (see Create PRD Stories checklist)
  4. Re-signal handoff via comms to #meta

If the bounce changes scope (new FeatureIDs needed), create a new PRD. Original PRD scope stays frozen.

The Return Signal section of Create PRD Stories documents the post-build commissioning signals.

Enforcement Tiers

Push enforcement up. Every decision an agent makes is a chance to get it wrong.

| Tier | Mechanism | Guarantee | Failure Mode |
| --- | --- | --- | --- |
| 1 | Generators | Code IS correct by construction | None — deterministic |
| 2 | Plan Templates | Phase order + best practice reminders | Agent skips bookend |
| 3 | Rules | Architecture constraints always in context | Agent ignores under load |
| 4 | Skills | Procedural memory for complex workflows | Agent forgets to invoke |
| 5 | Agent Memory | Domain knowledge, judgment calls | Drift, hallucination, forgetting |

The best code is code you don't write. Generators produce correct code by construction. Plan templates remind engineers to reuse helper functions, shared components, and library patterns before writing new code. Spend Tier 5 tokens on edge cases, not boilerplate.

Plan Templates

Plans have two jobs: define the task sequence AND encode best practices. Each template carries institutional memory — which generators to run, which helper libraries to use, which shared components exist. The plan reminds the agent what to reuse so it doesn't reinvent what the factory already built.

When engineering receives a spec, template selection depends on the work:

| PRD Contains | Template | What It Does |
| --- | --- | --- |
| New data entities | data-crud-flow | Schema, repos, actions, verified CRUD |
| Existing entity bugs | corrective-crud-action | Trace root cause through layers, fix, harden |
| Failing E2E test | test-failure-investigation | Walk the pipe, identify correct layer, fix at source |
| UI verification needed | e2e-intent-validation | Playwright specs proving user journeys work |
| Issues to validate | issue-validation-sweep | Batch validation across entities |
| Entity at L1 needing L3 | entity-commissioning | Schema exists, commission through to CRUD |

Trophy Layer Selection: Every build template must include L1 → L2 → L3 cascade phases. Walk the pipe before writing any L3 test — call the server action directly. If it fails, the fix is at L2, not L3. The E2E Admission Gate prevents wasted E2E effort.

Gate: If the PRD has no Story Contract, engineering flags this before starting. Stories are the source for Safety Tests — without them, negative testing is guesswork.

Failure Modes

| Failure | Symptom | Fix |
| --- | --- | --- |
| No Story Contract | Infrastructure built, admin/user flows missed | Write stories before FAVV rows |
| Function column uses nouns | Parser cannot generate test cases | Use verbs: "Admin invites user" not "Invitations" |
| No Forbidden Outcomes | Safety Test column empty | Every story gets at least one |
| Stories miss a role | Admin governance UI never built | One story per role per critical flow |
| Tier checklist without spec | Implementation tasks, not acceptance criteria | Spec rows are the contract, tiers are build order hints |
| FeatureID missing | Row not tracked in feature-matrix | Every FAVV row links to a RaaS catalog ID |
| Template framing too loose | Plan tasks vague, agent builds wrong thing | Template needs tighter phase gates or generator references |

SPEC-MAP

The shared traceability artifact. Both sides write to it. Neither side can claim done with empty cells.

A SPEC-MAP lives in the engineering repo's E2E domain folder. It has one row per Story Contract row.

| Column | Written By | When |
| --- | --- | --- |
| Story # | Dream | At handoff |
| WHEN/THEN | Dream | At handoff |
| Test Layer | Engineering | At spec-to-tests bookend (L1/L2/L3) |
| Test File | Engineering | At spec-to-tests bookend |
| Test Status | Engineering | At JTBD validation bookend (RED/GREEN) |
| L-Level | Dream | At commissioning |
| Last Verified | Dream | At commissioning |

Test Layer selection: L1 for schema/validation logic, L2 for server action wiring (most stories land here), L3 only when the browser is the proof. Apply the E2E Admission Gate — if you can prove the assertion by calling the server action directly, it's L2, not L3.
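The layer choice can be sketched as a short decision function that prefers the lowest layer able to prove the assertion. The `AssertionTraits` fields are assumptions introduced for this example; in practice the judgment about what proves a story belongs to the engineer.

```typescript
type TestLayer = "L1" | "L2" | "L3";

// Hypothetical traits an engineer would assess for one story assertion.
interface AssertionTraits {
  schemaOrValidationOnly: boolean; // provable from schema/validation logic alone
  provableByServerActionCall: boolean; // provable by calling the server action directly
}

// E2E Admission Gate: fall through to L3 only when nothing lower proves it.
function chooseTestLayer(t: AssertionTraits): TestLayer {
  if (t.schemaOrValidationOnly) return "L1";
  if (t.provableByServerActionCall) return "L2";
  return "L3";
}
```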

SPEC-MAP Catches

| Gap | Without SPEC-MAP | With SPEC-MAP |
| --- | --- | --- |
| Feature works but has no test | Commissioning scores L4, CI has no regression protection | Empty Test File cell — visible before commissioning starts |
| Engineering changes Screen Contract state | Dream's spec drifts from reality | Engineering updates WHEN/THEN when behavior changes, Dream sees the delta |
| Story Contract row is untestable | Engineering silently skips it | Test File cell stays empty — BLOCKER surfaces at validation bookend |
| Commissioning misses a regression | Next deploy breaks a feature nobody re-checked | Last Verified column shows stale dates — triggers re-commission |

SPEC-MAP Rules

  • Every Story Contract row from spec/index.md must appear as a SPEC-MAP row
  • Engineering fills Test Layer (L1/L2/L3) and Test File during the start bookend — if a story can't map to a test, that's a spec-bounce, not a skip
  • Test Status updates automatically from CI (GREEN/RED)
  • Dream fills L-Level and Last Verified during commissioning
  • A capability cannot reach L3 with any empty Test Layer cells in its SPEC-MAP
  • A capability cannot reach L4 with any empty cells in its SPEC-MAP
  • When engineering changes behavior that affects WHEN/THEN, engineering updates the SPEC-MAP row AND posts a spec-delta to #meta
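The two level gates in these rules can be checked directly against the SPEC-MAP rows. The sketch below assumes a row type with one optional field per column that is filled in later; the type, field names, and function names are illustrative, not the repo's actual schema.

```typescript
// Hypothetical row shape: one field per SPEC-MAP column.
interface SpecMapRow {
  story: string;
  whenThen: string;
  testLayer?: "L1" | "L2" | "L3";
  testFile?: string;
  testStatus?: "RED" | "GREEN";
  lLevel?: string;
  lastVerified?: string;
}

const COLUMNS = [
  "story", "whenThen", "testLayer", "testFile", "testStatus", "lLevel", "lastVerified",
] as const;

// Names of the cells in a row that are unset or blank.
function emptyCells(row: SpecMapRow): string[] {
  return COLUMNS.filter((c) => {
    const value = row[c];
    return value === undefined || value.trim() === "";
  });
}

// L3 requires every Test Layer cell to be filled.
function canReachL3(rows: SpecMapRow[]): boolean {
  return rows.every((r) => r.testLayer !== undefined);
}

// L4 requires zero empty cells anywhere in the SPEC-MAP.
function canReachL4(rows: SpecMapRow[]): boolean {
  return rows.every((r) => emptyCells(r).length === 0);
}
```

Returning the names of the empty cells, rather than a bare boolean, lets a reminder report exactly which columns block commissioning.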

Reverse Signal

When engineering changes implementation in ways that affect the spec:

  1. Engineering updates the SPEC-MAP row with the new behavior
  2. Engineering posts spec-delta via comms to #meta
  3. Message includes: which story row changed, old behavior, new behavior, why
  4. Dream Team updates spec/index.md Story Contract to match reality
  5. Dream re-commissions the affected rows

Without this, specs fossilize. The Screen Contract says "loading skeleton appears" but the component now renders immediately. The spec is wrong. Nobody updates it. The next commissioner reads a spec that describes a product that no longer exists.
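The spec-delta message in step 3 carries four pieces of information, which suggests a small typed payload. This sketch only formats the message text; how it is posted to #meta is outside its scope. The `SpecDelta` type, the output layout, and the story label and "why" text in the usage example are assumptions.

```typescript
// The four fields the protocol says a spec-delta must include.
interface SpecDelta {
  storyRow: string;
  oldBehavior: string;
  newBehavior: string;
  why: string;
}

// Builds the message body; refuses to format an incomplete delta.
function formatSpecDelta(delta: SpecDelta): string {
  const fields: (keyof SpecDelta)[] = ["storyRow", "oldBehavior", "newBehavior", "why"];
  const missing = fields.filter((f) => delta[f].trim() === "");
  if (missing.length > 0) {
    throw new Error(`spec-delta is missing: ${missing.join(", ")}`);
  }
  return [
    "spec-delta",
    `story row: ${delta.storyRow}`,
    `old behavior: ${delta.oldBehavior}`,
    `new behavior: ${delta.newBehavior}`,
    `why: ${delta.why}`,
  ].join("\n");
}

// Usage with the skeleton example; the story label and reason are made up.
console.log(
  formatSpecDelta({
    storyRow: "Story 3",
    oldBehavior: "loading skeleton appears",
    newBehavior: "component renders immediately",
    why: "illustrative reason only",
  }),
);
```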

Full Sequence

Dream Team                                  Engineering
    |                                            |
    | 1. Write spec/index.md                     |
    |    (Intent + Stories + FAVV)               |
    |                                            |
    | 2. Populate SPEC-MAP columns 1-2           |
    |    (Story #, WHEN/THEN)                    |
    |                                            |
    | 3. Signal handoff via comms                |
    |    to #meta                                |
    |                                            |
    |                                            | 4. Parse spec
    |                                            |    (spec reference set on project)
    |                                            |
    |                                            | 4a. Fitness check
    |                                            |     (overlap, hex boundary,
    |                                            |     generator coverage,
    |                                            |     sibling collision)
    |                                            |
    | 4b. BOUNCE (if fitness fails) <------------|
    |     spec-bounce via comms                  |
    |     Update spec, re-signal                 |
    |                                            |
    |                                            | 5. Spec-to-tests bookend
    |                                            |    Populate SPEC-MAP columns 3-4
    |                                            |    (Test Layer, Test File)
    |                                            |    Failing specs (RED)
    |                                            |
    |                                            | 6. Build: RED → GREEN
    |                                            |    (plans reference spec,
    |                                            |    reuse generators +
    |                                            |    helpers from templates)
    |                                            |
    |                                            | 7. JTBD validation bookend
    |                                            |    All specs GREEN
    |                                            |    Update SPEC-MAP column 5
    |                                            |    (Test Status = GREEN)
    |                                            |
    | 8. commissioning-update <------------------|
    |    posted to #meta                         |
    |    (level, entities, evidence)             |
    |                                            |
    | 9. Commission via browser                  |
    |    Read SPEC-MAP, verify                   |
    |    zero empty cells                        |
    |    (dream team validates L4)               |
    |    Update SPEC-MAP columns 6-7             |
    |    (L-Level, Last Verified)                |
    |                                            |
    | 10. Update feature-matrix                  |
    |     (advance L-level)                      |
    |                                            |
    |          --- REVERSE SIGNAL ---            |
    |                                            |
    |                                            | 11. Engineering changes behavior
    |                                            |     Update SPEC-MAP WHEN/THEN
    |                                            |     Post spec-delta to #meta
    |                                            |
    | 12. Dream updates spec/index.md <----------|
    |     Re-commissions affected rows           |

Field Lessons

Patterns discovered during builds. Not rules — observations that survived contact with reality.

Multi-Surface Builds

When a Build Contract touches multiple surfaces (CLI, web, MCP), the per-surface implementations diverge unless they share a single application layer.

| Pattern | What Happened | Story Nudge |
| --- | --- | --- |
| Factory-first | A use-case factory unified four surfaces. Without it, CLI and server actions diverged silently | If Build Contract rows touch 2+ surfaces, add FORBIDDEN: "implementation diverges between surfaces" |
| Schema-first | Defining validation schemas before handlers unlocked describe, MCP, and validation in one pass | If Function column includes "parse", "validate", or "accept input", name the schema as the artifact |
| Delegation | Server actions called repositories directly, bypassing the use case layer. Two codepaths, invisible drift | Every "change persists" story should include FORBIDDEN: "mutation bypasses application layer" |

Build Discipline

| Pattern | What Happened | Story Nudge |
| --- | --- | --- |
| Bookend skipped | Start bookend (failing tests) was treated as optional under time pressure. Tests written after implementation missed design errors | RED tests are the definition of "ready to build." If they don't exist, building hasn't started |
| Empty SPEC-MAP | Features shipped with empty Test File cells. Commissioning passed on visual inspection alone | Empty cells are blockers. A feature without a test file is a feature without evidence |
| Orphaned composition | A composition file existed for weeks with zero importers. Nobody noticed | If a Build Contract row creates shared infrastructure, a story should verify something consumes it |

Protocol Reminders

Five nudges the protocol expects engineering to surface at key moments. Dream defines WHAT the reminder says and WHEN it appears. Engineering decides HOW.

These are nudges, not blocks. The goal is a friendly prompt in the right direction.

| Nudge | When | Reminder |
| --- | --- | --- |
| SPEC-MAP completeness | Before PR merge | "N empty Test File cells. Stories without tests are wishes, not contracts." |
| Reverse signal | Test assertion changes | "SPEC-MAP WHEN/THEN still match? If not, post spec-delta to #meta." |
| Bookend start | Plan first task begins | "Every FAVV row needs a failing test before Phase 1. Run spec-to-tests." |
| Factory pattern | Build rows touch 2+ surfaces | "Multiple surfaces for one capability. Consider a use-case factory to prevent divergence." |
| Counterfeit bridge | Writing Safety Tests | "Safety Test should match Story Contract counterfeit verbatim. If it doesn't, the bridge is broken." |

Reminder Tiers

Engineering can implement reminders at any enforcement tier. Higher tiers catch more.

| Tier | Mechanism | Example |
| --- | --- | --- |
| 2 | Plan template | Task description includes the reminder text |
| 3 | Rule file | Always-loaded context nudges the agent |
| 4 | Skill step | Procedural gate checks the condition |
| 5 | Hook | Automated check on commit or PR |

The best reminder is unnecessary because the generator produces correct code by construction (Tier 1). Until then, Tier 2-5 reminders reduce the chance of silent drift.

Context

  • PRD Template — Reference implementation of FAVV v2.1 spec
  • Feature Matrix — Commissioning status for all platform features
  • Commissioning Protocol — L0-L4 maturity model
  • PRD Creation — How to spec capabilities, including Return Signals
  • Priorities — Active PRD table (build order)
  • Credibility — Three loops: inner (engineering), story (predictions), market (external validation). Market is the greatest force
  • Development Pipeline — The full PAIN → COMMISSION chain that the SPEC-MAP traces
  • Protocols — This IS a coordination protocol between two repos
  • Verifiable Intent — SPEC-MAP is a non-cryptographic VI chain: spec=L2 intent, build=L3 action, commission=verification
  • Meetings — Handoff is a Decision meeting between Dream and Engineering

Questions

When a Story Contract is missing, does engineering build the wrong thing or build the right thing without safety tests?

  • What's the cost of a Forbidden Outcome the story writer never imagined vs one they wrote but engineering skipped?
  • If the parser can generate tests from FAVV rows automatically, what role does human judgment play in test design?
  • When should engineering push back on a PRD vs start building and flag gaps as they find them?
  • If a SPEC-MAP has zero empty cells but the market loop (Loop 3) returns no signal, is the problem the spec or the product?
  • When engineering posts a spec-delta that changes a Story Contract's WHEN/THEN, who decides whether the new behavior is better — Dream or Engineering?
  • When a field lesson contradicts the original protocol, which wins — the lesson or the rule it amends?