PRD Handoff Protocol
How does a specification become a plan that becomes working software?
Dream Team Engineering
| |
| 1. Write spec (Intent+Stories+FAVV) |
| 2. Populate SPEC-MAP rows |
| 3. Signal handoff via comms |
| 4. Parse spec → fitness check
| 4b. BOUNCE (if fitness fails) <-----|
| 5. Write failing tests (RED)
| 6. Build: RED → GREEN
| 7. Validate: all specs GREEN
| 8. Commission via browser <----|
| 9. Update feature-matrix |
| --- REVERSE SIGNAL --- |
| 10. Behavior changes → spec-delta
| 11. Update spec, re-commission <----|
The gap between "what to build" and "how to build it" is where most projects fail. This protocol defines the contract: what the dream team writes, what engineering reads, and how the spec parser bridges them.
The Contract
| Team | Owns | Authoritative Source | Shared Format |
|---|---|---|---|
| Dream Team | What and Why | PRD creation template | spec/index.md (FAVV v2.1) |
| Engineering | How | Plan templates (task sequence + reuse) | Implementation plan with reference back to spec |
Each side has one authoritative source. Dream team follows the PRD creation template to produce specs. Engineering follows plan templates to produce implementations. The spec is the shared contract — engineering never builds from messages, verbal agreements, or tier checklists without stories.
Ownership Boundary
The dream team writes stories that state what must be true — not how to make it true. "Admin can toggle permissions per entity per role, change persists, unauthorized user denied" is a truth test. Engineering decides whether that's a server action, an API endpoint, or a database trigger.
| Dream Team Decides | Engineering Decides |
|---|---|
| User stories and acceptance criteria | Implementation architecture |
| Forbidden outcomes (safety tests) | Which layer handles enforcement |
| FeatureIDs and scope | Template selection and plan structure |
| Kill date and priority order | Sprint sequencing and task dependencies |
| What "done" looks like | How to get there |
Non-negotiable from both sides: monorepo structure, hexagonal layering, generators for boilerplate, tests before implementation. These are the substrate that makes continuous improvement possible — not implementation choices.
PRD Structure
prd-{name}/
index.md <- Decision surface (10s read)
pictures/ <- Pre-flight maps (thinking instruments)
prompt-deck/ <- Sales compression (2min read)
spec/ <- Engineering spec (30min read)
index.md <- Intent Contract + Story Contract + Build Contract
Spec Anatomy
The spec parser reads three contracts from spec/index.md:
Intent Contract
9 dimensions defining agent autonomy boundaries. Not parsed into tasks — used for judgment when instructions run out.
| Dimension | Engineering Gets |
|---|---|
| Objective | Problem context for judgment calls |
| Outcomes | Observable state changes to verify |
| Health Metrics | What must NOT degrade (Goodhart guard) |
| Constraints | Hard stops vs steering guidance |
| Autonomy | Allowed / Escalate / Never boundaries |
| Stop Rules | When to stop building, when to halt |
| Counter-metrics | Numbers that must stay stable while optimizing |
| Blast Radius | What systems, data, or users this touches |
| Rollback | How to undo if things go wrong |
Story Contract
Stories convert user intent into testable scenarios. Each becomes a Given/When/Then.
| Column | Engineering Reads As |
|---|---|
| Intention | User-facing state change — what the user experiences, not what the code does |
| Trigger | Observable event — becomes the test's "Given/When" |
| Observable Success | Binary or thresholded — verifiable without the builder present |
| Forbidden Outcome | What must NOT happen — feeds Safety Test in Build Contract |
| Evidence Type | unit / integration / e2e / eval / replay / monitor |
| Escalation | When the agent must stop and ask a human |
Every story must have at least one Forbidden Outcome. This is the counterfeit progress detector — it catches work that looks done but isn't safe.
Story rows map many-to-many to Build Contract rows. One story may need multiple features. One feature may serve multiple stories.
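The Forbidden Outcome rule can be checked mechanically before handoff. A minimal sketch, assuming a hypothetical `StoryRow` shape whose fields mirror the Story Contract columns; the field names and the `findStoriesMissingForbiddenOutcome` helper are illustrative and not part of the spec parser:

```typescript
// Hypothetical shape: field names mirror the Story Contract columns.
type EvidenceType = "unit" | "integration" | "e2e" | "eval" | "replay" | "monitor";

interface StoryRow {
  id: number;
  intention: string;
  trigger: string;
  observableSuccess: string;
  forbiddenOutcomes: string[];
  evidenceType: EvidenceType;
  escalation: string;
}

// Returns the ids of stories that have no non-blank Forbidden Outcome.
function findStoriesMissingForbiddenOutcome(stories: StoryRow[]): number[] {
  return stories
    .filter((s) => s.forbiddenOutcomes.filter((f) => f.trim().length > 0).length === 0)
    .map((s) => s.id);
}
```

Any id this returns is a story whose Safety Test column would end up empty in the Build Contract.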
Build Contract
The deliverable. Every row has an acceptance test. Format: FAVV v2.1.
| Column | Engineering Reads As |
|---|---|
| FeatureID | Links to feature-matrix row — the RaaS catalog ID |
| Function | Feature name + behavior. Verbs generate test cases. "Browse, search, filter contacts" = 3 tests |
| Artifact | Concrete deliverable: TypeScript client, PostgreSQL migration, React component |
| Success Test | Happy-path acceptance criteria with thresholds — this IS the e2e test spec |
| Safety Test | What must NOT happen — populated from Story Contract Forbidden Outcomes |
| Regression Test | What existing capability must NOT degrade — names the specific capability + threshold |
| Value | Business outcome in one sentence |
| State | Enum: Live, Built, Dormant, Partial, Not verified, Gap, Stub, Broken |
Function column rule: Use verbs. "Browse, search, filter contacts" generates 3 test cases. "Contacts" generates a BLOCKER — the parser cannot derive tests from nouns.
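To make the verb rule concrete, here is a rough sketch of how test cases might be derived from a Function cell. How the actual parser recognizes verbs is not documented here; the `KNOWN_VERBS` allowlist, the trailing-"s" heuristic, and the `deriveTestCases` helper are all assumptions for illustration:

```typescript
// Hypothetical allowlist; a real parser would need a richer vocabulary.
const KNOWN_VERBS = ["browse", "search", "filter", "create", "edit", "delete", "toggle", "invite"];

type Derivation =
  | { kind: "tests"; cases: string[] }
  | { kind: "blocker"; reason: string };

// Returns the base form if the word is a known verb (allowing a trailing "s"), else null.
function asKnownVerb(word: string): string | null {
  const lower = word.toLowerCase();
  if (KNOWN_VERBS.indexOf(lower) !== -1) return lower;
  const base = lower.replace(/s$/, "");
  return KNOWN_VERBS.indexOf(base) !== -1 ? base : null;
}

function deriveTestCases(functionCell: string): Derivation {
  const parts = functionCell.split(",").map((p) => p.trim()).filter((p) => p.length > 0);
  const verbs: string[] = [];
  let object = "";
  for (const part of parts) {
    const words = part.split(/\s+/);
    let found = false;
    for (let i = 0; i < words.length && !found; i++) {
      const verb = asKnownVerb(words[i]);
      if (verb !== null) {
        verbs.push(verb);
        // Words after the verb are its object; the last part's object is shared by all verbs.
        object = words.slice(i + 1).join(" ");
        found = true;
      }
    }
    if (!found) return { kind: "blocker", reason: 'no verb in "' + part + '"' };
  }
  if (verbs.length === 0) return { kind: "blocker", reason: "empty Function cell" };
  return { kind: "tests", cases: verbs.map((v) => (v + " " + object).trim()) };
}
```

With this heuristic, "Browse, search, filter contacts" yields three cases and "Contacts" yields a blocker. A noun that happens to look like a listed verb would slip through, which is why this is only a sketch.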
Job groupings: H3 headings above FAVV table sections group rows by user job. Each heading includes the FeatureIDs that job advances.
Frozen Scope
A PRD's scope is the set of FeatureIDs in its Build Contract at registration time. That set is frozen.
- Max 5 distinct FeatureIDs per PRD
- Adding a new FeatureID after registration requires a new PRD
- The Feature Matrix is the source of truth for which PRD advances which feature
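The frozen-scope rules reduce to two set checks. A minimal sketch, assuming FeatureIDs are plain strings; the `checkFrozenScope` helper and its result shape are hypothetical:

```typescript
const MAX_FEATURE_IDS = 5;

interface ScopeCheck {
  ok: boolean;
  problems: string[];
}

// registered: FeatureIDs frozen at registration time.
// current: FeatureIDs found in the Build Contract now.
function checkFrozenScope(registered: string[], current: string[]): ScopeCheck {
  const problems: string[] = [];
  // De-duplicate: several FAVV rows may share one FeatureID.
  const distinct = current.filter((id, i) => current.indexOf(id) === i);
  if (distinct.length > MAX_FEATURE_IDS) {
    problems.push("more than " + MAX_FEATURE_IDS + " distinct FeatureIDs: " + distinct.length);
  }
  const added = distinct.filter((id) => registered.indexOf(id) === -1);
  if (added.length > 0) {
    problems.push("FeatureIDs added after registration require a new PRD: " + added.join(", "));
  }
  return { ok: problems.length === 0, problems };
}
```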
Parser Detection
| Header contains... | Format | Job source |
|---|---|---|
| Safety Test | FAVV v2.1 | H3 heading above table |
| Verification | FAVV v2.0 | H3 heading above table |
| Feature | FFO (legacy) | Job column in table |
Search order: spec/index.md first, then spec/protocols/index.md fallback. New PRDs use FAVV v2.1. Old PRDs migrate when touched.
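The detection table can be read as an ordered substring match on the table header row. A sketch under that assumption; the precedence order and the `detectFormat` and `jobSource` names are illustrative, not the parser's actual API:

```typescript
type SpecFormat = "FAVV v2.1" | "FAVV v2.0" | "FFO (legacy)" | "unknown";

function detectFormat(headerRow: string): SpecFormat {
  // Order matters: a FAVV v2.1 header also contains "FeatureID",
  // which would otherwise match the legacy "Feature" check.
  if (headerRow.indexOf("Safety Test") !== -1) return "FAVV v2.1";
  if (headerRow.indexOf("Verification") !== -1) return "FAVV v2.0";
  if (headerRow.indexOf("Feature") !== -1) return "FFO (legacy)";
  return "unknown";
}

// Where the parser looks for the user job, per the detection table.
function jobSource(format: SpecFormat): string {
  switch (format) {
    case "FAVV v2.1":
    case "FAVV v2.0":
      return "H3 heading above table";
    case "FFO (legacy)":
      return "Job column in table";
    default:
      return "unknown";
  }
}
```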
Bookend Gates
Every plan has two bookends. They are not optional.
| Bookend | When | Outcome |
|---|---|---|
| Start: spec-to-tests | Before phase-1 | Every FAVV row has a test that fails |
| End: JTBD validation | After last implementation phase | Every spec passes |
The bookends ARE the definition of done. The start bookend converts stories into machine-verifiable tests (RED). The end bookend proves they pass (GREEN). Everything between is engineering's domain.
Plans reference the spec via a path pointer to spec/index.md. The plan task generator converts Build Contract rows into plan tasks; these are merged with the template tasks so PRD-specific work is explicit.
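Both bookends are predicates over the same set of FAVV rows. A minimal sketch, assuming a hypothetical `FavvRowStatus` record per row with a `MISSING` status for rows that have no test yet:

```typescript
type TestStatus = "RED" | "GREEN" | "MISSING";

interface FavvRowStatus {
  featureId: string;
  testStatus: TestStatus;
}

// Start bookend: every FAVV row has a test, and that test fails.
function startBookendMet(rows: FavvRowStatus[]): boolean {
  return rows.length > 0 && rows.every((r) => r.testStatus === "RED");
}

// End bookend: every FAVV row's test passes.
function endBookendMet(rows: FavvRowStatus[]): boolean {
  return rows.length > 0 && rows.every((r) => r.testStatus === "GREEN");
}
```

A row with no test fails both predicates, so it blocks the start bookend rather than being silently skipped.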
Fitness Gate
Between spec parsing (step 4) and test writing (step 5), engineering runs an architecture fitness check. This gate can bounce a spec back to the Dream Team before any building starts.
Gate Checks
| Check | Question | Bounce if... |
|---|---|---|
| Capability overlap | Does an existing module already deliver this? | Feature-matrix shows another PRD at L2+ for the same FeatureIDs |
| Hex boundary | Does the spec cross hexagonal layer boundaries? | A single FAVV row mixes domain logic with infrastructure or presentation |
| Generator coverage | Does a generator exist for the pattern this spec describes? | Spec asks for hand-coded CRUD when a generator produces it correctly |
| Sibling collision | Does an in-flight plan already scaffold the same artifact? | Active project list shows overlapping components, routes, or server actions |
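The capability-overlap check is the most mechanical of the four. A sketch, assuming the feature-matrix can be read as a flat list of entries; the `MatrixEntry` shape and the `findCapabilityOverlap` helper are hypothetical:

```typescript
// Hypothetical feature-matrix entry; level is the L0-L4 commissioning level.
interface MatrixEntry {
  featureId: string;
  prd: string;
  level: number;
}

// Returns entries where another PRD already holds one of these FeatureIDs at L2 or above.
function findCapabilityOverlap(prd: string, featureIds: string[], matrix: MatrixEntry[]): MatrixEntry[] {
  return matrix.filter(
    (e) => e.prd !== prd && e.level >= 2 && featureIds.indexOf(e.featureId) !== -1
  );
}
```

A non-empty result is grounds for a bounce; the returned entries name the PRD to reconcile with.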
Dream Pre-Check
Before signaling handoff, verify:
- Read Feature Matrix — confirm no other PRD at L2+ advances the same FeatureIDs
- Read the Build Contract Function column — confirm each row stays within one hexagonal layer
- Check if the pattern has an existing generator (entity CRUD, e2e test, UI component) — if so, the spec should reference it, not describe the pattern from scratch
Bounce Protocol
A bounce is not a rejection. It is a return signal. Three pre-build gates can trigger one:
| Gate | Fires when... | Signal |
|---|---|---|
| Fitness check | Spec overlaps existing capability or crosses hex boundaries | spec-bounce — which check failed and what to change |
| Missing stories | No Story Contract → no Safety Tests → negative testing is guesswork | spec-incomplete — which contracts are missing |
| Noun-only functions | Function column has nouns ("Contacts") instead of verbs ("Browse, search, filter contacts") | spec-incomplete — parser cannot derive tests |
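The gate-to-signal mapping in the table can be captured in a small typed builder. The message shape below is an assumption; only the signal names `spec-bounce` and `spec-incomplete` come from the table:

```typescript
type Gate = "fitness-check" | "missing-stories" | "noun-only-functions";
type BounceSignal = "spec-bounce" | "spec-incomplete";

// Hypothetical message shape: detail names the failed check and what to change.
interface BounceMessage {
  signal: BounceSignal;
  gate: Gate;
  detail: string;
}

function buildBounce(gate: Gate, detail: string): BounceMessage {
  // Only the fitness check emits spec-bounce; the other two gates report an incomplete spec.
  const signal: BounceSignal = gate === "fitness-check" ? "spec-bounce" : "spec-incomplete";
  return { signal: signal, gate: gate, detail: detail };
}
```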
Bounce Response
- Read the bounce message — it names the specific failure
- Update spec/index.md to address the gap
- Re-run the pre-handoff checklist (see Create PRD Stories checklist)
- Re-signal handoff via comms to #meta
If the bounce changes scope (new FeatureIDs needed), create a new PRD. Original PRD scope stays frozen.
Post-build commissioning signals are documented under Return Signals in Create PRD Stories.
Enforcement Tiers
Push enforcement up. Every decision an agent makes is a chance to get it wrong.
| Tier | Mechanism | Guarantee | Failure Mode |
|---|---|---|---|
| 1 | Generators | Code IS correct by construction | None — deterministic |
| 2 | Plan Templates | Phase order + best practice reminders | Agent skips bookend |
| 3 | Rules | Architecture constraints always in context | Agent ignores under load |
| 4 | Skills | Procedural memory for complex workflows | Agent forgets to invoke |
| 5 | Agent Memory | Domain knowledge, judgment calls | Drift, hallucination, forgetting |
The best code is code you don't write. Generators produce correct code by construction. Plan templates remind engineers to reuse helper functions, shared components, and library patterns before writing new code. Spend Tier 5 tokens on edge cases, not boilerplate.
Plan Templates
Plans have two jobs: define the task sequence AND encode best practices. Each template carries institutional memory — which generators to run, which helper libraries to use, which shared components exist. The plan reminds the agent what to reuse so it doesn't reinvent what the factory already built.
When engineering receives a spec, template selection depends on the work:
| PRD Contains | Template | What It Does |
|---|---|---|
| New data entities | data-crud-flow | Schema, repos, actions, verified CRUD |
| Existing entity bugs | corrective-crud-action | Trace root cause through layers, fix, harden |
| UI verification needed | e2e-intent-validation | Playwright specs proving user journeys work |
| Issues to validate | issue-validation-sweep | Batch validation across entities |
| Entity at L1 needing L3 | entity-commissioning | Schema exists, commission through to CRUD |
Gate: If the PRD has no Story Contract, engineering flags this before starting. Stories are the source for Safety Tests — without them, negative testing is guesswork.
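Template selection plus the Story Contract gate might look like the sketch below. The `WorkKind` labels are invented shorthand for the "PRD Contains" column; the template names are taken from the table:

```typescript
// Hypothetical labels for the "PRD Contains" column.
type WorkKind =
  | "new-data-entities"
  | "existing-entity-bugs"
  | "ui-verification"
  | "issues-to-validate"
  | "entity-l1-to-l3";

const TEMPLATE_FOR: Record<WorkKind, string> = {
  "new-data-entities": "data-crud-flow",
  "existing-entity-bugs": "corrective-crud-action",
  "ui-verification": "e2e-intent-validation",
  "issues-to-validate": "issue-validation-sweep",
  "entity-l1-to-l3": "entity-commissioning",
};

interface TemplateChoice {
  template: string;
  flags: string[];
}

function selectTemplate(kind: WorkKind, hasStoryContract: boolean): TemplateChoice {
  const flags: string[] = [];
  // Gate: without stories there is no source for Safety Tests.
  if (!hasStoryContract) flags.push("spec-incomplete: Story Contract missing");
  return { template: TEMPLATE_FOR[kind], flags: flags };
}
```

A non-empty `flags` list means engineering raises the gap before starting, regardless of which template applies.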
Failure Modes
| Failure | Symptom | Fix |
|---|---|---|
| No Story Contract | Infrastructure built, admin/user flows missed | Write stories before FAVV rows |
| Function column uses nouns | Parser cannot generate test cases | Use verbs: "Admin invites user" not "Invitations" |
| No Forbidden Outcomes | Safety Test column empty | Every story gets at least one |
| Stories miss a role | Admin governance UI never built | One story per role per critical flow |
| Tier checklist without spec | Implementation tasks, not acceptance criteria | Spec rows are the contract, tiers are build order hints |
| FeatureID missing | Row not tracked in feature-matrix | Every FAVV row links to a RaaS catalog ID |
| Template framing too loose | Plan tasks vague, agent builds wrong thing | Template needs tighter phase gates or generator references |
SPEC-MAP
The shared traceability artifact. Both sides write to it. Neither side can claim done with empty cells.
A SPEC-MAP lives in the engineering repo's E2E domain folder. It has one row per Story Contract row.
| Column | Written By | When |
|---|---|---|
| Story # | Dream | At handoff |
| WHEN/THEN | Dream | At handoff |
| Test File | Engineering | At spec-to-tests bookend |
| Test Status | Engineering | At JTBD validation bookend (RED/GREEN) |
| L-Level | Dream | At commissioning |
| Last Verified | Dream | At commissioning |
SPEC-MAP Catches
| Gap | Without SPEC-MAP | With SPEC-MAP |
|---|---|---|
| Feature works but has no test | Commissioning scores L4, CI has no regression protection | Empty Test File cell — visible before commissioning starts |
| Engineering changes Screen Contract state | Dream's spec drifts from reality | Engineering updates WHEN/THEN when behavior changes, Dream sees the delta |
| Story Contract row is untestable | Engineering silently skips it | Test File cell stays empty — BLOCKER surfaces at validation bookend |
| Commissioning misses a regression | Next deploy breaks a feature nobody re-checked | Last Verified column shows stale dates — triggers re-commission |
SPEC-MAP Rules
- Every Story Contract row from spec/index.md must appear as a SPEC-MAP row
- Engineering fills Test File during the start bookend; if a story can't map to a test, that's a spec-bounce, not a skip
- Test Status updates automatically from CI (GREEN/RED)
- Dream fills L-Level and Last Verified during commissioning
- A capability cannot reach L4 with any empty cells in its SPEC-MAP
- When engineering changes behavior that affects WHEN/THEN, engineering updates the SPEC-MAP row AND posts a spec-delta to #meta
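The empty-cell rule is easy to enforce if each SPEC-MAP row is typed. A minimal sketch, assuming a hypothetical `SpecMapRow` whose fields follow the six columns, with `null` standing for a cell nobody has filled yet:

```typescript
interface SpecMapRow {
  story: number;
  whenThen: string;
  testFile: string | null;
  testStatus: "RED" | "GREEN" | null;
  lLevel: number | null;
  lastVerified: string | null; // date string, format left to the team
}

// Returns the column names that are still empty for this row.
function emptyCells(row: SpecMapRow): string[] {
  const cells: Array<[string, string | number | null]> = [
    ["Story #", row.story],
    ["WHEN/THEN", row.whenThen],
    ["Test File", row.testFile],
    ["Test Status", row.testStatus],
    ["L-Level", row.lLevel],
    ["Last Verified", row.lastVerified],
  ];
  return cells.filter((c) => c[1] === null || c[1] === "").map((c) => c[0]);
}

// A capability cannot reach L4 while any row has an empty cell.
function canReachL4(rows: SpecMapRow[]): boolean {
  return rows.length > 0 && rows.every((r) => emptyCells(r).length === 0);
}
```

`emptyCells` names the exact column at fault, which tells you whose move it is: Dream for columns 1, 2, 5 and 6, Engineering for columns 3 and 4.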
Reverse Signal
When engineering changes implementation in ways that affect the spec:
- Engineering updates the SPEC-MAP row with the new behavior
- Engineering posts spec-delta via comms to #meta
- Message includes: which story row changed, old behavior, new behavior, why
- Dream Team updates spec/index.md Story Contract to match reality
- Dream re-commissions the affected rows
Without this, specs fossilize. The Screen Contract says "loading skeleton appears" but the component now renders immediately. The spec is wrong. Nobody updates it. The next commissioner reads a spec that describes a product that no longer exists.
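A spec-delta is only useful if all four parts are present. A sketch of one possible message shape; the `SpecDelta` interface and the `missingDeltaFields` helper are hypothetical, and the comms transport is left out:

```typescript
// Hypothetical payload for a spec-delta posted to #meta.
interface SpecDelta {
  signal: "spec-delta";
  storyRow: number;
  oldBehavior: string;
  newBehavior: string;
  why: string;
}

// Returns the names of required text fields that are blank.
function missingDeltaFields(delta: SpecDelta): string[] {
  const fields: Array<[string, string]> = [
    ["oldBehavior", delta.oldBehavior],
    ["newBehavior", delta.newBehavior],
    ["why", delta.why],
  ];
  return fields.filter((f) => f[1].trim().length === 0).map((f) => f[0]);
}

// The loading-skeleton case from above, expressed as a delta (row number and reason are made up).
const skeletonDelta: SpecDelta = {
  signal: "spec-delta",
  storyRow: 3,
  oldBehavior: "Loading skeleton appears",
  newBehavior: "Component renders immediately",
  why: "Illustrative reason only",
};
```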
Full Sequence
Dream Team Engineering
| |
| 1. Write spec/index.md |
| (Intent + Stories + FAVV) |
| |
| 2. Populate SPEC-MAP columns 1-2 |
| (Story #, WHEN/THEN) |
| |
| 3. Signal handoff via comms |
| to #meta |
| |
| 4. Parse spec
| (spec reference set on project)
| |
| 4a. Fitness check
| (overlap, hex boundary,
| generator coverage,
| sibling collision)
| |
| 4b. BOUNCE (if fitness fails) <---|
| spec-bounce via comms |
| Update spec, re-signal |
| |
| 5. Spec-to-tests bookend
| Populate SPEC-MAP column 3
| (Test File paths)
| Failing specs (RED)
| |
| 6. Build: RED → GREEN
| (plans reference spec,
| reuse generators +
| helpers from templates)
| |
| 7. JTBD validation bookend
| All specs GREEN
| Update SPEC-MAP column 4
| (Test Status = GREEN)
| |
| 8. commissioning-update <----|
| posted to #meta |
| (level, entities, evidence) |
| |
| 9. Commission via browser |
| Read SPEC-MAP — verify |
| zero empty cells |
| (dream team validates L4) |
| Update SPEC-MAP columns 5-6 |
| (L-Level, Last Verified) |
| |
| 10. Update feature-matrix |
| (advance L-level) |
| |
| --- REVERSE SIGNAL --- |
| |
| 11. Engineering changes behavior
| Update SPEC-MAP WHEN/THEN
| Post spec-delta to #meta
| |
| 12. Dream updates spec/index.md <--|
| Re-commissions affected rows |
Context
- PRD Template — Reference implementation of FAVV v2.1 spec
- Feature Matrix — Commissioning status for all platform features
- Commissioning Protocol — L0-L4 maturity model
- PRD Creation — How to spec capabilities, including Return Signals
- Priorities — Active PRD table (build order)
- Credibility — Three loops: inner (engineering), story (predictions), market (external validation). Market is the greatest force
- Development Pipeline — The full PAIN → COMMISSION chain that the SPEC-MAP traces
Questions
When a Story Contract is missing, does engineering build the wrong thing or build the right thing without safety tests?
- What's the cost of a Forbidden Outcome the story writer never imagined vs one they wrote but engineering skipped?
- If the parser can generate tests from FAVV rows automatically, what role does human judgment play in test design?
- When should engineering push back on a PRD vs start building and flag gaps as they find them?
- If a SPEC-MAP has zero empty cells but the market loop (Loop 3) returns no signal, is the problem the spec or the product?
- When engineering posts a spec-delta that changes a Story Contract's WHEN/THEN, who decides whether the new behavior is better: Dream or Engineering?