Agentic Execution Spec
What does "the right capabilities, safely cleared, fully logged" look like as a test?
Data Model
Plan
requiredSkills: string[] # Claude Code skill names — only these load
requiredMcpTools: string[] # MCP server names — only these activate
assignedAgents: AgentRef[] # team roster for this plan
phases[]
assignedAgent: string # agent for all tasks in this phase
tasks[]
assignedAgent: string # agent override for this task
bestPatternPrompt: string # must reference /skill-name or mcp-tool or nx-generator explicitly
tools: string[]
skills: string[]
Issue (DB)
id, planId, taskId, severity (low|medium|high|critical),
ownerTeam, description, createdAt, resolvedAt?, resolution?
Decision (DB)
id, planId, taskId, title, reasoning, alternatives: string[], createdAt
Snapshot (WM markdown)
plan, task, timestamp
failingTests: { file, testName, error }[]
techDebt: { file, what, whyDeferred }[]
openDecisions: string[]
nextTaskIntent: string
Story Contract
Stories are test contracts. RED before implementation. GREEN = value delivered.
| # | WHEN (Trigger + Precondition) | THEN (Exact Assertion) | ARTIFACT | Test Type | FORBIDDEN | OUTCOME |
|---|---|---|---|---|---|---|
| S1 | plan.json has requiredSkills: ["content-flow", "truth-seeker"] and plan activation runs | Capability loader resolves exactly 2 skill definitions — no other skills present in loaded context | exec/__tests__/story-s1-scoped-load.spec.ts | unit | Activation loads all skills in registry when requiredSkills is set | Only declared skills consume context |
| S2 | plan-cli log-issue --severity=high --task=t02 --owner=intel --description="typecheck fails on schema" runs | DB has 1 row in planning_issues where severity='high' AND taskId='t02' AND description matches | exec/__tests__/story-s2-log-issue.spec.ts | integration | Issue is written to markdown file instead of DB, or accepted with severity: null | Issues are queryable by task and severity |
| S3 | plan-cli log-decision --title="NOT NULL for bestPatternPrompt" --reasoning="nullable allowed missing prompts for weeks" --alternatives="optional field,documentation only" runs | DB has 1 row in planning_decisions where title matches AND alternatives is a non-empty array with 2 entries | exec/__tests__/story-s3-log-decision.spec.ts | integration | Decision accepted without reasoning or with empty alternatives: [] | Decisions have structured reasoning — not just a title |
| S4 | Task save-state snapshot is written to WM markdown, then /clear runs, then plan-cli load-snapshot runs | Loaded snapshot object has failingTests[], techDebt[], openDecisions[], nextTaskIntent — all fields present and matching what was saved | exec/__tests__/story-s4-snapshot.spec.ts | unit | Snapshot missing any of the 4 fields is accepted as valid, or load-snapshot silently ignores missing fields | /clear is always safe — full context recoverable |
| S5 | Context token usage reaches 70% of budget | contextBudgetGauge() returns { level: "warning", percent: number, recommendation: "save-state" } AND at 85% returns { level: "critical", recommendation: "save-and-yield" } | exec/__tests__/story-s5-budget.spec.ts | unit | Budget gauge returns { level: "ok" } when percent >= 70, or only returns one level not both thresholds | Agents know when to save before context is gone |
Build Contract
| # | ID | Function | Artifact | Success Test | Safety Test | Value | State |
|---|---|---|---|---|---|---|---|
| 1 | EXEC-001 | Plan capability manifest schema | plan.json schema extension | plan.json with requiredSkills: ["x"] passes Zod validation; plan without it also validates (backward compat) | Plan with requiredSkills: null is accepted as valid | Manifest is the single source of required capabilities | Gap |
| 2 | EXEC-004 | Issue log CLI command | plan-cli.ts log-issue | S2: log-issue with all fields → DB row with correct severity and taskId | Missing --severity silently inserts severity: null | Issues queryable by task, severity, team | Gap |
| 3 | EXEC-005 | Decision log CLI command | plan-cli.ts log-decision | S3: log-decision with reasoning + 2 alternatives → DB row with array field populated | Decision accepted without --reasoning flag | Decisions have structured reasoning | Gap |
| 4 | EXEC-003 | Task save-state schema | types/snapshot.ts + plan-cli.ts save-snapshot | S4: snapshot written with all 4 fields → load-snapshot returns typed object with all fields non-null | Snapshot accepted with any fields missing | /clear is always safe | Gap |
| 5 | EXEC-002 | Scoped capability loader | plan-activation.ts | S1: plan with requiredSkills: ["content-flow"] → activation resolves exactly 1 skill, not all skills | Loader silently falls back to full skill set when manifest is set | Context budget protected per plan | Gap |
| 6 | EXEC-006 | Context budget gauge | context-budget.ts | S5: gauge at 70% → level: "warning", at 85% → level: "critical", with correct recommendation field | Gauge returns level: "ok" when percent ≥ 70 | Agents know when to yield | Gap |
| 7 | EXEC-007 | Task boundary protocol in template | CLAUDE.md task boundary section + plan template update | Every plan template has step 0 (load snapshot if exists) and step 99 (save snapshot + log unresolved issues) | Template accepted without bookend steps | /clear between tasks is standard, not exceptional | Gap |
| 8 | EXEC-008 | Explicit skill invocation directives | bestPatternPrompt convention + validator | plan-cli validate-prompts returns 0 errors on a plan where every task's bestPatternPrompt contains at least one /skill-name or mcp:tool-name reference | Prompt with no skill reference passes validation | Agents invoke skills, not rediscover them | Gap |
Context Budget Thresholds
| Level | Percent | Recommendation | Action Required |
|---|---|---|---|
| OK | <70% | Continue | None |
| Warning | 70–84% | Save state soon | Log unresolved issues/decisions to DB |
| Critical | 85–94% | Save and yield | Write snapshot, signal task incomplete, /clear ready |
| Overflow | ≥95% | Emergency save | Truncate to snapshot only — core context preserved |
Task Boundary Protocol (Standard Steps)
Every plan template must include these as the first and last steps in every task:
Step 0 (always first):
LOAD CONTEXT: Check for snapshot from previous task (plan-cli load-snapshot --task=<id>).
If snapshot exists: read failingTests, techDebt, openDecisions, nextTaskIntent before starting.
Step 99 (always last):
SAVE STATE: Before finishing this task:
1. plan-cli log-issue for any unresolved issues (non-zero severity)
2. plan-cli log-decision for any architectural choices made
3. plan-cli save-snapshot --task=<id> with failingTests, techDebt, openDecisions, nextTaskIntent
/clear is now safe.
Skill Invocation Directive Convention
bestPatternPrompt must reference the mechanism explicitly — not by description:
| Reference Type | Format | Example |
|---|---|---|
| Claude Code skill | /skill-name | Use /content-flow to review this page |
| MCP tool | mcp:server-name:tool-name | Use mcp:context7:query-docs to fetch docs |
| NX generator | nx generate @scope/plugin:gen | Run nx generate @stackmates/schema:table |
| CLI command | npx tsx tools/scripts/... | Run npx tsx tools/scripts/planning/plan-cli.ts list-templates |
A bestPatternPrompt with no explicit reference is incomplete. The validator (plan-cli validate-prompts) enforces this.
Context
- Agentic Execution PRD — Parent PRD with phase map and scoring
- Agent Platform Phase 3 — AGNT-007 and PLAT-005 are prerequisites
- Flow Engineering — The enforcement hierarchy this PRD extends
- DB-Native Plan Templates — Schema that EXEC-001 builds on top of
Questions
When a plan has 8 tasks and 3 of them need a skill the other 5 don't, does the manifest load it for all 8 — or does each task declare its own override?
- If
validate-promptsfails on 20 of 33 templates, is that a sprint blocker or a background cleanup task? - What happens to issues logged mid-task that turn out to be non-issues by task end — does the log need an
auto-resolvedstate? - When the context budget gauge reaches critical, who decides whether to yield: the agent or the human?