Agentic Execution Spec

What does "the right capabilities, safely cleared, fully logged" look like as a test?

Data Model

Plan
  requiredSkills: string[]       # Claude Code skill names; only these load
  requiredMcpTools: string[]     # MCP server names; only these activate
  assignedAgents: AgentRef[]     # team roster for this plan
  phases[]
    assignedAgent: string        # agent for all tasks in this phase
    tasks[]
      assignedAgent: string      # agent override for this task
      bestPatternPrompt: string  # must reference /skill-name or mcp-tool or nx-generator explicitly
      tools: string[]
      skills: string[]

Issue (DB)
  id, planId, taskId, severity (low|medium|high|critical),
  ownerTeam, description, createdAt, resolvedAt?, resolution?

Decision (DB)
  id, planId, taskId, title, reasoning, alternatives: string[], createdAt

Snapshot (WM markdown)
  plan, task, timestamp
  failingTests: { file, testName, error }[]
  techDebt: { file, what, whyDeferred }[]
  openDecisions: string[]
  nextTaskIntent: string
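
As an illustration of the Snapshot shape above, here is a minimal TypeScript sketch of a strict loader check. It assumes the markdown has already been parsed into a plain object, and that plan, task, and timestamp are strings (the outline does not give their types). The name parseSnapshot and the error messages are hypothetical, not the project's actual load-snapshot code.

```typescript
// Field names follow the Snapshot outline; the string types for plan, task,
// and timestamp are an assumption.
interface FailingTest { file: string; testName: string; error: string }
interface TechDebtItem { file: string; what: string; whyDeferred: string }

interface Snapshot {
  plan: string;
  task: string;
  timestamp: string;
  failingTests: FailingTest[];
  techDebt: TechDebtItem[];
  openDecisions: string[];
  nextTaskIntent: string;
}

// Hypothetical helper: rejects a snapshot instead of silently ignoring gaps.
// It checks top-level fields only, not the shape of each array element.
function parseSnapshot(raw: unknown): Snapshot {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("invalid snapshot: expected an object");
  }
  const candidate = raw as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of ["plan", "task", "timestamp", "nextTaskIntent"]) {
    if (typeof candidate[key] !== "string") problems.push(`${key} must be a string`);
  }
  for (const key of ["failingTests", "techDebt", "openDecisions"]) {
    if (!Array.isArray(candidate[key])) problems.push(`${key} must be an array`);
  }
  if (problems.length > 0) {
    throw new Error(`invalid snapshot: ${problems.join("; ")}`);
  }
  return candidate as unknown as Snapshot;
}
```

Throwing on the first load, rather than defaulting missing fields to empty values, is what keeps a lossy snapshot from looking like a clean one after /clear.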

Story Contract

Stories are test contracts. RED before implementation. GREEN = value delivered.

| # | WHEN (Trigger + Precondition) | THEN (Exact Assertion) | ARTIFACT | Test Type | FORBIDDEN | OUTCOME |
|---|---|---|---|---|---|---|
| S1 | plan.json has requiredSkills: ["content-flow", "truth-seeker"] and plan activation runs | Capability loader resolves exactly 2 skill definitions; no other skills present in loaded context | exec/__tests__/story-s1-scoped-load.spec.ts | unit | Activation loads all skills in registry when requiredSkills is set | Only declared skills consume context |
| S2 | plan-cli log-issue --severity=high --task=t02 --owner=intel --description="typecheck fails on schema" runs | DB has 1 row in planning_issues where severity='high' AND taskId='t02' AND description matches | exec/__tests__/story-s2-log-issue.spec.ts | integration | Issue is written to markdown file instead of DB, or accepted with severity: null | Issues are queryable by task and severity |
| S3 | plan-cli log-decision --title="NOT NULL for bestPatternPrompt" --reasoning="nullable allowed missing prompts for weeks" --alternatives="optional field,documentation only" runs | DB has 1 row in planning_decisions where title matches AND alternatives is a non-empty array with 2 entries | exec/__tests__/story-s3-log-decision.spec.ts | integration | Decision accepted without reasoning or with empty alternatives: [] | Decisions have structured reasoning, not just a title |
| S4 | Task save-state snapshot is written to WM markdown, then /clear runs, then plan-cli load-snapshot runs | Loaded snapshot object has failingTests[], techDebt[], openDecisions[], nextTaskIntent, with all fields present and matching what was saved | exec/__tests__/story-s4-snapshot.spec.ts | unit | Snapshot missing any of the 4 fields is accepted as valid, or load-snapshot silently ignores missing fields | /clear is always safe: full context recoverable |
| S5 | Context token usage reaches 70% of budget | contextBudgetGauge() returns { level: "warning", percent: number, recommendation: "save-state" } AND at 85% returns { level: "critical", recommendation: "save-and-yield" } | exec/__tests__/story-s5-budget.spec.ts | unit | Budget gauge returns { level: "ok" } when percent >= 70, or only returns one level not both thresholds | Agents know when to save before context is gone |
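
Story S1 can be sketched as a loader that returns only what the manifest names. The following is a minimal illustration under stated assumptions: the registry is modeled as a plain in-memory object, and SkillDefinition, resolveSkills, and the error on unknown names are hypothetical choices, not the contents of plan-activation.ts.

```typescript
// Hypothetical shape of a loaded skill; the source does not define it.
interface SkillDefinition { name: string; body: string }

// Resolves exactly the skills named in requiredSkills. An unknown name is an
// error (an assumption), so the loader can never quietly fall back to the
// full registry, which is the FORBIDDEN behavior for S1.
function resolveSkills(
  registry: Record<string, SkillDefinition>,
  requiredSkills: string[],
): SkillDefinition[] {
  const resolved: SkillDefinition[] = [];
  for (const skillName of requiredSkills) {
    if (!Object.prototype.hasOwnProperty.call(registry, skillName)) {
      throw new Error(`unknown skill in requiredSkills: ${skillName}`);
    }
    // Skip duplicates so each declared skill is loaded once.
    if (resolved.some((def) => def.name === skillName)) continue;
    resolved.push(registry[skillName]);
  }
  return resolved;
}
```

With a registry holding content-flow, truth-seeker, and any number of other skills, requiredSkills: ["content-flow", "truth-seeker"] yields exactly two definitions.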

Build Contract

| # | ID | Function | Artifact | Success Test | Safety Test | Value | State |
|---|---|---|---|---|---|---|---|
| 1 | EXEC-001 | Plan capability manifest schema | plan.json schema extension | plan.json with requiredSkills: ["x"] passes Zod validation; plan without it also validates (backward compat) | Plan with requiredSkills: null is accepted as valid | Manifest is the single source of required capabilities | Gap |
| 2 | EXEC-004 | Issue log CLI command | plan-cli.ts log-issue | S2: log-issue with all fields → DB row with correct severity and taskId | Missing --severity silently inserts severity: null | Issues queryable by task, severity, team | Gap |
| 3 | EXEC-005 | Decision log CLI command | plan-cli.ts log-decision | S3: log-decision with reasoning + 2 alternatives → DB row with array field populated | Decision accepted without --reasoning flag | Decisions have structured reasoning | Gap |
| 4 | EXEC-003 | Task save-state schema | types/snapshot.ts + plan-cli.ts save-snapshot | S4: snapshot written with all 4 fields → load-snapshot returns typed object with all fields non-null | Snapshot accepted with any fields missing | /clear is always safe | Gap |
| 5 | EXEC-002 | Scoped capability loader | plan-activation.ts | S1: plan with requiredSkills: ["content-flow"] → activation resolves exactly 1 skill, not all skills | Loader silently falls back to full skill set when manifest is set | Context budget protected per plan | Gap |
| 6 | EXEC-006 | Context budget gauge | context-budget.ts | S5: gauge at 70% → level: "warning", at 85% → level: "critical", with correct recommendation field | Gauge returns level: "ok" when percent ≥ 70 | Agents know when to yield | Gap |
| 7 | EXEC-007 | Task boundary protocol in template | CLAUDE.md task boundary section + plan template update | Every plan template has step 0 (load snapshot if exists) and step 99 (save snapshot + log unresolved issues) | Template accepted without bookend steps | /clear between tasks is standard, not exceptional | Gap |
| 8 | EXEC-008 | Explicit skill invocation directives | bestPatternPrompt convention + validator | plan-cli validate-prompts returns 0 errors on a plan where every task's bestPatternPrompt contains at least one /skill-name or mcp:server-name:tool-name reference | Prompt with no skill reference passes validation | Agents invoke skills, not rediscover them | Gap |
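
Rows 2 and 3 (EXEC-004, EXEC-005) hinge on rejecting incomplete input rather than storing nulls. The sketch below shows one way that validation might look. It uses in-memory arrays as stand-ins for the planning_issues and planning_decisions tables; logIssue, logDecision, the numeric id, and the error messages are all hypothetical, not the real plan-cli.ts code.

```typescript
type Severity = "low" | "medium" | "high" | "critical";
const SEVERITIES: Severity[] = ["low", "medium", "high", "critical"];

interface IssueRow {
  id: number; planId: string; taskId: string; severity: Severity;
  ownerTeam: string; description: string; createdAt: string;
}
interface DecisionRow {
  id: number; planId: string; taskId: string; title: string;
  reasoning: string; alternatives: string[]; createdAt: string;
}

// In-memory stand-ins for the DB tables (assumption for illustration only).
const planningIssues: IssueRow[] = [];
const planningDecisions: DecisionRow[] = [];

function logIssue(input: {
  planId: string; taskId: string; severity?: string; ownerTeam: string; description: string;
}): IssueRow {
  // A missing or unknown severity is rejected, never stored as null.
  const severity = SEVERITIES.filter((s) => s === input.severity)[0];
  if (severity === undefined) {
    throw new Error(`--severity must be one of ${SEVERITIES.join("|")}`);
  }
  const row: IssueRow = {
    id: planningIssues.length + 1, planId: input.planId, taskId: input.taskId,
    severity, ownerTeam: input.ownerTeam, description: input.description,
    createdAt: new Date().toISOString(),
  };
  planningIssues.push(row);
  return row;
}

function logDecision(input: {
  planId: string; taskId: string; title: string; reasoning?: string; alternatives?: string;
}): DecisionRow {
  const reasoning = (input.reasoning || "").trim();
  if (reasoning.length === 0) throw new Error("--reasoning is required");
  // --alternatives arrives as a comma-separated string, as in story S3.
  const alternatives = (input.alternatives || "")
    .split(",").map((a) => a.trim()).filter((a) => a.length > 0);
  if (alternatives.length === 0) throw new Error("--alternatives must list at least one option");
  const row: DecisionRow = {
    id: planningDecisions.length + 1, planId: input.planId, taskId: input.taskId,
    title: input.title, reasoning, alternatives, createdAt: new Date().toISOString(),
  };
  planningDecisions.push(row);
  return row;
}
```

Validating before the insert means the Safety Test cases fail at the CLI boundary, so a query by severity or task never has to account for null rows.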

Context Budget Thresholds

| Level | Percent | Recommendation | Action Required |
|---|---|---|---|
| OK | <70% | Continue | None |
| Warning | 70–84% | Save state soon | Log unresolved issues/decisions to DB |
| Critical | 85–94% | Save and yield | Write snapshot, signal task incomplete, /clear ready |
| Overflow | ≥95% | Emergency save | Truncate to snapshot only; core context preserved |
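
The thresholds above map onto a small pure function. This is a sketch, not the contents of context-budget.ts: story S5 shows contextBudgetGauge() without arguments, so the usedTokens and budgetTokens parameters are an assumption, as are the "continue" and "emergency-save" recommendation strings (only "save-state" and "save-and-yield" appear in S5).

```typescript
type BudgetLevel = "ok" | "warning" | "critical" | "overflow";

interface BudgetReading {
  level: BudgetLevel;
  percent: number;
  recommendation: string;
}

// Parameters are assumed for illustration; the real gauge may read usage
// from elsewhere.
function contextBudgetGauge(usedTokens: number, budgetTokens: number): BudgetReading {
  if (!(budgetTokens > 0) || !(usedTokens >= 0)) {
    throw new RangeError("usedTokens must be >= 0 and budgetTokens must be > 0");
  }
  // Multiply before dividing to keep integer inputs exact at the boundaries.
  const percent = (usedTokens * 100) / budgetTokens;

  // Check the highest threshold first so each band is matched exactly once.
  if (percent >= 95) return { level: "overflow", percent, recommendation: "emergency-save" };
  if (percent >= 85) return { level: "critical", percent, recommendation: "save-and-yield" };
  if (percent >= 70) return { level: "warning", percent, recommendation: "save-state" };
  return { level: "ok", percent, recommendation: "continue" };
}
```

Because the comparisons are inclusive lower bounds, exactly 70% is already "warning" and exactly 85% is already "critical", which is what the S5 FORBIDDEN column rules out getting wrong.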

Task Boundary Protocol (Standard Steps)

Every plan template must include these as the first and last steps in every task:

Step 0 (always first):

LOAD CONTEXT: Check for snapshot from previous task (plan-cli load-snapshot --task=<id>).
If snapshot exists: read failingTests, techDebt, openDecisions, nextTaskIntent before starting.

Step 99 (always last):

SAVE STATE: Before finishing this task:
1. plan-cli log-issue for every unresolved issue, each with an explicit severity (low|medium|high|critical)
2. plan-cli log-decision for any architectural choices made
3. plan-cli save-snapshot --task=<id> with failingTests, techDebt, openDecisions, nextTaskIntent
/clear is now safe.
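
The bookend rule in EXEC-007 ("Template accepted without bookend steps" must fail) could be checked mechanically. The sketch below assumes a task's steps are available as an ordered list of numbered instructions; TemplateStep and checkTaskBookends are hypothetical names, and matching on the load-snapshot and save-snapshot command names is an assumed heuristic rather than a documented rule.

```typescript
// Hypothetical representation of one step in a task template.
interface TemplateStep { step: number; instruction: string }

// Returns a list of problems; an empty list means the task has both bookends.
function checkTaskBookends(steps: TemplateStep[]): string[] {
  const errors: string[] = [];
  const first = steps[0];
  const last = steps[steps.length - 1];

  if (first === undefined || first.step !== 0 || first.instruction.indexOf("load-snapshot") === -1) {
    errors.push("first step must be step 0 and call plan-cli load-snapshot");
  }
  if (last === undefined || last.step !== 99 || last.instruction.indexOf("save-snapshot") === -1) {
    errors.push("last step must be step 99 and call plan-cli save-snapshot");
  }
  return errors;
}
```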

Skill Invocation Directive Convention

bestPatternPrompt must reference the mechanism explicitly — not by description:

| Reference Type | Format | Example |
|---|---|---|
| Claude Code skill | /skill-name | Use /content-flow to review this page |
| MCP tool | mcp:server-name:tool-name | Use mcp:context7:query-docs to fetch docs |
| NX generator | nx generate @scope/plugin:gen | Run nx generate @stackmates/schema:table |
| CLI command | npx tsx tools/scripts/... | Run npx tsx tools/scripts/planning/plan-cli.ts list-templates |

A bestPatternPrompt with no explicit reference is incomplete. The validator (plan-cli validate-prompts) enforces this.
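
One way the check behind validate-prompts might work is a set of patterns, one per reference type in the table above. The regular expressions here are assumptions derived from the Format column, not the validator's actual rules, and validatePrompts and TaskPrompt are hypothetical names.

```typescript
// Assumed patterns, one per row of the reference table.
const REFERENCE_PATTERNS: RegExp[] = [
  /(^|\s)\/[a-z][a-z0-9-]*\b/,                 // Claude Code skill: /skill-name
  /\bmcp:[a-z0-9_-]+:[a-z0-9_-]+\b/i,          // MCP tool: mcp:server-name:tool-name
  /\bnx generate @[\w-]+\/[\w-]+:[\w-]+/,      // NX generator: nx generate @scope/plugin:gen
  /\bnpx tsx tools\/scripts\/\S+/,             // CLI command: npx tsx tools/scripts/...
];

interface TaskPrompt { taskId: string; bestPatternPrompt: string }

// Returns the ids of tasks whose prompt has no explicit reference.
function validatePrompts(tasks: TaskPrompt[]): string[] {
  return tasks
    .filter((task) => !REFERENCE_PATTERNS.some((pattern) => pattern.test(task.bestPatternPrompt)))
    .map((task) => task.taskId);
}
```

The skill pattern requires the slash to start the prompt or follow whitespace, so path fragments such as tools/scripts are not mistaken for skill names.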

Context

Questions

When a plan has 8 tasks and 3 of them need a skill the other 5 don't, does the manifest load it for all 8 — or does each task declare its own override?

  • If validate-prompts fails on 20 of 33 templates, is that a sprint blocker or a background cleanup task?
  • What happens to issues logged mid-task that turn out to be non-issues by task end — does the log need an auto-resolved state?
  • When the context budget gauge reaches critical, who decides whether to yield: the agent or the human?