Skip to main content

← Agentic Execution

Agentic Execution Spec

What does "the right capabilities, safely cleared, fully logged" look like as a test?

Data Model

Plan
requiredSkills: string[] # Claude Code skill names — only these load
requiredMcpTools: string[] # MCP server names — only these activate
assignedAgents: AgentRef[] # team roster for this plan
phases[]
assignedAgent: string # agent for all tasks in this phase
tasks[]
assignedAgent: string # agent override for this task
bestPatternPrompt: string # must reference /skill-name or mcp-tool or nx-generator explicitly
tools: string[]
skills: string[]

Issue (DB)
id, planId, taskId, severity (low|medium|high|critical),
ownerTeam, description, createdAt, resolvedAt?, resolution?

Decision (DB)
id, planId, taskId, title, reasoning, alternatives: string[], createdAt

Snapshot (WM markdown)
plan, task, timestamp
failingTests: { file, testName, error }[]
techDebt: { file, what, whyDeferred }[]
openDecisions: string[]
nextTaskIntent: string

Story Contract

Stories are test contracts. RED before implementation. GREEN = value delivered.

#WHEN (Trigger + Precondition)THEN (Exact Assertion)ARTIFACTTest TypeFORBIDDENOUTCOME
S1plan.json has requiredSkills: ["content-flow", "truth-seeker"] and plan activation runsCapability loader resolves exactly 2 skill definitions — no other skills present in loaded contextexec/__tests__/story-s1-scoped-load.spec.tsunitActivation loads all skills in registry when requiredSkills is setOnly declared skills consume context
S2plan-cli log-issue --severity=high --task=t02 --owner=intel --description="typecheck fails on schema" runsDB has 1 row in planning_issues where severity='high' AND taskId='t02' AND description matchesexec/__tests__/story-s2-log-issue.spec.tsintegrationIssue is written to markdown file instead of DB, or accepted with severity: nullIssues are queryable by task and severity
S3plan-cli log-decision --title="NOT NULL for bestPatternPrompt" --reasoning="nullable allowed missing prompts for weeks" --alternatives="optional field,documentation only" runsDB has 1 row in planning_decisions where title matches AND alternatives is a non-empty array with 2 entriesexec/__tests__/story-s3-log-decision.spec.tsintegrationDecision accepted without reasoning or with empty alternatives: []Decisions have structured reasoning — not just a title
S4Task save-state snapshot is written to WM markdown, then /clear runs, then plan-cli load-snapshot runsLoaded snapshot object has failingTests[], techDebt[], openDecisions[], nextTaskIntent — all fields present and matching what was savedexec/__tests__/story-s4-snapshot.spec.tsunitSnapshot missing any of the 4 fields is accepted as valid, or load-snapshot silently ignores missing fields/clear is always safe — full context recoverable
S5Context token usage reaches 70% of budgetcontextBudgetGauge() returns { level: "warning", percent: number, recommendation: "save-state" } AND at 85% returns { level: "critical", recommendation: "save-and-yield" }exec/__tests__/story-s5-budget.spec.tsunitBudget gauge returns { level: "ok" } when percent >= 70, or only returns one level not both thresholdsAgents know when to save before context is gone

Build Contract

#IDFunctionArtifactSuccess TestSafety TestValueState
1EXEC-001Plan capability manifest schemaplan.json schema extensionplan.json with requiredSkills: ["x"] passes Zod validation; plan without it also validates (backward compat)Plan with requiredSkills: null is accepted as validManifest is the single source of required capabilitiesGap
2EXEC-004Issue log CLI commandplan-cli.ts log-issueS2: log-issue with all fields → DB row with correct severity and taskIdMissing --severity silently inserts severity: nullIssues queryable by task, severity, teamGap
3EXEC-005Decision log CLI commandplan-cli.ts log-decisionS3: log-decision with reasoning + 2 alternatives → DB row with array field populatedDecision accepted without --reasoning flagDecisions have structured reasoningGap
4EXEC-003Task save-state schematypes/snapshot.ts + plan-cli.ts save-snapshotS4: snapshot written with all 4 fields → load-snapshot returns typed object with all fields non-nullSnapshot accepted with any fields missing/clear is always safeGap
5EXEC-002Scoped capability loaderplan-activation.tsS1: plan with requiredSkills: ["content-flow"] → activation resolves exactly 1 skill, not all skillsLoader silently falls back to full skill set when manifest is setContext budget protected per planGap
6EXEC-006Context budget gaugecontext-budget.tsS5: gauge at 70% → level: "warning", at 85% → level: "critical", with correct recommendation fieldGauge returns level: "ok" when percent ≥ 70Agents know when to yieldGap
7EXEC-007Task boundary protocol in templateCLAUDE.md task boundary section + plan template updateEvery plan template has step 0 (load snapshot if exists) and step 99 (save snapshot + log unresolved issues)Template accepted without bookend steps/clear between tasks is standard, not exceptionalGap
8EXEC-008Explicit skill invocation directivesbestPatternPrompt convention + validatorplan-cli validate-prompts returns 0 errors on a plan where every task's bestPatternPrompt contains at least one /skill-name or mcp:tool-name referencePrompt with no skill reference passes validationAgents invoke skills, not rediscover themGap

Context

Task Boundary Protocol (Standard Steps)

Every plan template must include these as the first and last steps in every task:

Step 0 (always first):

LOAD CONTEXT: Check for snapshot from previous task (plan-cli load-snapshot --task=<id>).
If snapshot exists: read failingTests, techDebt, openDecisions, nextTaskIntent before starting.

Step 99 (always last):

SAVE STATE: Before finishing this task:
1. plan-cli log-issue for any unresolved issues (non-zero severity)
2. plan-cli log-decision for any architectural choices made
3. plan-cli save-snapshot --task=<id> with failingTests, techDebt, openDecisions, nextTaskIntent
/clear is now safe.

Skill Invocation Directive Convention

bestPatternPrompt must reference the mechanism explicitly — not by description:

Reference TypeFormatExample
Claude Code skill/skill-nameUse /content-flow to review this page
MCP toolmcp:server-name:tool-nameUse mcp:context7:query-docs to fetch docs
NX generatornx generate @scope/plugin:genRun nx generate @stackmates/schema:table
CLI commandnpx tsx tools/scripts/...Run npx tsx tools/scripts/planning/plan-cli.ts list-templates

A bestPatternPrompt with no explicit reference is incomplete. The validator (plan-cli validate-prompts) enforces this.

Context

Questions

When a plan has 8 tasks and 3 of them need a skill the other 5 don't, does the manifest load it for all 8 — or does each task declare its own override?

  • If validate-prompts fails on 20 of 33 templates, is that a sprint blocker or a background cleanup task?
  • What happens to issues logged mid-task that turn out to be non-issues by task end — does the log need an auto-resolved state?
  • When the context budget gauge reaches critical, who decides whether to yield: the agent or the human?