Skip to main content

Agentic Execution

What if every plan task started with exactly the right capabilities and nothing else?

Problem

A problem well-stated is a problem 80% solved

Situation: A plan has 7 tasks. Each task activates every skill in the registry. 50+ skills load into context, most irrelevant. By task 3, context is 60% full of capability descriptions that won't fire. Agent performance degrades. By task 5, the agent starts dropping earlier decisions. A failing test from task 2 was never saved — it's gone after /clear. The issue log is a markdown file that can't be queried or trended. Decisions have no reasoning attached. The next plan starts from zero.

Intention: Plans declare exactly what they need. Each task is self-contained — full enough to survive /clear, scoped enough to stay performant. Issues and decisions persist to a queryable store. Context has a budget and a save protocol. The right agent activates with the right system prompt and the right tools.

Obstacle: Three compounding failures. (1) No plan manifest — there's nowhere to declare requiredSkills or requiredMcpTools, so everything loads. (2) No save-state protocol — /clear is dangerous because nothing codifies what must be persisted first. (3) No structured log — issues live in issues-log.md, decisions live in commit messages, both are unqueryable.

Priorities

  1. What capabilities does this plan actually need — and what should never load?
  2. What must be saved before /clear so the next task picks up where this one stopped?
  3. When context reaches 70%, what's the save protocol?
  4. How do issues and decisions get from agent brain to queryable store?
  5. Which tasks should explicitly invoke a skill or MCP tool — and how does the prompt make that happen?

Features

Phase 1: Manifest + Persistence (2 sessions)

The foundation: plan.json declares what it needs. log-issue and log-decision write to DB.

  • EXEC-001: Plan capability manifest (requiredSkills[], requiredMcpTools[], assignedAgents[] in plan.json)
  • EXEC-004: Issue log CLI (plan-cli log-issue --severity --task --description → structured DB record)
  • EXEC-005: Decision log CLI (plan-cli log-decision --title --reasoning --alternatives → DB record)
  • EXEC-003: Task save-state schema (standard WM snapshot: failing tests, tech debt, open decisions, next steps)

Phase 2: Context Control (2 sessions)

Budget monitoring and capability loading — only what the manifest declares.

  • EXEC-002: Scoped capability loader (plan activation reads manifest, loads only declared skills/tools)
  • EXEC-006: Context budget gauge (monitor token %, warn at 70%, trigger save-state at 85%)
  • EXEC-007: Task boundary protocol (save → clear → load → execute → save, codified as standard plan steps 0+99)

Phase 3: Directive Wiring (1 session)

Make bestPatternPrompt explicit — skill invocations, MCP tool calls, NX generators by name.

  • EXEC-008: Skill invocation directives (bestPatternPrompt references /skill-name or mcp-tool-name explicitly, not by description)

Feature IDs

IDFeatureStatePhase
EXEC-001Plan capability manifestL01
EXEC-002Scoped capability loaderL02
EXEC-003Task save-state schemaL01
EXEC-004Issue log CLIL01
EXEC-005Decision log CLIL01
EXEC-006Context budget gaugeL02
EXEC-007Task boundary protocolL02
EXEC-008Skill invocation directivesL03

0/8 features at L2+. 8 at L0.

Data Model

Project
└── Plan (plan.json)
├── manifest: { requiredSkills[], requiredMcpTools[], assignedAgents[] }
├── phases[]
│ └── tasks[]
│ ├── assignedAgent (system prompt reference)
│ ├── bestPatternPrompt (includes /skill-name or mcp-tool-name)
│ ├── tools[]
│ └── skills[]
└── persistence
├── issues → DB (structured: severity, owner, task, description)
├── decisions → DB (structured: title, reasoning, alternatives)
└── snapshots → WM markdown (failing tests, tech debt, open loops)

The hierarchy is enforced by schema. requiredSkills[] in plan.json = the only skills loaded. bestPatternPrompt explicitly names the skill or tool to invoke. Save-state happens at every task boundary — /clear is always safe.

Save-State Protocol

What must persist before any /clear:

CategoryWhereFormat
Failing testsWM snapshot markdownFile path + test name + error
Tech debt itemsWM snapshot markdownFile, what, why deferred
Open issuesDB via log-issue CLIStructured: severity, task, description
Key decisionsDB via log-decision CLITitle, reasoning, alternatives
Next task intentWM snapshot markdownOne sentence: what the next task should do first

Issues

None yet — new PRD.

Progress

Scorecard

Priority Score: 800 (Pain 5 x Demand 4 x Edge 4 x Trend 5 x Conversion 2)

#Priority (should we?)Preparedness (can we?)
1Pain: 5 — context bloat, /clear destroys work, issues lost in markdownPrinciples: 3 — clear model, no implementation yet
2Demand: 4 — every plan every agent, but agents survive (poorly) without itPerformance: 1 — 0/8 features built
3Edge: 4 — scoped capability loading per plan is not common in AI toolingPlatform: 3 — plan-cli exists, DB tables exist, manifest schema is new
4Trend: 5 — context window management is #1 constraint in agentic systemsProtocols: 2 — no task boundary protocol documented yet
5Conversion: 2 — internal infrastructure, multiplies all plan quality, no billingPlayers: 3 — data-architect + lead-developer can build this
MetricTargetNow
Skills loaded per plan= declaredAll (~50)
/clear safetyAlwaysNever (no save protocol)
Issues queryableYesNo (markdown file)
Decisions with reasoningYesNo

Blocked by: Agent Platform Phase 1 (plan-cli must exist, PLAT-005 DB tables needed). Kill date: none — multiplier for all other PRDs.

Kill signals:

  • If scoped loading reduces agent confusion by <10%, overhead isn't worth it
  • If agents don't log issues after 2 sprints with the CLI, the habit won't form
  • If save-state adds >5 minutes to each task cycle, simplify the schema

Blocks: All plan execution quality. Every team benefits from this.

Context

  • Agent Platform — Phase 3 (AGNT-007, PLAT-005): template improvement and DB-native templates; this PRD adds the execution runtime on top
  • Flow Engineering — Enforcement hierarchy: this PRD adds capability scoping to the enforcement model
  • DB-Native Plan Templates — PLAT-005: the schema this PRD's manifest reads from
  • Commissioning Dashboard — 8 EXEC features tracked at L0
  • Planning CLI — The CLI this PRD extends with log-issue, log-decision, load-snapshot commands
  • VVFL — Plans station + Work station: this PRD governs the boundary between them

Questions

If scoped loading prevents a skill from loading that an agent actually needed, is that a manifest error or a skill design error?

  • When context reaches 85%, the agent must choose: complete the task or save and yield. What determines the right threshold — task complexity or token cost?
  • Which comes first: the save-state protocol or the context budget gauge? (One without the other is incomplete.)
  • If every task already has a bestPatternPrompt, what's the cost of NOT making the skill reference explicit?
  • When two tasks in the same plan need different skills, does the manifest load both upfront, or load-on-demand between tasks?