Agentic Execution
What if every plan task started with exactly the right capabilities and nothing else?
Problem
A problem well-stated is a problem 80% solved
Situation: A plan has 7 tasks. Each task activates every skill in the registry. 50+ skills load into context, most irrelevant. By task 3, context is 60% full of capability descriptions that won't fire. Agent performance degrades. By task 5, the agent starts dropping earlier decisions. A failing test from task 2 was never saved — it's gone after /clear. The issue log is a markdown file that can't be queried or trended. Decisions have no reasoning attached. The next plan starts from zero.
Intention: Plans declare exactly what they need. Each task is self-contained — full enough to survive /clear, scoped enough to stay performant. Issues and decisions persist to a queryable store. Context has a budget and a save protocol. The right agent activates with the right system prompt and the right tools.
Obstacle: Three compounding failures. (1) No plan manifest — there's nowhere to declare requiredSkills or requiredMcpTools, so everything loads. (2) No save-state protocol — /clear is dangerous because nothing codifies what must be persisted first. (3) No structured log — issues live in issues-log.md, decisions live in commit messages, both are unqueryable.
Priorities
- What capabilities does this plan actually need — and what should never load?
- What must be saved before
/clearso the next task picks up where this one stopped? - When context reaches 70%, what's the save protocol?
- How do issues and decisions get from agent brain to queryable store?
- Which tasks should explicitly invoke a skill or MCP tool — and how does the prompt make that happen?
Features
Phase 1: Manifest + Persistence (2 sessions)
The foundation: plan.json declares what it needs. log-issue and log-decision write to DB.
- EXEC-001: Plan capability manifest (
requiredSkills[],requiredMcpTools[],assignedAgents[]in plan.json) - EXEC-004: Issue log CLI (
plan-cli log-issue --severity --task --description→ structured DB record) - EXEC-005: Decision log CLI (
plan-cli log-decision --title --reasoning --alternatives→ DB record) - EXEC-003: Task save-state schema (standard WM snapshot: failing tests, tech debt, open decisions, next steps)
Phase 2: Context Control (2 sessions)
Budget monitoring and capability loading — only what the manifest declares.
- EXEC-002: Scoped capability loader (plan activation reads manifest, loads only declared skills/tools)
- EXEC-006: Context budget gauge (monitor token %, warn at 70%, trigger save-state at 85%)
- EXEC-007: Task boundary protocol (save → clear → load → execute → save, codified as standard plan steps 0+99)
Phase 3: Directive Wiring (1 session)
Make bestPatternPrompt explicit — skill invocations, MCP tool calls, NX generators by name.
- EXEC-008: Skill invocation directives (
bestPatternPromptreferences/skill-nameormcp-tool-nameexplicitly, not by description)
Feature IDs
| ID | Feature | State | Phase |
|---|---|---|---|
| EXEC-001 | Plan capability manifest | L0 | 1 |
| EXEC-002 | Scoped capability loader | L0 | 2 |
| EXEC-003 | Task save-state schema | L0 | 1 |
| EXEC-004 | Issue log CLI | L0 | 1 |
| EXEC-005 | Decision log CLI | L0 | 1 |
| EXEC-006 | Context budget gauge | L0 | 2 |
| EXEC-007 | Task boundary protocol | L0 | 2 |
| EXEC-008 | Skill invocation directives | L0 | 3 |
0/8 features at L2+. 8 at L0.
Data Model
Project
└── Plan (plan.json)
├── manifest: { requiredSkills[], requiredMcpTools[], assignedAgents[] }
├── phases[]
│ └── tasks[]
│ ├── assignedAgent (system prompt reference)
│ ├── bestPatternPrompt (includes /skill-name or mcp-tool-name)
│ ├── tools[]
│ └── skills[]
└── persistence
├── issues → DB (structured: severity, owner, task, description)
├── decisions → DB (structured: title, reasoning, alternatives)
└── snapshots → WM markdown (failing tests, tech debt, open loops)
The hierarchy is enforced by schema. requiredSkills[] in plan.json = the only skills loaded. bestPatternPrompt explicitly names the skill or tool to invoke. Save-state happens at every task boundary — /clear is always safe.
Save-State Protocol
What must persist before any /clear:
| Category | Where | Format |
|---|---|---|
| Failing tests | WM snapshot markdown | File path + test name + error |
| Tech debt items | WM snapshot markdown | File, what, why deferred |
| Open issues | DB via log-issue CLI | Structured: severity, task, description |
| Key decisions | DB via log-decision CLI | Title, reasoning, alternatives |
| Next task intent | WM snapshot markdown | One sentence: what the next task should do first |
Issues
None yet — new PRD.
Progress
- Engineering Spec — Build contract and story tests
Scorecard
Priority Score: 800 (Pain 5 x Demand 4 x Edge 4 x Trend 5 x Conversion 2)
| # | Priority (should we?) | Preparedness (can we?) |
|---|---|---|
| 1 | Pain: 5 — context bloat, /clear destroys work, issues lost in markdown | Principles: 3 — clear model, no implementation yet |
| 2 | Demand: 4 — every plan every agent, but agents survive (poorly) without it | Performance: 1 — 0/8 features built |
| 3 | Edge: 4 — scoped capability loading per plan is not common in AI tooling | Platform: 3 — plan-cli exists, DB tables exist, manifest schema is new |
| 4 | Trend: 5 — context window management is #1 constraint in agentic systems | Protocols: 2 — no task boundary protocol documented yet |
| 5 | Conversion: 2 — internal infrastructure, multiplies all plan quality, no billing | Players: 3 — data-architect + lead-developer can build this |
| Metric | Target | Now |
|---|---|---|
| Skills loaded per plan | = declared | All (~50) |
| /clear safety | Always | Never (no save protocol) |
| Issues queryable | Yes | No (markdown file) |
| Decisions with reasoning | Yes | No |
Blocked by: Agent Platform Phase 1 (plan-cli must exist, PLAT-005 DB tables needed). Kill date: none — multiplier for all other PRDs.
Kill signals:
- If scoped loading reduces agent confusion by <10%, overhead isn't worth it
- If agents don't log issues after 2 sprints with the CLI, the habit won't form
- If save-state adds >5 minutes to each task cycle, simplify the schema
Blocks: All plan execution quality. Every team benefits from this.
Context
- Agent Platform — Phase 3 (AGNT-007, PLAT-005): template improvement and DB-native templates; this PRD adds the execution runtime on top
- Flow Engineering — Enforcement hierarchy: this PRD adds capability scoping to the enforcement model
- DB-Native Plan Templates — PLAT-005: the schema this PRD's manifest reads from
- Commissioning Dashboard — 8 EXEC features tracked at L0
- Planning CLI — The CLI this PRD extends with log-issue, log-decision, load-snapshot commands
- VVFL — Plans station + Work station: this PRD governs the boundary between them
Questions
If scoped loading prevents a skill from loading that an agent actually needed, is that a manifest error or a skill design error?
- When context reaches 85%, the agent must choose: complete the task or save and yield. What determines the right threshold — task complexity or token cost?
- Which comes first: the save-state protocol or the context budget gauge? (One without the other is incomplete.)
- If every task already has a
bestPatternPrompt, what's the cost of NOT making the skill reference explicit? - When two tasks in the same plan need different skills, does the manifest load both upfront, or load-on-demand between tasks?