Agentic Execution

What if every plan task started with exactly the right capabilities and nothing else?

Problem

A problem well-stated is a problem 80% solved

Situation: A plan has 7 tasks. Each task activates every skill in the registry. 50+ skills load into context, most irrelevant. By task 3, context is 60% full of capability descriptions that won't fire. Agent performance degrades. By task 5, the agent starts dropping earlier decisions. A failing test from task 2 was never saved — it's gone after /clear. The issue log is a markdown file that can't be queried or trended. Decisions have no reasoning attached. The next plan starts from zero.

Intention: Plans declare exactly what they need. Each task is self-contained — full enough to survive /clear, scoped enough to stay performant. Issues and decisions persist to a queryable store. Context has a budget and a save protocol. The right agent activates with the right system prompt and the right tools.

Obstacle: Three compounding failures. (1) No plan manifest — there's nowhere to declare requiredSkills or requiredMcpTools, so everything loads. (2) No save-state protocol — /clear is dangerous because nothing codifies what must be persisted first. (3) No structured log — issues live in issues-log.md, decisions live in commit messages, both are unqueryable.

Priorities

What capabilities does this plan actually need — and what should never load?
What must be saved before /clear so the next task picks up where this one stopped?
When context reaches 70%, what's the save protocol?
How do issues and decisions get from agent brain to queryable store?
Which tasks should explicitly invoke a skill or MCP tool — and how does the prompt make that happen?

Features

Phase 1: Manifest + Persistence (2 sessions)

The foundation: plan.json declares what it needs. log-issue and log-decision write to DB.

EXEC-001: Plan capability manifest (requiredSkills[], requiredMcpTools[], assignedAgents[] in plan.json)
EXEC-004: Issue log CLI (plan-cli log-issue --severity --task --description → structured DB record)
EXEC-005: Decision log CLI (plan-cli log-decision --title --reasoning --alternatives → DB record)
EXEC-003: Task save-state schema (standard WM snapshot: failing tests, tech debt, open decisions, next steps)

Phase 2: Context Control (2 sessions)

Budget monitoring and capability loading — only what the manifest declares.

EXEC-002: Scoped capability loader (plan activation reads manifest, loads only declared skills/tools)
EXEC-006: Context budget gauge (monitor token %, warn at 70%, trigger save-state at 85%)
EXEC-007: Task boundary protocol (save → clear → load → execute → save, codified as standard plan steps 0+99)

Phase 3: Directive Wiring (1 session)

Make bestPatternPrompt explicit — skill invocations, MCP tool calls, NX generators by name.

EXEC-008: Skill invocation directives (bestPatternPrompt references /skill-name or mcp-tool-name explicitly, not by description)

Feature IDs

ID	Feature	State	Phase
EXEC-001	Plan capability manifest	L0	1
EXEC-002	Scoped capability loader	L0	2
EXEC-003	Task save-state schema	L0	1
EXEC-004	Issue log CLI	L0	1
EXEC-005	Decision log CLI	L0	1
EXEC-006	Context budget gauge	L0	2
EXEC-007	Task boundary protocol	L0	2
EXEC-008	Skill invocation directives	L0	3

0/8 features at L2+. 8 at L0.

Data Model

Project
  └── Plan (plan.json)
        ├── manifest: { requiredSkills[], requiredMcpTools[], assignedAgents[] }
        ├── phases[]
        │     └── tasks[]
        │           ├── assignedAgent (system prompt reference)
        │           ├── bestPatternPrompt (includes /skill-name or mcp-tool-name)
        │           ├── tools[]
        │           └── skills[]
        └── persistence
              ├── issues → DB (structured: severity, owner, task, description)
              ├── decisions → DB (structured: title, reasoning, alternatives)
              └── snapshots → WM markdown (failing tests, tech debt, open loops)

The hierarchy is enforced by schema. requiredSkills[] in plan.json = the only skills loaded. bestPatternPrompt explicitly names the skill or tool to invoke. Save-state happens at every task boundary — /clear is always safe.

Save-State Protocol

What must persist before any /clear:

Category	Where	Format
Failing tests	WM snapshot markdown	File path + test name + error
Tech debt items	WM snapshot markdown	File, what, why deferred
Open issues	DB via log-issue CLI	Structured: severity, task, description
Key decisions	DB via log-decision CLI	Title, reasoning, alternatives
Next task intent	WM snapshot markdown	One sentence: what the next task should do first

Issues

None yet — new PRD.

Progress

Engineering Spec — Build contract and story tests

Scorecard

Priority Score: 800 (Pain 5 x Demand 4 x Edge 4 x Trend 5 x Conversion 2)

#	Priority (should we?)	Preparedness (can we?)
1	Pain: 5 — context bloat, /clear destroys work, issues lost in markdown	Principles: 3 — clear model, no implementation yet
2	Demand: 4 — every plan every agent, but agents survive (poorly) without it	Performance: 1 — 0/8 features built
3	Edge: 4 — scoped capability loading per plan is not common in AI tooling	Platform: 3 — plan-cli exists, DB tables exist, manifest schema is new
4	Trend: 5 — context window management is #1 constraint in agentic systems	Protocols: 2 — no task boundary protocol documented yet
5	Conversion: 2 — internal infrastructure, multiplies all plan quality, no billing	Players: 3 — data-architect + lead-developer can build this

Metric	Target	Now
Skills loaded per plan	= declared	All (~50)
/clear safety	Always	Never (no save protocol)
Issues queryable	Yes	No (markdown file)
Decisions with reasoning	Yes	No

Blocked by: Agent Platform Phase 1 (plan-cli must exist, PLAT-005 DB tables needed). Kill date: none — multiplier for all other PRDs.

Kill signals:

If scoped loading reduces agent confusion by <10%, overhead isn't worth it
If agents don't log issues after 2 sprints with the CLI, the habit won't form
If save-state adds >5 minutes to each task cycle, simplify the schema

Blocks: All plan execution quality. Every team benefits from this.

Context

Agent Platform — Phase 3 (AGNT-007, PLAT-005): template improvement and DB-native templates; this PRD adds the execution runtime on top
Flow Engineering — Enforcement hierarchy: this PRD adds capability scoping to the enforcement model
DB-Native Plan Templates — PLAT-005: the schema this PRD's manifest reads from
Commissioning Dashboard — 8 EXEC features tracked at L0
Planning CLI — The CLI this PRD extends with log-issue, log-decision, load-snapshot commands
VVFL — Plans station + Work station: this PRD governs the boundary between them

Questions

If scoped loading prevents a skill from loading that an agent actually needed, is that a manifest error or a skill design error?

When context reaches 85%, the agent must choose: complete the task or save and yield. What determines the right threshold — task complexity or token cost?
Which comes first: the save-state protocol or the context budget gauge? (One without the other is incomplete.)
If every task already has a bestPatternPrompt, what's the cost of NOT making the skill reference explicit?
When two tasks in the same plan need different skills, does the manifest load both upfront, or load-on-demand between tasks?

Problem​

Priorities​

Features​

Phase 1: Manifest + Persistence (2 sessions)​

Phase 2: Context Control (2 sessions)​

Phase 3: Directive Wiring (1 session)​

Feature IDs​

Data Model​

Save-State Protocol​

Issues​

Progress​

Scorecard​

Context​

Questions​