Agent Platform
What makes an agent more than a prompt with memory?
Problem
A problem well-stated is a problem 80% solved
Situation: Four PRDs described one system from four angles — session boot, project management, settlement verification, and the platform itself. Engineering asked "where do I start?" Three CLIs, 23 templates, 16 agents — no coordination. Agents spend 5-10 minutes re-exploring what happened last session. Issues live in a markdown file that can't scale. Zero capabilities at L4 because no automated settlement verification exists.
Intention: One platform where every agent inherits identity, memory, comms, dispatch, issue routing, quality enforcement, and settlement verification — so engineering builds domain knowledge, not infrastructure, and the system audits itself.
Obstacle: 4.3% flow efficiency. 130 minutes of work in 1-4 days of wait time. Dispatch is manual. Context recovery starts from zero. 8 min per issue in markdown. 188 capabilities tracked, 0 at L4. The tool must build itself.
Priorities
- What makes an agent more than a prompt with memory?
- Where does time die in the agent session lifecycle?
- How does an agent know whether to fix, file, or build?
- Does what arrived match what was promised?
- Who coordinates — and what verifies the work?
Features
Phase 0: Session Boot — DONE
Consolidated from Nav Continuity Layer. All 3 features at L2.
- NAV-001: Briefing compiler (nav-briefing.md generated at session start)
- NAV-002: Memory ledger (MEMORY.md + WM-NAV.md + daily context)
- NAV-003: Boot hook (team-activation.sh loads context via nav-session-boot.mjs)
Zero manual re-briefing over last 10+ sessions. Kill signal passed: compiler runs in <10s, injects <2000 tokens.
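The kill signal above is checkable in code. A minimal sketch, assuming a ~4-characters-per-token heuristic (the function names here are illustrative, not the actual nav-session-boot.mjs API):

```javascript
// Phase 0 kill-signal check: a briefing passes only if it compiled within
// the 10s budget and injects fewer than 2000 tokens.
const BUDGET_MS = 10_000;
const BUDGET_TOKENS = 2000;

// Crude token estimate: ~4 characters per token is a common heuristic;
// a real check would use the model's tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function briefingWithinBudget(briefingText, compileMs) {
  return compileMs < BUDGET_MS && estimateTokens(briefingText) < BUDGET_TOKENS;
}
```

Running this check inside the boot hook turns the kill signal from a judgment call into a pass/fail gate.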
Phase 1: Unified CLI + VVFL MVP (3-4 sessions)
Engineering Spec. 8 dimensions: generator, template, rule, skill, agent, platform, virtue, pattern.
- Extract shared DB context from plan-cli pattern
- Build thin router (`drmg` dispatches to handlers)
- Seed context graph from filesystem
- Build 8-dimension auditors (one per enforcement dimension)
- Wire audit command with `--dry-run` flag
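The thin-router pattern above can be sketched in a few lines. A hypothetical shape, assuming `drmg <module> <command>` maps onto a handler table keyed by the eight dimensions (handler bodies are placeholders, not the shipped auditors):

```javascript
// Thin router: the CLI owns dispatch, the handlers own behavior.
const handlers = {
  generator: { audit: (flags) => ({ dimension: "generator", dryRun: !!flags["dry-run"] }) },
  template:  { audit: (flags) => ({ dimension: "template",  dryRun: !!flags["dry-run"] }) },
  // ...rule, skill, agent, platform, virtue, pattern follow the same shape
};

function dispatch(module, command, flags = {}) {
  const handler = handlers[module]?.[command];
  if (!handler) throw new Error(`drmg: unknown command ${module} ${command}`);
  return handler(flags);
}
```

Keeping the router this thin is what makes Phase 6 cheap later: the same handler table can back REST routes without touching the auditors.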
Phase 2: Issue Tracking + Plans UI (2-3 sessions)
Consolidated from Agent Project Management. Engineering Spec.
- PLAT-001: Issue tracking CLI (#1-7) — replace issues-log.md with queryable store
- PROJ-001: Plans dashboard UI (#8-12) — wire plan list/detail/analytics to DB
- Decision routing logic (fix / PRD / story)
- Priority dispatch (priority table change → Convex message)
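The decision routing item answers the "fix, file, or build" question from Priorities. A sketch of one plausible rule, assuming issues carry a size estimate and a scope flag (the thresholds and field names are assumptions, not the shipped logic):

```javascript
// Decision routing: where an incoming issue goes depends on whether it
// changes scope and how big it is.
function routeIssue({ estimateHours, changesScope }) {
  if (changesScope) return "PRD";        // scope changes need a spec, not a patch
  if (estimateHours <= 2) return "fix";  // small enough to fix inline
  return "story";                        // sized work goes onto a plan
}
```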
Phase 3: Quality Loop (2 sessions)
The Deming shift: from inspection (hooks catch violations) to prevention (generators make violations impossible). Measured by hook trigger rate trending toward zero.
- AGNT-003: Receipt writer (post-session: store receipt via agent-etl-cli)
- AGNT-004: Hook failure tracker (aggregate triggers by content type, pattern, agent)
- AGNT-005: Scaffold generators (content types born with correct frontmatter, heading skeleton, questions)
- AGNT-006: Skill maturity scorer (invocations x artifacts x gate pass rate)
- AGNT-007: Template improvement trigger (receipt pattern → template update → legacy principle)
- PLAT-005: DB-native plan template registry (schema enforces best_pattern_prompt NOT NULL — enables AGNT-007 to write back to DB)
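The AGNT-006 formula is stated inline above: invocations x artifacts x gate pass rate. A direct sketch, where only the normalization of the pass rate is an assumption:

```javascript
// Skill maturity score (AGNT-006): invocations x artifacts x gate pass rate.
// A skill that is invoked often, produces artifacts, and passes its gates
// scores high; a skill with no gate runs scores zero rather than dividing by zero.
function skillMaturity({ invocations, artifacts, gatePasses, gateRuns }) {
  const passRate = gateRuns === 0 ? 0 : gatePasses / gateRuns;
  return invocations * artifacts * passRate;
}
```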
Phase 4: Settlement Bridge (2-3 sessions)
Consolidated from Hemisphere Bridge. Engineering Spec. Pictures.
Two repositories. One dreams (this repo — specs, PRDs, scoreboard). One builds (stackmates — code, schemas, hooks). Every handoff is a settlement boundary where trust can break.
- PLAT-002: Settlement verification protocol (automated spec-vs-reality delta)
- PLAT-003: Trust ledger (settlement history in Supabase)
- PLAT-004: Autonomous loop control (`/loop` heartbeat)
- AGNT-008: Agent boundary enforcement hooks
- NAV-004: Scoreboard live readings (L2 — content graph in session boot)
Three bearings: Internal (hooks, mostly virtuous), Hemisphere (manual, vicious — this phase fixes it), External (no production users yet).
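The core of PLAT-002 is a diff across the settlement boundary: did what arrived match what was promised? A hypothetical shape, assuming specs list deliverables and receipts list artifacts (field names are illustrative, not the shipped protocol):

```javascript
// Settlement verification sketch: compare the spec's promises against the
// receipt's artifacts and report what never arrived.
function settlementDelta(spec, receipt) {
  const promised = new Set(spec.deliverables);
  const delivered = new Set(receipt.artifacts);
  const missing = [...promised].filter((d) => !delivered.has(d));
  return { settled: missing.length === 0, missing };
}
```

Each `{ settled, missing }` record is exactly what the PLAT-003 trust ledger would append per handoff.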
Phase 5: Learning Engine (2 sessions)
- Pattern extractor (cross-run trend detection)
- Memory writer (patterns → semantic, runs → episodic)
- Agent recall of VVFL patterns (shared semantic memory query)
- Action generator (critical findings → plan issues)
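The action generator closes the loop from analysis back to work. A minimal sketch, assuming findings carry a severity label (the schema is an assumption):

```javascript
// Action generator: critical findings become plan issues; everything else
// stays in the pattern store for the memory writer.
function generateActions(findings) {
  return findings
    .filter((f) => f.severity === "critical")
    .map((f) => ({ type: "plan-issue", title: f.summary, source: f.id }));
}
```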
Phase 6: API + A2A (3-4 sessions)
- API route per drmg module (REST endpoints wrapping CLI handlers)
- CLI as thin client (`drmg` calls API routes instead of direct DB)
- Auth + rate limiting (API keys, per-agent rate limits)
- AI-008: Agent Card definition (advertise capabilities per agent type)
- Task Card wrapper (map message types to A2A Task lifecycle)
- A2A discovery endpoint (`/.well-known/agent.json`)
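The discovery endpoint is small enough to sketch whole. The card fields below are illustrative placeholders (the A2A spec defines the actual Agent Card schema), and the handler is written as a pure function so it could back either a static file or an API route:

```javascript
// A2A discovery sketch: serve an Agent Card as JSON at /.well-known/agent.json.
const agentCard = {
  name: "drmg-platform",
  description: "Dispatch, audit, and settlement verification",
  capabilities: ["audit", "dispatch", "settlement"],
};

function agentCardResponse(url) {
  if (url === "/.well-known/agent.json") {
    return {
      status: 200,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(agentCard),
    };
  }
  return { status: 404, headers: {}, body: "" };
}
```

Issue #10 below is this handler missing: the route currently falls through to the HTML 404 page instead of returning the card.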
Feature IDs
| ID | Feature | State | Phase |
|---|---|---|---|
| NAV-001 | Briefing compiler | L2 | 0 |
| NAV-002 | Memory ledger | L2 | 0 |
| NAV-003 | Boot hook | L2 | 0 |
| NAV-004 | Scoreboard live readings | L2 | 4 |
| AGNT-001 | Standards registry | L2 | 1 |
| AGNT-002 | SBO Scoreboard | L2 | 1 |
| AGNT-003 | Receipt schema | L0 | 3 |
| AGNT-004 | Hook failure tracker | L0 | 3 |
| AGNT-005 | Scaffold generators | L0 | 3 |
| AGNT-006 | Skill maturity scorer | L0 | 3 |
| AGNT-007 | Template improvement loop | L0 | 3 |
| AGNT-008 | Agent boundary enforcement | L0 | 4 |
| AI-008 | Multi-agent orchestration | L2 | 6 |
| PLAT-001 | Issue tracking CLI | L0 | 2 |
| PLAT-002 | Settlement verification | L0 | 4 |
| PLAT-003 | Trust ledger | L0 | 4 |
| PLAT-004 | Autonomous loop control | L0 | 4 |
| PLAT-005 | DB-native template registry | L0 | 3 |
| PROJ-001 | Plans dashboard UI | L0 | 2 |
7/19 features at L2+. 12 at L0.
Issues
| # | Severity | What Happens | Fix |
|---|---|---|---|
| 26 | HIGH | /plans shows 0 plans, no create button. DB has plans via CLI but UI reads nothing. | Wire plan list query to DB. Add create plan action. |
| 27 | HIGH | /plans/analytics shows hardcoded zeros. No charts, no trends. | Query plan data for metrics. Add trend chart. |
| 28 | MEDIUM | /calendar Team Calendar shows "No Plans Scheduled" despite plans in DB. | Wire calendar to plan date ranges. |
| 29 | MEDIUM | /plans table has columns but no data, no row click navigation. | Populate from DB, add row click to detail page. |
| 30 | LOW | /plans stat cards render but show static zeros. | Replace with live DB aggregates. |
| 17 | MEDIUM | /agents/capabilities returns 404. Dashboard card links to missing page. | Create the route or remove the link. |
| 10 | LOW | /.well-known/agent.json returns HTML instead of Agent Card JSON. | Add static agent.json to public directory or API route. |
Resolved
| # | Resolved | Evidence |
|---|---|---|
| 12 | 2026-03-07 | /agents loads with Registry, Workflows, Standards, Register Agent button. |
| 7 | 2026-03-07 | team-activation.sh uses git branch --show-current. No stale branch. |
| 8 | 2026-03-07 | Hook degrades gracefully. timeout 5 + fallback in place. |
| 9 | 2026-03-08 | Spec updated to match WM-NAV.md + working-memory.json. |
Progress
Scorecard
Priority Score: 1500 (Pain 5 x Demand 5 x Edge 4 x Trend 5 x Conversion 3)
| # | Priority (should we?) | Preparedness (can we?) |
|---|---|---|
| 1 | Pain: 5 — 5min re-briefing + 8min/issue + 0 at L4 | Principles: 4 — five concerns defined, consolidation done |
| 2 | Demand: 5 — every agent needs identity, memory, comms, dispatch, verification | Performance: 2 — 61% capabilities at zero |
| 3 | Edge: 4 — self-orchestrating platform with settlement verification | Platform: 4 — 3 CLIs built, Convex deployed, session boot functional |
| 4 | Trend: 5 — A2A protocol, autonomous agents, multi-repo orchestration | Protocols: 3 — 7 phases defined, Phase 0 complete |
| 5 | Conversion: 3 — internal infra that unblocks all ventures, indirect but mandatory | Players: 3 — 6 agents specified, 8 instruments named |
| Metric | Target | Now |
|---|---|---|
| Session recovery time | <30s | <10s (Phase 0 done) |
| Flow efficiency | >15% | 4.3% |
| Capabilities at L2+ | >60% | 37% (7/19) |
| Capabilities at L4 | >0 | 0 |
| Scope | Phases | What You Get |
|---|---|---|
| Phase 0 | 0 | Session boot — DONE |
| MVP | 0-1 | + drmg CLI + 8 VVFL auditors |
| V1 | 0-2 | + Issue CLI + Plans UI + priority dispatch |
| Quality | 0-3 | + receipts, hook tracker, scaffolds, maturity scorer |
| Settlement | 0-4 | + settlement verification, trust ledger, /loop |
| Learning | 0-5 | + pattern extraction, memory writing, action generation |
| API + A2A | 0-6 | + HTTP transport, Agent Cards, external mesh |
Kill date: none — mandatory infrastructure.
Kill signals:
- If agents with memory aren't measurably faster than agents without it, the memory is noise
- If compiler takes >10s or injects >2000 tokens, pare down scope
- If engineering still edits `issues-log.md` manually 2 sprints after the CLI ships
- If settlement verification adds >30s to each commit cycle, simplify scope
Blocks: All agent instances (Sales Dev, Content Amplifier, future ventures).
Context
- Nav Continuity Spec — Phase 0 engineering spec
- Agent Project Mgmt Spec — Phase 2 engineering spec (absorbed)
- Hemisphere Bridge Spec — Phase 4 engineering spec
- Phase 1 Engineering Spec — 8-dimension auditors and audit output schema
- Hemisphere Bridge Pictures — A&ID diagrams + gap matrices
- Scoreboard — Settlement integrity as primary structure
- Game Loops — Four nested loops: rendering, gameplay, core, meta
- Commissioning Dashboard — 188 capabilities tracked
- Issues Log — Current file-based implementation (replaced by Phase 2)
Questions
What is the minimum platform surface that makes every subsequent PRD faster to build?
- If Phase 0 eliminated re-briefing, which remaining friction is the next 80/20 win?
- At what point does settlement verification become more valuable than building new features?
- When an issue sits unrouted for a sprint, is that a tooling problem or a process problem?
- How do you measure whether the quality loop (Phase 3) is preventing defects or just catching them?