Value Stories
Ten stories across three groups. Each story is a test contract — RED before implementation, GREEN when value is delivered. The PUMP powers the factory.
Can I see what's happening?
Open Plans dashboard. Total plans shows 47 but DB has 23 active. Completed plans counted in active totals. Can't trust the numbers.
Plans dashboard math matches plan-cli dashboard output exactly. Active/completed/archived counts are correct.
Dashboard totals match CLI query. Zero discrepancy between UI and DB state.
UI shows different numbers than CLI. Math looks right but counts include wrong statuses.
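The fix for drifting totals is a single source of truth: both the dashboard and plan-cli derive counts from one query over disjoint status buckets. A minimal sketch, assuming plans live in a SQLite table with a `status` column (table name, column names, and status values are assumptions, not the real schema):

```python
import sqlite3

# Count each plan under exactly one status bucket, so "total" can never
# double-count completed plans as active.
STATUSES = ("active", "completed", "archived")

def plan_counts(conn):
    """Return {status: count} from one shared query."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM plans GROUP BY status"
    ).fetchall()
    counts = {s: 0 for s in STATUSES}
    for status, n in rows:
        counts[status] = n
    return counts

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plans (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO plans (status) VALUES (?)",
        [("active",)] * 23 + [("completed",)] * 20 + [("archived",)] * 4,
    )
    counts = plan_counts(conn)
    # 23 active plans; the 47 "total" is the sum of disjoint buckets,
    # not active plus completed re-counted as active.
    print(counts["active"], sum(counts.values()))
```

If the UI and the CLI both call this one function (or the same SQL), the GREEN condition of zero discrepancy follows structurally rather than by coincidence.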
Looking at 23 active plans in a flat list. Can't tell which plans belong to which project. Have to open each plan to see its project link.
Plans grouped by project with completion counts. One glance shows which projects are progressing and which are stalled.
Plans grouped by project. Each group shows X/Y completed. Ungrouped view still available.
Grouping exists but counts are wrong. Or grouping breaks the ungrouped view.
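The grouped view is a projection over the same records, which is what keeps the ungrouped list intact. A sketch of the X/Y math, assuming each plan record carries `project` and `status` fields:

```python
from collections import defaultdict

def group_by_project(plans):
    """plans: iterable of dicts with 'project' and 'status' keys.
    Returns {project: {'completed': X, 'total': Y}} without mutating plans."""
    groups = defaultdict(lambda: {"completed": 0, "total": 0})
    for p in plans:
        g = groups[p["project"]]
        g["total"] += 1
        if p["status"] == "completed":
            g["completed"] += 1
    return dict(groups)

plans = [
    {"project": "pump", "status": "completed"},
    {"project": "pump", "status": "active"},
    {"project": "factory", "status": "active"},
]
# pump shows 1/2 completed, factory shows 0/1; the flat list is untouched.
print(group_by_project(plans))
```

Because grouping never rewrites the underlying rows, the RED case of "grouping breaks the ungrouped view" is ruled out by construction.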
Want to see why a plan is stalled. Click the row — nothing happens. Have to copy plan ID, switch to terminal, run plan-cli status.
Click plan row to see phases and tasks inline. Task status visible. No terminal required for inspection.
Plan detail page shows phases, tasks, and progress. Accessible from plan list via click.
Detail page exists but doesn't show tasks. Or shows tasks but status is stale.
Are agents staying in bounds?
Need a new skill file. Copy an existing one, rename it, update 6 fields, fix imports. 15 minutes of mechanical work. Third time this week.
Run drmg scaffold skill my-skill-name and get a valid skill file with correct structure, imports, and frontmatter.
Scaffold creates file in correct directory with all required fields populated. File passes skill validation.
File created but in wrong directory. Or missing required frontmatter fields.
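A minimal sketch of what a scaffold-plus-validate pair might look like. The directory layout, frontmatter fields, and template below are assumptions for illustration, not the real drmg internals:

```python
from pathlib import Path

# Hypothetical required frontmatter fields for a skill file.
REQUIRED_FIELDS = ("name", "description", "version")

SKILL_TEMPLATE = """---
name: {name}
description: TODO
version: 0.1.0
---

# {name}
"""

def scaffold_skill(name, root):
    """Create the skill file in its expected directory (assumed: skills/)."""
    out = Path(root) / "skills" / f"{name}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(SKILL_TEMPLATE.format(name=name))
    return out

def validate(path):
    """Minimal check: frontmatter present and every required field set."""
    text = Path(path).read_text()
    return text.startswith("---") and all(
        f"{field}:" in text for field in REQUIRED_FIELDS
    )
```

Scaffolding from the same template the validator checks against is what closes the RED gap of "missing required frontmatter fields": the template cannot drift from the validation rules without a test failing.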
Want to scaffold a hook file. Skill scaffold works but hooks have a different structure. No template for hooks.
Content-type registry knows skill, hook, agent, command, rule. Each type has its own template and output directory.
drmg scaffold --list shows all types. drmg scaffold hook pre-edit-gate creates valid hook file.
Registry exists but templates are wrong for some types. Or new types require code changes.
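The GREEN condition "new types require no code changes" suggests a data-driven registry: each type is an entry, not a branch. A sketch, with type names taken from the story and directories/template names assumed:

```python
# Each content type maps to its output directory and template.
# Adding a type is one new entry here, never a code change elsewhere.
REGISTRY = {
    "skill":   {"dir": "skills",   "template": "skill.md.tmpl"},
    "hook":    {"dir": "hooks",    "template": "hook.md.tmpl"},
    "agent":   {"dir": "agents",   "template": "agent.md.tmpl"},
    "command": {"dir": "commands", "template": "command.md.tmpl"},
    "rule":    {"dir": "rules",    "template": "rule.md.tmpl"},
}

def list_types():
    """What 'drmg scaffold --list' might print."""
    return sorted(REGISTRY)

def resolve(content_type):
    """Look up template and output directory for a type, or fail loudly."""
    if content_type not in REGISTRY:
        raise KeyError(
            f"unknown type: {content_type}; known types: {list_types()}"
        )
    return REGISTRY[content_type]
```

So `drmg scaffold hook pre-edit-gate` would resolve "hook" to its template and directory, and an unknown type fails with the list of valid ones instead of a silent wrong-directory write.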
Agent edits a file outside its declared responsibility. Dream-team agent writes to engineering repo. No warning, no log, discovered days later.
Each agent has a scope.json declaring allowed paths. Pre-edit hook checks scope before write. Violation logged.
Scope violation produces warning before edit completes. Violation logged. Human can override.
Scope check runs but so slow it breaks editing flow. Or override doesn't work, blocking legitimate edits.
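A sketch of the pre-edit gate, assuming scope.json holds `{"allowed": [glob, ...]}`. Pure in-memory glob matching keeps the check fast enough not to break editing flow, and a violation warns and logs rather than hard-failing so a human override stays possible:

```python
from fnmatch import fnmatchcase

def in_scope(path, scope):
    """True if path matches any allowed glob in the agent's scope.json."""
    return any(fnmatchcase(path, pattern) for pattern in scope["allowed"])

def pre_edit(path, scope, log, override=False):
    """Gate a write. In-scope writes pass; violations are logged and only
    proceed when a human explicitly overrides."""
    if in_scope(path, scope):
        return True
    log.append({"path": path, "event": "scope_violation"})
    return override

scope = {"allowed": ["dream-team/**", "docs/*.md"]}
log = []
print(pre_edit("dream-team/skills/a.md", scope, log))   # in scope
print(pre_edit("engineering/src/main.ts", scope, log))  # blocked and logged
```

Note that the override path still logs the violation, so "human can override" and "violation logged" hold at the same time, and the days-later discovery problem from the pain story disappears.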
Want to audit which agents are staying in bounds and which are violating scope. No way to see violations across agents or time.
Scope audit shows violations per agent, per time period. Recurring violations highlighted.
drmg audit --scope shows clean agents, violating agents, and top violation paths.
Audit runs but only shows current session. No historical view.
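The historical view falls out of persisting violation records rather than keeping them in session memory. A sketch of the `drmg audit --scope` aggregation, with the record shape assumed:

```python
from collections import Counter

def audit(violations, agents):
    """violations: [{'agent': ..., 'path': ..., 'day': ...}] from the
    persisted log; agents: all known agents, so clean ones show up too."""
    by_agent = Counter(v["agent"] for v in violations)
    clean = sorted(set(agents) - set(by_agent))
    top_paths = Counter(v["path"] for v in violations).most_common(3)
    return {"clean": clean, "violating": dict(by_agent), "top_paths": top_paths}

violations = [
    {"agent": "dream-team", "path": "engineering/src", "day": "2024-06-01"},
    {"agent": "dream-team", "path": "engineering/src", "day": "2024-06-08"},
]
# Same path across two weeks: a recurring violation, not a one-off.
print(audit(violations, ["dream-team", "docs-bot"]))
```

Filtering `violations` by the `day` field before calling `audit` gives the per-time-period view; recurring violations surface naturally as high counts in `top_paths`.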
Does the loop get smarter?
Same agent makes the same mistake three sessions in a row. Each session fixes it from scratch. Nobody creates a rule to prevent the recurrence.
Pattern extractor reads audit findings across 3+ runs, groups recurring issues, proposes a hook or rule.
Recurring pattern detected and surfaced as proposed prevention with evidence.
Patterns detected but proposals are too generic. Or too many proposals creating noise.
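The 3+ run threshold is doing double duty: it is both the recurrence signal and the noise cap that the RED case worries about. A sketch of the grouping step, with the finding shape assumed:

```python
from collections import defaultdict

MIN_RUNS = 3  # a finding becomes a proposal only after 3+ distinct runs

def extract_patterns(findings):
    """findings: [{'run': run_id, 'issue': issue_key}].
    Returns proposals with their evidence, one per recurring issue."""
    runs_per_issue = defaultdict(set)
    for f in findings:
        runs_per_issue[f["issue"]].add(f["run"])
    return [
        {"issue": issue, "evidence_runs": sorted(runs)}
        for issue, runs in sorted(runs_per_issue.items())
        if len(runs) >= MIN_RUNS
    ]
```

Counting distinct runs rather than raw occurrences matters: ten hits of one issue inside a single session is a bad session, not a pattern, and should not generate a proposal.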
Agent solved a complex problem last week. This week, different agent hits the same problem. No way to recall what worked.
Agent writes structured memory to DB after significant problem resolution. Future agents query memory before starting similar work.
Memory written with context, solution, and tags. Recall query returns relevant memories ranked by similarity.
Memory written but too verbose to be useful. Or recall returns irrelevant results.
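A sketch of recall ranking using tag overlap (Jaccard similarity); a real system might use embeddings, and the memory record shape here is an assumption. Dropping zero-overlap memories is the guard against the RED case of irrelevant results:

```python
def jaccard(a, b):
    """Similarity of two tag sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def recall(memories, query_tags, k=3):
    """Return up to k memories ranked by tag similarity to the query,
    excluding anything with zero overlap."""
    scored = [(jaccard(m["tags"], query_tags), m) for m in memories]
    scored = [(s, m) for s, m in scored if s > 0]
    scored.sort(key=lambda sm: -sm[0])
    return [m for _, m in scored[:k]]

memories = [
    {"context": "flaky CI", "solution": "pin node 20", "tags": ["ci", "node"]},
    {"context": "db lock", "solution": "retry with backoff", "tags": ["db"]},
]
print(recall(memories, ["ci", "docker"]))  # only the CI memory comes back
```

The verbosity half of the RED case is a write-side rule, not a recall-side one: capping `context` and `solution` length at write time keeps whatever recall returns actually readable.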
Virtue auditor runs weekly. Shows 8 dimension scores. Scores improve but improvements don't compound — each session starts from scratch.
Pattern extractor feeds improvement proposals into templates and rules. Virtue scores stay improved structurally.
Virtue dimension score for a specific gap stays improved for 3+ consecutive audits.
Score improves once then regresses. Prevention applied but doesn't address root cause.
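The GREEN condition above is itself checkable. A sketch of the persistence test, assuming audit scores for one virtue dimension are stored as a chronological list:

```python
def stays_improved(scores, baseline, window=3):
    """True only if the last `window` audits all score above baseline,
    which is the '3+ consecutive audits' bar for structural improvement."""
    recent = scores[-window:]
    return len(recent) == window and all(s > baseline for s in recent)

print(stays_improved([5, 7, 7, 8], baseline=5))  # held for 3 audits: True
print(stays_improved([5, 8, 5, 8], baseline=5))  # regressed mid-run: False
```

A one-audit spike that regresses (the RED case) fails this check, so the pattern extractor's template and rule changes are only credited once the score has demonstrably held.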
Kill Signal
Boundary hooks block >30% of legitimate agent actions after 30 days. Agent task completion rate drops below current baseline.
Who
- Platform operator — wants dashboard that matches reality and agents that stay in bounds
- Agent developer — wants scaffolds that produce valid files and patterns that prevent recurring mistakes
- Agent — wants to recall what worked before and know its own boundaries before hitting them
Questions
- Where does automation end and human judgment begin for agent boundaries?
- If agents can extract their own patterns, will they converge on the same rules humans would write?
- Should memory be per-agent or shared across all agents in a session?
- At what point does scaffold templating become over-engineering — when does an agent just write the file directly?
- Can boundary violations be the training signal for better scope declarations?