
Value Stories

Ten stories across three groups. Each story is a test contract — RED before implementation, GREEN when value is delivered. The PUMP powers the factory.

Can I see what's happening?

S1 (Action)
When

Open Plans dashboard. Total plans shows 47 but DB has 23 active. Completed plans counted in active totals. Can't trust the numbers.

I need to

Plans dashboard math matches plan-cli dashboard output exactly. Active/completed/archived counts are correct.

So I get

Dashboard totals match CLI query. Zero discrepancy between UI and DB state.

Not

UI shows different numbers than CLI. Math looks right but counts include wrong statuses.
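
A minimal sketch of the GREEN check, assuming a SQLite plans table with a status column (table and column names are placeholders, not the real plan-cli schema). It pins the story's bug: 47 is the grand total, 23 is the active count, and the dashboard must show the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO plans (status) VALUES (?)",
                 [("active",)] * 23 + [("completed",)] * 20 + [("archived",)] * 4)

def status_counts(conn):
    """Per-status counts straight from the DB: the numbers the UI must mirror."""
    rows = conn.execute("SELECT status, COUNT(*) FROM plans GROUP BY status")
    return dict(rows.fetchall())

counts = status_counts(conn)
assert counts["active"] == 23        # the CLI's answer; the dashboard must match
assert sum(counts.values()) == 47    # 47 is the grand total, never "active"
```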

S2 (Action)
When

Looking at 23 active plans in a flat list. Can't tell which plans belong to which project. Have to open each plan to see its project link.

I need to

Plans grouped by project with completion counts. One glance shows which projects are progressing and which are stalled.

So I get

Plans grouped by project. Each group shows X/Y completed. Ungrouped view still available.

Not

Grouping exists but counts are wrong. Or grouping breaks the ungrouped view.
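
A hedged sketch of the grouping math, assuming plans carry a nullable project_id (an assumed column, not confirmed schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plans (id INTEGER PRIMARY KEY, project_id TEXT, status TEXT);
    INSERT INTO plans (project_id, status) VALUES
        ('auth', 'completed'), ('auth', 'active'),
        ('billing', 'active'), (NULL, 'active');
""")

-- comment in Python below: X/Y per group; COALESCE keeps project-less plans
-- visible in an '(ungrouped)' bucket instead of silently dropping them.
rows = conn.execute("""
    SELECT COALESCE(project_id, '(ungrouped)') AS grp,
           SUM(status = 'completed')           AS done,
           COUNT(*)                            AS total
    FROM plans GROUP BY grp ORDER BY grp
""").fetchall()

for grp, done, total in rows:
    print(f"{grp}: {done}/{total} completed")
# (ungrouped): 0/1, auth: 1/2, billing: 0/1
```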

S3 (Action)
When

Want to see why a plan is stalled. Click the row — nothing happens. Have to copy plan ID, switch to terminal, run plan-cli status.

I need to

Click plan row to see phases and tasks inline. Task status visible. No terminal required for inspection.

So I get

Plan detail page shows phases, tasks, and progress. Accessible from plan list via click.

Not

Detail page exists but doesn't show tasks. Or shows tasks but status is stale.
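
A sketch of the payload the detail page needs, with assumed phases and tasks tables keyed by plan_id and phase_id. Because progress is computed from the same rows the CLI reads, status can't go stale independently of the DB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phases (id INTEGER PRIMARY KEY, plan_id INTEGER, name TEXT);
    CREATE TABLE tasks  (id INTEGER PRIMARY KEY, phase_id INTEGER,
                         title TEXT, status TEXT);
    INSERT INTO phases VALUES (1, 7, 'Design'), (2, 7, 'Build');
    INSERT INTO tasks  VALUES (1, 1, 'Spec', 'done'), (2, 1, 'Review', 'todo'),
                              (3, 2, 'Implement', 'todo');
""")

def plan_detail(conn, plan_id):
    """Everything the detail page shows: phases, tasks, and progress."""
    detail = {"plan": plan_id, "phases": []}
    for pid, name in conn.execute(
            "SELECT id, name FROM phases WHERE plan_id = ?", (plan_id,)):
        tasks = conn.execute(
            "SELECT title, status FROM tasks WHERE phase_id = ?", (pid,)).fetchall()
        done = sum(1 for _, status in tasks if status == "done")
        detail["phases"].append(
            {"name": name, "tasks": tasks, "progress": f"{done}/{len(tasks)}"})
    return detail

print(plan_detail(conn, 7))  # Design 1/2, Build 0/1
```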

Are agents staying in bounds?

S4 (Action)
When

Need a new skill file. Copy an existing one, rename it, update 6 fields, fix imports. 15 minutes of mechanical work. Third time this week.

I need to

Run drmg scaffold skill my-skill-name and get a valid skill file with correct structure, imports, and frontmatter.

So I get

Scaffold creates file in correct directory with all required fields populated. File passes skill validation.

Not

File created but in wrong directory. Or missing required frontmatter fields.
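
A sketch of what drmg scaffold skill could do under the hood; the skills/ directory and frontmatter fields are assumptions standing in for the real skill schema.

```python
from pathlib import Path

SKILL_DIR = Path("skills")  # assumed output directory

SKILL_TEMPLATE = """\
---
name: {name}
description: TODO
version: 0.1.0
---

# {name}

TODO: describe the skill.
"""

def scaffold_skill(name: str) -> Path:
    path = SKILL_DIR / f"{name}.md"
    if path.exists():
        raise SystemExit(f"refusing to overwrite {path}")  # scaffolds never clobber
    SKILL_DIR.mkdir(parents=True, exist_ok=True)
    path.write_text(SKILL_TEMPLATE.format(name=name))
    return path

print(scaffold_skill("my-skill-name"))  # skills/my-skill-name.md
```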

S5 (Action)
When

Want to scaffold a hook file. Skill scaffold works but hooks have a different structure. No template for hooks.

I need to

Content-type registry knows skill, hook, agent, command, rule. Each type has its own template and output directory.

So I get

drmg scaffold --list shows all types. drmg scaffold hook pre-edit-gate creates valid hook file.

Not

Registry exists but templates are wrong for some types. Or new types require code changes.
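
A sketch of the registry as data rather than code, which is exactly what makes "new types require code changes" the failure to avoid: adding a type is one more row. All paths and templates here are assumptions.

```python
from pathlib import Path

# Assumed registry shape: each content type owns a template and output directory.
REGISTRY = {
    "skill":   {"dir": Path("skills"),   "template": "---\nname: {name}\n---\n"},
    "hook":    {"dir": Path("hooks"),    "template": "#!/usr/bin/env bash\n# hook: {name}\n"},
    "agent":   {"dir": Path("agents"),   "template": "---\nname: {name}\nscope: []\n---\n"},
    "command": {"dir": Path("commands"), "template": "# command: {name}\n"},
    "rule":    {"dir": Path("rules"),    "template": "# rule: {name}\n"},
}

def list_types():
    return sorted(REGISTRY)  # backs `drmg scaffold --list`

def scaffold(kind: str, name: str) -> Path:
    entry = REGISTRY[kind]  # KeyError doubles as "unknown type" feedback
    entry["dir"].mkdir(parents=True, exist_ok=True)
    path = entry["dir"] / name
    path.write_text(entry["template"].format(name=name))
    return path

print(list_types())
print(scaffold("hook", "pre-edit-gate"))  # hooks/pre-edit-gate
```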

S6 (Hook)
When

Agent edits a file outside its declared responsibility. Dream-team agent writes to engineering repo. No warning, no log, discovered days later.

I need to

Each agent has a scope.json declaring allowed paths. Pre-edit hook checks scope before write. Violation logged.

So I get

Scope violation produces warning before edit completes. Violation logged. Human can override.

Not

Scope check runs but is so slow it breaks editing flow. Or override doesn't work, blocking legitimate edits.
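
A minimal sketch of the pre-edit gate, assuming scope.json reduces to a list of path globs; the override mechanism shown (an environment variable) is an assumption. fnmatch keeps the check cheap enough not to break editing flow, and the log entry survives even when a human overrides.

```python
import fnmatch, os, sys, time

# In the real system this would come from the agent's scope.json;
# inlined here so the sketch runs standalone.
SCOPES = {"dream-team": ["dream-team/**", "docs/**"]}

def check_edit(agent: str, path: str) -> bool:
    """Return True if the edit may proceed; log every out-of-scope attempt."""
    if any(fnmatch.fnmatch(path, pat) for pat in SCOPES.get(agent, [])):
        return True
    with open("scope-violations.log", "a") as log:  # logged even when overridden
        log.write(f"{time.time():.0f}\t{agent}\t{path}\n")
    if os.environ.get("SCOPE_OVERRIDE") == "1":
        return True                                  # human override, audit trail kept
    print(f"warning: {agent} edit to {path} is out of scope", file=sys.stderr)
    return False

print(check_edit("dream-team", "dream-team/roster.md"))  # True
print(check_edit("dream-team", "engineering/api.py"))    # False, logged
```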

S7 (Action)
When

Want to audit which agents are staying in bounds and which are violating scope. No way to see violations across agents or time.

I need to

Scope audit shows violations per agent, per time period. Recurring violations highlighted.

So I get

drmg audit --scope shows clean agents, violating agents, and top violation paths.

Not

Audit runs but only shows current session. No historical view.
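
A sketch of the aggregation behind drmg audit --scope, assuming the append-only tab-separated log from the S6 sketch above; append-only is what makes the view historical rather than per-session.

```python
import time
from collections import Counter

def scope_audit(lines, since=None):
    """Violations per agent and per path, optionally limited to a time window."""
    per_agent, per_path = Counter(), Counter()
    for line in lines:
        ts, agent, path = line.rstrip("\n").split("\t")
        if since is None or float(ts) >= since:
            per_agent[agent] += 1
            per_path[path] += 1
    return per_agent, per_path

now = time.time()
log = [f"{now - 86400:.0f}\tdream-team\tengineering/api.py",
       f"{now - 3600:.0f}\tdream-team\tengineering/api.py",
       f"{now - 3600:.0f}\tresearch\tops/deploy.sh"]

agents, paths = scope_audit(log, since=now - 7 * 86400)  # last 7 days
print(agents.most_common())  # [('dream-team', 2), ('research', 1)]
print(paths.most_common(1))  # top violation path
```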

Does the loop get smarter?

S8 (Cron)
When

Same agent makes the same mistake three sessions in a row. Each session fixes it. Nobody creates a rule to prevent it.

I need to

Pattern extractor reads audit findings across 3+ runs, groups recurring issues, proposes a hook or rule.

So I get

Recurring pattern detected and surfaced as proposed prevention with evidence.

Not

Patterns detected but proposals are too generic. Or too many proposals, creating noise.
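
A sketch of the extractor's core move, assuming findings normalize to (run_id, fingerprint, detail). The threshold of three distinct runs is the noise gate: one noisy session can't trigger a proposal, and each proposal ships with its evidence.

```python
from collections import defaultdict

findings = [
    (1, "missing-frontmatter", "skills/a.md"),
    (2, "missing-frontmatter", "skills/b.md"),
    (3, "missing-frontmatter", "skills/c.md"),
    (3, "scope-violation", "engineering/api.py"),
]

runs_by_pattern = defaultdict(set)
evidence = defaultdict(list)
for run_id, fingerprint, detail in findings:
    runs_by_pattern[fingerprint].add(run_id)
    evidence[fingerprint].append(detail)

for fingerprint, runs in runs_by_pattern.items():
    if len(runs) >= 3:  # recurring across sessions, not just noisy in one
        print(f"propose rule for '{fingerprint}' "
              f"(seen in runs {sorted(runs)}, e.g. {evidence[fingerprint][:3]})")
```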

S9 (Action)
When

Agent solved a complex problem last week. This week, different agent hits the same problem. No way to recall what worked.

I need to

Agent writes structured memory to DB after significant problem resolution. Future agents query memory before starting similar work.

So I get

Memory written with context, solution, and tags. Recall query returns relevant memories ranked by similarity.

Not

Memory written but too verbose to be useful. Or recall returns irrelevant results.
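
A sketch of write and recall, assuming tagged memories ranked by Jaccard overlap on tags; a real system would likely rank by embedding similarity, but set overlap keeps the contract visible.

```python
memories = []

def remember(context: str, solution: str, tags: set[str]):
    """Structured memory: context, solution, and tags, nothing verbose."""
    memories.append({"context": context, "solution": solution, "tags": tags})

def recall(query_tags: set[str], k: int = 3):
    """Top-k memories ranked by tag overlap; zero-overlap results filtered out."""
    def score(m):
        return len(m["tags"] & query_tags) / len(m["tags"] | query_tags)
    return sorted((m for m in memories if score(m) > 0), key=score, reverse=True)[:k]

remember("flaky migration on CI", "pin postgres 15 in test container",
         {"ci", "postgres", "migration"})
remember("hook blocked legit edit", "widen scope glob, keep log",
         {"hooks", "scope"})

for m in recall({"postgres", "migration"}):
    print(m["solution"])  # 'pin postgres 15 in test container'
```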

S10 (Cron)
When

Virtue auditor runs weekly. Shows 8 dimension scores. Scores improve but improvements don't compound — each session starts from scratch.

I need to

Pattern extractor feeds improvement proposals into templates and rules. Virtue scores stay improved structurally.

So I get

Virtue dimension score for a specific gap stays improved for 3+ consecutive audits.

Not

Score improves once then regresses. Prevention applied but doesn't address root cause.
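
A sketch of the GREEN check, assuming each weekly audit yields a numeric score per dimension: "stays improved" means the last three audits all hold at or above the level the prevention established, and one regression flips it RED.

```python
def structurally_improved(scores: list[float], baseline: float,
                          window: int = 3) -> bool:
    """GREEN only if the last `window` audits all hold at or above baseline."""
    return len(scores) >= window and all(s >= baseline for s in scores[-window:])

clarity = [5.2, 5.1, 7.0, 7.1, 7.4]   # prevention landed at the 7.0 audit
print(structurally_improved(clarity, baseline=7.0))       # True: held 3 audits
print(structurally_improved([5.2, 7.0, 7.2, 6.1], 7.0))   # False: regressed
```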

Kill Signal

Boundary hooks block >30% of legitimate agent actions after 30 days. Agent task completion rate drops below current baseline.
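
A sketch of how the first signal could be measured, assuming each hook decision gets labeled legitimate or not in review; the labeling step itself is an assumption.

```python
def false_block_rate(decisions: list[tuple[bool, bool]]) -> float:
    """decisions: (blocked, was_legitimate) pairs; rate over legitimate actions only."""
    legit = [blocked for blocked, ok in decisions if ok]
    return sum(legit) / len(legit) if legit else 0.0

decisions = [(True, True), (False, True), (False, True), (True, False)]
rate = false_block_rate(decisions)
print(f"{rate:.0%} of legitimate actions blocked")  # 33%: kill signal fires
assert rate > 0.30
```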

Who

  • Platform operator — wants dashboard that matches reality and agents that stay in bounds
  • Agent developer — wants scaffolds that produce valid files and patterns that prevent recurring mistakes
  • Agent — wants to recall what worked before and know its own boundaries before hitting them

Questions

Where does automation end and human judgment begin for agent boundaries?

  • If agents can extract their own patterns, will they converge on the same rules humans would write?
  • Should memory be per-agent or shared across all agents in a session?
  • At what point does scaffold templating become over-engineering — when does an agent just write the file directly?
  • Can boundary violations be the training signal for better scope declarations?