Agent Platform Phase 1 Spec
What do the 8 dimensions actually measure — and what does a healthy reading look like?
The 8 Dimensions
The VVFL Dashboard instrument reads enforcement health across 8 dimensions. Each dimension is an auditor — a function that takes evidence and returns findings.
| # | Dimension | Measures | Input Source | Healthy Signal | Unhealthy Signal |
|---|---|---|---|---|---|
| 1 | Generator | Generated code correctness | Generated files vs schema definitions | Zero incidents in generated code | Bugs trace to generator output |
| 2 | Template | Plan template sequence enforcement | plan.json ordering, prdRef, bookends | No phases skipped, prdRef populated | Tasks out of order, empty prdRef |
| 3 | Rule | Rules followed when loaded | .claude/rules/ coverage vs incidents | Rule-covered incidents near zero | Rules exist but violations recur |
| 4 | Skill | Skills invoked when relevant | Trigger conditions vs invocation count | Invocation matches triggers | Skills exist but never invoked |
| 5 | Agent | Agents stay within boundaries | Output vs declared autonomy scope | Zero out-of-scope changes | Edits outside blast radius |
| 6 | Platform | Infrastructure prevents violations | Hook fire count vs violations shipped | Hooks catch before commit | Violations reach CI |
| 7 | Virtue | Loop improves over time | Enforcement tier distribution trend | More incidents caught by generators, fewer by expertise | Expertise catches stay flat |
| 8 | Pattern | Patterns extracted and codified | Repeated incidents vs new prevention | Every 2x pattern becomes prevention | Same error class appears 3+ times |
Source: A&ID Instrument Registry — VVFL Dashboard row (line 151).
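The "auditor as a function" idea can be sketched in TypeScript. This is a minimal illustration under assumptions: the names `Auditor`, `Finding`, `SkillUsage`, and the counter fields are hypothetical and not taken from the drmg codebase, and the sample skills are made-up data. It uses the Skill dimension (trigger conditions vs invocation count) as the worked case.

```typescript
// Hypothetical types: the spec only says an auditor takes evidence and returns findings.
type Dimension =
  | "generator" | "template" | "rule" | "skill"
  | "agent" | "platform" | "virtue" | "pattern";

interface Finding {
  dimension: Dimension;
  finding: string;
  evidence: Record<string, unknown>;
}

// An auditor is a pure function from evidence to findings.
type Auditor<E> = (evidence: E) => Finding[];

// Assumed evidence shape for the Skill dimension.
interface SkillUsage {
  skill_name: string;
  trigger: string;         // description of the trigger condition
  triggersMatched: number; // how often the condition held
  invocations: number;     // how often the skill actually ran
}

// Emits one finding per skill that ran fewer times than its trigger matched.
const auditSkills: Auditor<SkillUsage[]> = (usage) =>
  usage
    .filter((u) => u.invocations < u.triggersMatched)
    .map((u): Finding => ({
      dimension: "skill",
      finding: `${u.skill_name} ran ${u.invocations} of ${u.triggersMatched} times`,
      evidence: {
        skill_name: u.skill_name,
        trigger: u.trigger,
        missed: u.triggersMatched - u.invocations,
      },
    }));

// Made-up sample data for illustration only.
const skillFindings = auditSkills([
  { skill_name: "db-migration", trigger: "schema file changed", triggersMatched: 3, invocations: 1 },
  { skill_name: "release-notes", trigger: "tag pushed", triggersMatched: 2, invocations: 2 },
]);
console.log(skillFindings.length); // 1 (only db-migration missed triggers)
```

Keeping every auditor to the same function shape means the audit command can run all eight and concatenate the results without knowing what each one inspects.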
Dimension Detail
1. Generator
| Field | Value |
|---|---|
| Input | Diff of generated files against schema definitions |
| Measurement | Count of incidents where root cause is generator output |
| Output | { dimension: "generator", finding: "...", evidence: { file, line } } |
| Routing | Fix the generator, not the generated code |
2. Template
| Field | Value |
|---|---|
| Input | plan.json phase ordering, prdRef field, bookend presence |
| Measurement | Plans with skipped phases, empty prdRef, missing bookends |
| Output | { dimension: "template", finding: "...", evidence: { file, expected, actual } } |
| Routing | Update plan template gates in template.json |
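A minimal sketch of the prdRef check, under assumptions: `auditTemplate` is named in this spec, but the `PlanRecord` shape, the `plan` evidence key, and the treatment of blank strings as empty are hypothetical. Phase ordering and bookend checks are left out.

```typescript
// Assumed plan shape; only prdRef is named in the spec.
interface PlanRecord {
  id: string;
  prdRef: string | null;
}

interface TemplateFinding {
  dimension: "template";
  severity: "info" | "warning" | "critical";
  finding: string;
  evidence: { plan: string; field: string; expected: string; actual: string | null };
}

function auditTemplate(plans: PlanRecord[]): TemplateFinding[] {
  const findings: TemplateFinding[] = [];
  for (const plan of plans) {
    // Assumption: a blank prdRef is as bad as a null one.
    if (plan.prdRef === null || plan.prdRef.trim() === "") {
      findings.push({
        dimension: "template",
        severity: "critical",
        finding: `Plan ${plan.id} has no prdRef`,
        evidence: {
          plan: plan.id,
          field: "prdRef",
          expected: "non-empty PRD reference",
          actual: plan.prdRef,
        },
      });
    }
  }
  return findings;
}

const templateFindings = auditTemplate([
  { id: "plan-1", prdRef: null },
  { id: "plan-2", prdRef: "PRD-42" },
]);
console.log(templateFindings.map((f) => f.evidence.plan)); // [ 'plan-1' ]
```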
3. Rule
| Field | Value |
|---|---|
| Input | .claude/rules/ directory vs incidents in rule-covered areas |
| Measurement | Incidents where a rule exists but wasn't followed |
| Output | { dimension: "rule", finding: "...", evidence: { rule_file, incident } } |
| Routing | If recurring: escalate to hook. If ambiguous: rewrite rule for clarity |
4. Skill
| Field | Value |
|---|---|
| Input | Skill trigger conditions vs actual invocation count |
| Measurement | Situations where a skill should have fired but didn't |
| Output | { dimension: "skill", finding: "...", evidence: { skill_name, trigger, missed } } |
| Routing | Improve trigger visibility or convert to hook |
5. Agent
| Field | Value |
|---|---|
| Input | Agent output files vs declared autonomy scope in agent definition |
| Measurement | Edits to files outside the agent's declared blast radius |
| Output | { dimension: "agent", finding: "...", evidence: { agent, file, scope } } |
| Routing | Tighten agent definition or expand scope with justification |
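The blast-radius check can be sketched as a prefix match. This assumes the declared scope is a list of repo-relative directory prefixes with forward slashes; the spec does not say how scope is declared, so `AgentScope` and `allowedPrefixes` are hypothetical.

```typescript
// Hypothetical scope declaration: directories the agent may edit.
interface AgentScope {
  agent: string;
  allowedPrefixes: string[];
}

interface AgentFinding {
  dimension: "agent";
  finding: string;
  evidence: { agent: string; file: string; scope: string[] };
}

function auditAgent(scope: AgentScope, changedFiles: string[]): AgentFinding[] {
  // Normalize to a trailing slash so "src/api" does not match "src/api-legacy/...".
  const prefixes = scope.allowedPrefixes.map((p) => (p.endsWith("/") ? p : `${p}/`));
  return changedFiles
    .filter((file) => !prefixes.some((p) => file.startsWith(p)))
    .map((file) => ({
      dimension: "agent" as const,
      finding: `${scope.agent} edited ${file} outside its declared scope`,
      evidence: { agent: scope.agent, file, scope: scope.allowedPrefixes },
    }));
}

// Made-up sample data for illustration only.
const agentFindings = auditAgent(
  { agent: "api-builder", allowedPrefixes: ["src/api"] },
  ["src/api/routes.ts", "src/api-legacy/old.ts", "docs/readme.md"],
);
console.log(agentFindings.map((f) => f.evidence.file));
// [ 'src/api-legacy/old.ts', 'docs/readme.md' ]
```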
6. Platform
| Field | Value |
|---|---|
| Input | Hook fire count vs violations that reached commit or CI |
| Measurement | Ratio of caught-at-hook vs escaped-to-CI |
| Output | { dimension: "platform", finding: "...", evidence: { hook, violation } } |
| Routing | Add or fix hook. Every CI-caught violation = missing hook |
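One way to read the caught-at-hook vs escaped-to-CI ratio is as a catch rate. The formula and the names `HookStats` and `hookCatchRate` are assumptions for illustration; the spec does not fix a definition.

```typescript
// Hypothetical counters gathered over one audit period.
interface HookStats {
  caughtAtHook: number;
  escapedToCi: number;
}

// Share of violations stopped before commit; null when there is nothing to measure.
function hookCatchRate(stats: HookStats): number | null {
  const total = stats.caughtAtHook + stats.escapedToCi;
  return total === 0 ? null : stats.caughtAtHook / total;
}

console.log(hookCatchRate({ caughtAtHook: 9, escapedToCi: 1 })); // 0.9
console.log(hookCatchRate({ caughtAtHook: 0, escapedToCi: 0 })); // null
```

Returning `null` for an empty period keeps "no data" distinct from "nothing caught", which would otherwise both read as zero.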
7. Virtue
| Field | Value |
|---|---|
| Input | Historical enforcement tier distribution over time |
| Measurement | Trend: are more incidents caught by higher tiers (generator, template) vs lower (expertise)? |
| Output | { dimension: "virtue", finding: "...", evidence: { period, distribution } } |
| Routing | If flat: enforcement push-up isn't working. Review retrospectives |
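A sketch of the trend measurement, assuming each run records catch counts per tier and that "improving" means the share caught by higher tiers never drops and rises at least once. The three-run minimum comes from story S5; the `RunDistribution` shape, the three tiers shown, and the `"declining"` label are assumptions.

```typescript
// Hypothetical per-run record; only three tiers are modeled here.
interface RunDistribution {
  period: string;
  generator: number;
  template: number;
  expertise: number;
}

type Trend = "improving" | "flat" | "declining";

// Fraction of catches made by the higher tiers (generator, template).
function higherTierShare(run: RunDistribution): number {
  const total = run.generator + run.template + run.expertise;
  return total === 0 ? 0 : (run.generator + run.template) / total;
}

function virtueTrend(runs: RunDistribution[]): Trend | null {
  if (runs.length < 3) return null; // too little history to call a trend
  const shares = runs.map(higherTierShare);
  let ups = 0;
  let downs = 0;
  for (let i = 1; i < shares.length; i++) {
    const delta = (shares[i] ?? 0) - (shares[i - 1] ?? 0);
    if (delta > 0) ups++;
    else if (delta < 0) downs++;
  }
  if (ups > 0 && downs === 0) return "improving";
  if (downs > 0 && ups === 0) return "declining";
  return "flat"; // unchanged or mixed movement
}

// Made-up sample data: higher-tier share goes 0.2, 0.5, 0.8.
const trend = virtueTrend([
  { period: "run-1", generator: 1, template: 1, expertise: 8 },
  { period: "run-2", generator: 3, template: 2, expertise: 5 },
  { period: "run-3", generator: 6, template: 2, expertise: 2 },
]);
console.log(trend); // improving
```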
8. Pattern
| Field | Value |
|---|---|
| Input | Incident history — same error class appearing more than once |
| Measurement | Count of repeated error classes without structural prevention |
| Output | { dimension: "pattern", finding: "...", evidence: { error_class, count, prevention } } |
| Routing | 2x = create prevention artifact. 3x = escalate to generator |
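The 2x and 3x thresholds can be sketched as a counting pass over incident history. This is an illustration, not the spec's `auditPattern`: the function name `routeRepeatedErrors`, the action labels, and the choice to escalate at 3x even when a prevention artifact exists are all assumptions.

```typescript
type PatternAction = "create-prevention" | "escalate-to-generator";

interface PatternRouting {
  error_class: string;
  count: number;
  prevention: boolean; // whether a prevention artifact already exists
  action: PatternAction;
}

function routeRepeatedErrors(errorClasses: string[], prevented: Set<string>): PatternRouting[] {
  // Count incidents per error class, keeping first-seen order.
  const counts = new Map<string, number>();
  for (const cls of errorClasses) counts.set(cls, (counts.get(cls) ?? 0) + 1);

  const routed: PatternRouting[] = [];
  counts.forEach((count, cls) => {
    const prevention = prevented.has(cls);
    if (count >= 3) {
      // 3x: prevention did not hold (or never existed), so push it up to the generator.
      routed.push({ error_class: cls, count, prevention, action: "escalate-to-generator" });
    } else if (count === 2 && !prevention) {
      // 2x: the class has repeated once, so it needs a prevention artifact.
      routed.push({ error_class: cls, count, prevention, action: "create-prevention" });
    }
  });
  return routed;
}

// Made-up sample data for illustration only.
const routed = routeRepeatedErrors(
  ["enum-drift", "enum-drift", "missing-testid", "empty-prdref", "empty-prdref", "empty-prdref"],
  new Set<string>(),
);
console.log(routed.map((r) => `${r.error_class}: ${r.action}`));
// [ 'enum-drift: create-prevention', 'empty-prdref: escalate-to-generator' ]
```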
Audit Output Schema
Every auditor produces findings in this shape, which is forward-compatible with Phase 3 receipts. Pipe-separated values list the allowed options for a field.
{
  "dimension": "generator|template|rule|skill|agent|platform|virtue|pattern",
  "severity": "info|warning|critical",
  "finding": "Human-readable description",
  "evidence": {
    "file": "path/to/file",
    "line": 42,
    "expected": "what should be there",
    "actual": "what is there"
  },
  "routing": {
    "action": "fix|escalate|create",
    "target": "path/to/artifact",
    "owner": "role or team"
  },
  "gap_type": "gate-bypass|template-bloat|sequence-violation|interface-drift|demand-absence"
}
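The schema can be checked at runtime with a small validator. This is a sketch under assumptions: `validateFinding` is a hypothetical helper, `gap_type` is treated as optional, and the rejection of an empty `routing.target` or an `"unknown"` owner follows story S2. It does not check that the target exists on disk or that the owner is in the AssignedTeam enum.

```typescript
const DIMENSIONS = ["generator", "template", "rule", "skill", "agent", "platform", "virtue", "pattern"] as const;
const SEVERITIES = ["info", "warning", "critical"] as const;
const ACTIONS = ["fix", "escalate", "create"] as const;
const GAP_TYPES = ["gate-bypass", "template-bloat", "sequence-violation", "interface-drift", "demand-absence"] as const;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function oneOf(allowed: readonly string[], value: unknown): boolean {
  return typeof value === "string" && allowed.indexOf(value) !== -1;
}

// Returns the names of invalid fields; an empty array means the finding is valid.
function validateFinding(value: unknown): string[] {
  if (!isRecord(value)) return ["finding"];
  const errors: string[] = [];
  if (!oneOf(DIMENSIONS, value["dimension"])) errors.push("dimension");
  if (!oneOf(SEVERITIES, value["severity"])) errors.push("severity");
  if (typeof value["finding"] !== "string") errors.push("finding");
  if (!isRecord(value["evidence"])) errors.push("evidence");

  const routing = value["routing"];
  if (!isRecord(routing)) {
    errors.push("routing");
  } else {
    if (!oneOf(ACTIONS, routing["action"])) errors.push("routing.action");
    const target = routing["target"];
    if (typeof target !== "string" || target === "") errors.push("routing.target");
    const owner = routing["owner"];
    if (typeof owner !== "string" || owner === "" || owner === "unknown") errors.push("routing.owner");
  }

  // Assumption: gap_type may be omitted, but must be a known value when present.
  if (value["gap_type"] !== undefined && !oneOf(GAP_TYPES, value["gap_type"])) errors.push("gap_type");
  return errors;
}

console.log(validateFinding({
  dimension: "template",
  severity: "critical",
  finding: "Plan has no prdRef",
  evidence: { file: "plan.json", expected: "PRD reference", actual: null },
  routing: { action: "fix", target: "", owner: "unknown" },
}));
// [ 'routing.target', 'routing.owner' ]
```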
Gaps to Dimensions
The five engineering gaps map to dimensions that catch them.
| Gap Type | Primary Dimension | Secondary Dimension | Detection Method |
|---|---|---|---|
| Gate bypass | Template | Rule | Empty prdRef, missing bookends in plan.json |
| Template bloat | Generator | Template | Mechanical tasks consuming plan slots |
| Sequence violation | Generator | Template | E2E tests before UI, retrofitted testids |
| Interface drift | Generator | Pattern | Enum count mismatch across definition sites |
| Demand absence | Rule | Template | Plan created without prdRef or Tight Five ref |
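The table above can be encoded as a lookup so that a finding's `gap_type` resolves to the dimensions expected to catch it. The names `GAP_DIMENSIONS`, `dimensionsFor`, and `gapsWithPrimary` are hypothetical; the mapping itself is copied from the table.

```typescript
type Dimension =
  | "generator" | "template" | "rule" | "skill"
  | "agent" | "platform" | "virtue" | "pattern";

type GapType =
  | "gate-bypass" | "template-bloat" | "sequence-violation"
  | "interface-drift" | "demand-absence";

// One entry per row of the Gaps to Dimensions table.
const GAP_DIMENSIONS: Record<GapType, { primary: Dimension; secondary: Dimension }> = {
  "gate-bypass": { primary: "template", secondary: "rule" },
  "template-bloat": { primary: "generator", secondary: "template" },
  "sequence-violation": { primary: "generator", secondary: "template" },
  "interface-drift": { primary: "generator", secondary: "pattern" },
  "demand-absence": { primary: "rule", secondary: "template" },
};

// Dimensions that should catch a gap, primary first.
function dimensionsFor(gap: GapType): Dimension[] {
  const entry = GAP_DIMENSIONS[gap];
  return [entry.primary, entry.secondary];
}

// Inverse lookup: gaps for which a dimension is the primary detector.
function gapsWithPrimary(dimension: Dimension): GapType[] {
  return (Object.keys(GAP_DIMENSIONS) as GapType[]).filter(
    (gap) => GAP_DIMENSIONS[gap].primary === dimension,
  );
}

console.log(dimensionsFor("interface-drift")); // [ 'generator', 'pattern' ]
console.log(gapsWithPrimary("generator"));
// [ 'template-bloat', 'sequence-violation', 'interface-drift' ]
```

The inverse lookup makes one property of the table visible: Generator is the primary detector for three of the five gaps.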
Story Contract
Stories are test contracts. Engineering converts each row into ≥1 test file. Tests must be RED before implementation starts; a test going GREEN means the value is delivered.
| # | WHEN (Trigger + Precondition) | THEN (Exact Assertion — names data source, field, threshold) | ARTIFACT (Test File) | Test Type | FORBIDDEN (Must not happen) | OUTCOME (Value Proven) |
|---|---|---|---|---|---|---|
| S1 | drmg audit --dry-run runs against a repo where .claude/hooks/failure-log.jsonl contains ≥1 failure entry | Output JSON has findings[] where ≥1 entry has dimension: "platform" AND evidence.source references failure-log.jsonl AND severity is "warning" or "critical" | drmg/__tests__/story-s1-platform.spec.ts | integration | A stub returning hardcoded findings passes; the test must seed a real failure-log.jsonl and read it | Platform auditor reads actual hook failure data, not empty stubs |
| S2 | An audit finding is produced with a routing field | finding.routing.target resolves to an existing file path on disk AND finding.routing.owner matches a value in the AssignedTeam enum | drmg/__tests__/story-s2-routing.spec.ts | unit | Finding with routing.target: "" or routing.owner: "unknown" is accepted as valid | Findings route to real artifacts, not phantom paths |
| S3 | A plan record exists in DB with prdRef: null | auditTemplate([plan]) returns ≥1 finding where severity: "critical" AND evidence.field: "prdRef" AND evidence.actual: null | drmg/__tests__/story-s3-template.spec.ts | integration | A plan with null prdRef produces no finding, or produces an info finding only | Template auditor enforces prdRef is populated before plan runs |
| S4 | A pgEnum in the schema has 3 values; the corresponding TypeScript union in generated types has 5 values | auditGenerator(input) returns ≥1 finding where evidence.pgEnumCount: 3 AND evidence.tsUnionCount: 5 AND evidence.enumName names the specific enum | drmg/__tests__/story-s4-generator.spec.ts | unit | Mismatch exists but auditor returns empty findings[] | Generator auditor detects drift between schema and generated types |
| S5 | 3 prior audit result records exist in DB with increasing generator tier catches across runs 1→2→3 | auditVirtue(runs) returns finding where evidence.trend: "improving" AND evidence.periods contains data from all 3 runs | drmg/__tests__/story-s5-virtue.spec.ts | integration | Virtue auditor runs with 0 or 1 run in DB and returns a non-empty finding (must require ≥3 runs) | Virtue auditor proves the enforcement loop is improving over time |
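Story S4 can be sketched as a set comparison between the two definition sites. The spec names `auditGenerator` and the evidence fields `pgEnumCount`, `tsUnionCount`, and `enumName`; the input shape `EnumPair` and the extra `onlyInPg` and `onlyInTs` fields are assumptions, and the enum values are made-up sample data.

```typescript
// Assumed input: enum values already extracted from the schema and the generated types.
interface EnumPair {
  enumName: string;
  pgEnumValues: string[];
  tsUnionValues: string[];
}

interface GeneratorFinding {
  dimension: "generator";
  gap_type: "interface-drift";
  finding: string;
  evidence: {
    enumName: string;
    pgEnumCount: number;
    tsUnionCount: number;
    onlyInPg: string[];
    onlyInTs: string[];
  };
}

function auditGenerator(pairs: EnumPair[]): GeneratorFinding[] {
  const findings: GeneratorFinding[] = [];
  for (const pair of pairs) {
    const pg = new Set(pair.pgEnumValues);
    const ts = new Set(pair.tsUnionValues);
    const onlyInPg = pair.pgEnumValues.filter((v) => !ts.has(v));
    const onlyInTs = pair.tsUnionValues.filter((v) => !pg.has(v));
    // Compare membership, not just counts, so equal-sized but different sets are caught too.
    if (onlyInPg.length > 0 || onlyInTs.length > 0) {
      findings.push({
        dimension: "generator",
        gap_type: "interface-drift",
        finding: `${pair.enumName} differs between schema and generated types`,
        evidence: {
          enumName: pair.enumName,
          pgEnumCount: pg.size,
          tsUnionCount: ts.size,
          onlyInPg,
          onlyInTs,
        },
      });
    }
  }
  return findings;
}

const drift = auditGenerator([
  {
    enumName: "plan_status",
    pgEnumValues: ["draft", "active", "done"],
    tsUnionValues: ["draft", "active", "done", "archived", "blocked"],
  },
]);
console.log(drift[0]?.evidence.pgEnumCount, drift[0]?.evidence.tsUnionCount); // 3 5
```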
Build Contract
Success Test = the test in the Story Contract that goes GREEN when this row is done. Safety Test = what must NOT happen.
| # | ID | Function | Artifact | Success Test (Story ref + specific assertion) | Safety Test (Forbidden Outcome) | Value | State |
|---|---|---|---|---|---|---|---|
| 1 | AGNT-001 | Shared DB context (reuse plan-cli pattern) | db-context.ts | createDbContext() returns object with typed .db property — getDiagnostics: 0 errors | Each drmg module defines its own DB connection | No per-module DB setup | Gap |
| 2 | AGNT-001 | Thin router dispatches to handlers | drmg.ts entry point | drmg audit invokes commands/audit.ts; drmg plan invokes commands/plan.ts — verified via integration test with mock handlers | Unknown subcommand silently does nothing instead of printing help + exit 1 | One CLI, many commands | Gap |
| 3 | AGNT-002 | Generator auditor | auditors/generator.ts | S4: seed pgEnum=3 values, TS union=5 values → auditGenerator() returns finding with evidence.pgEnumCount:3, tsUnionCount:5 | Returns empty findings when enum mismatch exists | Catch generator bugs at source | Gap |
| 4 | AGNT-002 | Template auditor | auditors/template.ts | S3: plan with prdRef:null → auditTemplate() returns finding with severity:"critical", evidence.field:"prdRef" | Plan with null prdRef produces no finding or produces only info severity | Enforce plan discipline | Gap |
| 5 | AGNT-002 | Rule auditor | auditors/rule.ts | Incident matching a rule in .claude/rules/ → auditRules() returns finding with evidence.rule_file naming the covering rule | Incident in rule-covered area produces no finding | Measure rule effectiveness | Gap |
| 6 | AGNT-002 | Skill auditor | auditors/skill.ts | Session with trigger condition matched but skill not invoked → auditSkills() returns finding with evidence.trigger named | Returns empty findings when totalSessions: 0 — must produce info-level data gap | Know what skills aren't pulling weight | Gap |
| 7 | AGNT-002 | Agent auditor | auditors/agent.ts | Changed file outside declared blast radius → auditAgent() returns finding with evidence.file and evidence.scope both named | Changed files that violate scope produce no finding | Enforce agent boundaries | Gap |
| 8 | AGNT-002 | Platform auditor | auditors/platform.ts | S1: seed failure-log.jsonl with ≥1 entry → auditPlatform() returns finding where evidence.source references failure-log.jsonl | Returns empty findings when failure-log.jsonl has entries — stub not wired | Every CI failure = missing hook | Gap |
| 9 | AGNT-002 | Virtue auditor | auditors/virtue.ts | S5: 3 audit runs in DB with improving generator tier → auditVirtue() returns finding with evidence.trend:"improving" | Runs with <3 records produce a non-empty finding (must require ≥3 runs) | Prove the loop improves | Gap |
| 10 | AGNT-002 | Pattern auditor | auditors/pattern.ts | 3 incidents of same error class → auditPattern() returns finding with severity:"critical" and gap_type:"gate-bypass" | Same error class at 3x count produces no finding, or produces gap_type:"interface-drift" | No class recurs without structure | Gap |
| 11 | AGNT-002 | Audit command with --dry-run | commands/audit.ts | S1+S3+S4: with seeded test data, drmg audit --dry-run outputs JSON with findings from platform, template, and generator dimensions | Outputs valid JSON with zero findings when test data is seeded | One command, full health picture | Gap |
| 12 | PLAT-005 | DB-native plan template tables | schema migration + seed | SELECT COUNT(*) FROM planning_task_templates WHERE best_pattern_prompt IS NULL = 0 AND SELECT COUNT(*) FROM planning_plan_templates = 33 | Template created with best_pattern_prompt: null is accepted by DB insert | Schema enforces prompt quality — AGNT-007 writes to DB not JSON | Gap |
| 13 | PLAT-005 | plan-cli.ts create reads from DB | plan-cli.ts | plan-cli.ts create --template=a2a-api-intent-validation --dry-run succeeds with no JSON file on disk — tasks include bestPatternPrompt from DB | Falls back to JSON when DB record exists | One source of truth for templates | Gap |
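Row 2's thin router and its safety test (unknown subcommand prints help and exits 1) can be sketched as a dispatch function. This is an illustration only: `dispatch`, `Handler`, and the usage string are hypothetical, and handlers return an exit code rather than calling the process exit themselves so the logic stays testable with mock handlers.

```typescript
// A handler receives the remaining arguments and returns a process exit code.
type Handler = (args: string[]) => number;

function dispatch(
  argv: string[],
  handlers: Record<string, Handler>,
  print: (line: string) => void,
): number {
  const [command, ...rest] = argv;
  // hasOwnProperty guards against inherited keys such as "toString".
  if (command === undefined || !Object.prototype.hasOwnProperty.call(handlers, command)) {
    print(`Usage: drmg <${Object.keys(handlers).join("|")}> [options]`);
    return 1;
  }
  const handler = handlers[command];
  return handler ? handler(rest) : 1;
}

// Mock handlers standing in for commands/audit.ts and commands/plan.ts.
const calls: string[] = [];
const handlers: Record<string, Handler> = {
  audit: (args) => { calls.push(`audit ${args.join(" ")}`); return 0; },
  plan: (args) => { calls.push(`plan ${args.join(" ")}`); return 0; },
};
const output: string[] = [];
const print = (line: string): void => { output.push(line); };

console.log(dispatch(["audit", "--dry-run"], handlers, print)); // 0
console.log(dispatch(["deploy"], handlers, print));             // 1
console.log(output[0]); // Usage: drmg <audit|plan> [options]
```

In a real entry point the returned code would be passed to the runtime's exit call; that wiring is omitted here.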
Context
- A&ID Instrument Registry — Source of truth for the 8 dimension names
- Flow Engineering — Enforcement hierarchy and cost of quality
- Retrospective Protocol — Five gap types and routing logic
- VVFL — Standards station = gauge
- VVFL Evolution — Reflect station = controller
- Agent Platform PRD — Parent PRD with full phase map
Questions
When the virtue dimension shows a flat trend, is the problem the retrospectives or the routing?
- If a generator auditor finds zero issues, does that mean the generator is perfect — or that the auditor's input source is wrong?
- At what point does an 8-dimension audit become overhead rather than prevention?
- Which dimension catches the most findings in the first 5 runs — and does that reveal the weakest enforcement tier?