

Agent Platform Phase 1 Spec

What do the 8 dimensions actually measure — and what does a healthy reading look like?

The 8 Dimensions

The VVFL Dashboard instrument reads enforcement health across 8 dimensions. Each dimension is an auditor — a function that takes evidence and returns findings.

| # | Dimension | Measures | Input Source | Healthy Signal | Unhealthy Signal |
| --- | --- | --- | --- | --- | --- |
| 1 | Generator | Generated code correctness | Generated files vs schema definitions | Zero incidents in generated code | Bugs trace to generator output |
| 2 | Template | Plan template sequence enforcement | plan.json ordering, prdRef, bookends | No phases skipped, prdRef populated | Tasks out of order, empty prdRef |
| 3 | Rule | Rules followed when loaded | .claude/rules/ coverage vs incidents | Rule-covered incidents near zero | Rules exist but violations recur |
| 4 | Skill | Skills invoked when relevant | Trigger conditions vs invocation count | Invocation matches triggers | Skills exist but never invoked |
| 5 | Agent | Agents stay within boundaries | Output vs declared autonomy scope | Zero out-of-scope changes | Edits outside blast radius |
| 6 | Platform | Infrastructure prevents violations | Hook fire count vs violations shipped | Hooks catch before commit | Violations reach CI |
| 7 | Virtue | Loop improves over time | Enforcement tier distribution trend | More caught by generators, less by expertise | Expertise catches staying flat |
| 8 | Pattern | Patterns extracted and codified | Repeated incidents vs new prevention | Every 2x pattern becomes prevention | Same error class appears 3+ times |

Source: A&ID Instrument Registry — VVFL Dashboard row (line 151).

Dimension Detail

1. Generator

| Field | Value |
| --- | --- |
| Input | Diff of generated files against schema definitions |
| Measurement | Count of incidents where root cause is generator output |
| Output | `{ dimension: "generator", finding: "...", evidence: { file, line } }` |
| Routing | Fix the generator, not the generated code |
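A minimal sketch of what this auditor could look like, assuming enum definitions have already been extracted into a simple input shape. The `EnumSite` type and field names here are illustrative assumptions, not the real drmg API:

```typescript
// Hypothetical input shape: one entry per enum, with its values at each definition site.
interface EnumSite {
  enumName: string;
  pgEnumValues: string[];  // values declared in the database schema enum
  tsUnionValues: string[]; // values in the generated TypeScript union
}

interface GeneratorFinding {
  dimension: "generator";
  severity: "critical";
  finding: string;
  evidence: { enumName: string; pgEnumCount: number; tsUnionCount: number };
}

// Flag every enum whose generated union has drifted from the schema definition.
function auditGenerator(sites: EnumSite[]): GeneratorFinding[] {
  return sites
    .filter((s) => s.pgEnumValues.length !== s.tsUnionValues.length)
    .map((s): GeneratorFinding => ({
      dimension: "generator",
      severity: "critical",
      finding: `Enum drift: ${s.enumName} has ${s.pgEnumValues.length} schema values but ${s.tsUnionValues.length} TS union members`,
      evidence: {
        enumName: s.enumName,
        pgEnumCount: s.pgEnumValues.length,
        tsUnionCount: s.tsUnionValues.length,
      },
    }));
}
```

The point of the shape is the routing rule in the table: the finding names the enum and both counts, so the fix lands on the generator, never on the generated file.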

2. Template

| Field | Value |
| --- | --- |
| Input | plan.json phase ordering, prdRef field, bookend presence |
| Measurement | Plans with skipped phases, empty prdRef, missing bookends |
| Output | `{ dimension: "template", finding: "...", evidence: { file, expected, actual } }` |
| Routing | Update plan template gates in template.json |

3. Rule

| Field | Value |
| --- | --- |
| Input | .claude/rules/ directory vs incidents in rule-covered areas |
| Measurement | Incidents where a rule exists but wasn't followed |
| Output | `{ dimension: "rule", finding: "...", evidence: { rule_file, incident } }` |
| Routing | If recurring: escalate to hook. If ambiguous: rewrite rule for clarity |

4. Skill

| Field | Value |
| --- | --- |
| Input | Skill trigger conditions vs actual invocation count |
| Measurement | Situations where a skill should have fired but didn't |
| Output | `{ dimension: "skill", finding: "...", evidence: { skill_name, trigger, missed } }` |
| Routing | Improve trigger visibility or convert to hook |

5. Agent

| Field | Value |
| --- | --- |
| Input | Agent output files vs declared autonomy scope in agent definition |
| Measurement | Edits to files outside the agent's declared blast radius |
| Output | `{ dimension: "agent", finding: "...", evidence: { agent, file, scope } }` |
| Routing | Tighten agent definition or expand scope with justification |

6. Platform

| Field | Value |
| --- | --- |
| Input | Hook fire count vs violations that reached commit or CI |
| Measurement | Ratio of caught-at-hook vs escaped-to-CI |
| Output | `{ dimension: "platform", finding: "...", evidence: { hook, violation } }` |
| Routing | Add or fix hook. Every CI-caught violation = missing hook |
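The core of this auditor might look like the sketch below, assuming the failure log is JSONL with `hook` and `violation` fields per entry (the entry shape is an assumption; the real log format may differ). Taking the log text as a parameter rather than reading the file directly keeps the function easy to test:

```typescript
// Hypothetical entry shape for .claude/hooks/failure-log.jsonl.
interface HookFailure {
  hook: string;
  violation: string;
}

interface PlatformFinding {
  dimension: "platform";
  severity: "warning" | "critical";
  finding: string;
  evidence: { source: string; hook: string; violation: string };
}

// Parse a JSONL failure log and emit one finding per recorded failure.
function auditPlatform(
  jsonlText: string,
  source = ".claude/hooks/failure-log.jsonl",
): PlatformFinding[] {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as HookFailure)
    .map((entry): PlatformFinding => ({
      dimension: "platform",
      severity: "warning",
      finding: `Hook ${entry.hook} recorded a failure: ${entry.violation}`,
      evidence: { source, hook: entry.hook, violation: entry.violation },
    }));
}
```

Because `evidence.source` carries the log path, a stub that never reads the log cannot fake this output in the S1 story test.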

7. Virtue

| Field | Value |
| --- | --- |
| Input | Historical enforcement tier distribution over time |
| Measurement | Trend: are more incidents caught by higher tiers (generator, template) vs lower (expertise)? |
| Output | `{ dimension: "virtue", finding: "...", evidence: { period, distribution } }` |
| Routing | If flat: enforcement push-up isn't working. Review retrospectives |
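The trend classification could be sketched like this, under the assumption that each prior run is summarized as a count of generator-tier catches (the `RunSummary` shape is hypothetical). Note the guard requiring at least three runs, matching story S5:

```typescript
// One prior audit run, summarized. Field names are assumptions for illustration.
interface RunSummary {
  period: string;
  generatorCatches: number; // incidents caught at the generator tier this run
}

type Trend = "improving" | "flat" | "declining";

// Classify the generator-tier trend; refuse to claim anything on thin history.
function virtueTrend(runs: RunSummary[]): Trend | null {
  if (runs.length < 3) return null; // not enough history to call a trend
  const catches = runs.map((r) => r.generatorCatches);
  const strictlyUp = catches.every((c, i) => i === 0 || c > catches[i - 1]);
  const strictlyDown = catches.every((c, i) => i === 0 || c < catches[i - 1]);
  if (strictlyUp) return "improving";
  if (strictlyDown) return "declining";
  return "flat";
}
```

Returning `null` below three runs means the auditor emits no finding at all, which is exactly the forbidden-outcome line in S5.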

8. Pattern

| Field | Value |
| --- | --- |
| Input | Incident history — same error class appearing more than once |
| Measurement | Count of repeated error classes without structural prevention |
| Output | `{ dimension: "pattern", finding: "...", evidence: { error_class, count, prevention } }` |
| Routing | 2x = create prevention artifact. 3x = escalate to generator |
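The 2x/3x routing rule can be sketched directly, assuming incidents carry an `errorClass` label (a hypothetical shape, not the real incident record):

```typescript
// Hypothetical incident record: only the error class matters here.
interface Incident {
  errorClass: string;
}

interface PatternFinding {
  dimension: "pattern";
  severity: "warning" | "critical";
  finding: string;
  evidence: { error_class: string; count: number };
}

// Count repeated error classes and apply the escalation ladder:
// 2x = create a prevention artifact (warning), 3x+ = escalate to generator (critical).
function auditPattern(incidents: Incident[]): PatternFinding[] {
  const counts = new Map<string, number>();
  for (const i of incidents) {
    counts.set(i.errorClass, (counts.get(i.errorClass) ?? 0) + 1);
  }
  const findings: PatternFinding[] = [];
  for (const [errorClass, count] of counts) {
    if (count < 2) continue; // a single occurrence is not yet a pattern
    findings.push({
      dimension: "pattern",
      severity: count >= 3 ? "critical" : "warning",
      finding: `${errorClass} recurred ${count}x without structural prevention`,
      evidence: { error_class: errorClass, count },
    });
  }
  return findings;
}
```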

Audit Output Schema

Every auditor produces findings in this shape. Forward-compatible with Phase 3 receipts.

```json
{
  "dimension": "generator|template|rule|skill|agent|platform|virtue|pattern",
  "severity": "info|warning|critical",
  "finding": "Human-readable description",
  "evidence": {
    "file": "path/to/file",
    "line": 42,
    "expected": "what should be there",
    "actual": "what is there"
  },
  "routing": {
    "action": "fix|escalate|create",
    "target": "path/to/artifact",
    "owner": "role or team"
  },
  "gap_type": "gate-bypass|template-bloat|sequence-violation|interface-drift|demand-absence"
}
```
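The same schema can be mirrored as a TypeScript type so every auditor is checked at compile time. This is a sketch: the type names are assumptions, and `evidence` is left open for the dimension-specific fields shown in the detail tables:

```typescript
type Dimension =
  | "generator" | "template" | "rule" | "skill"
  | "agent" | "platform" | "virtue" | "pattern";
type Severity = "info" | "warning" | "critical";
type GapType =
  | "gate-bypass" | "template-bloat" | "sequence-violation"
  | "interface-drift" | "demand-absence";

interface Finding {
  dimension: Dimension;
  severity: Severity;
  finding: string; // human-readable description
  evidence: {
    file?: string;
    line?: number;
    expected?: string;
    actual?: string;
    [key: string]: unknown; // dimension-specific evidence fields
  };
  routing: {
    action: "fix" | "escalate" | "create";
    target: string; // path to the artifact to change
    owner: string;  // role or team
  };
  gap_type?: GapType;
}

// Every auditor shares one signature: evidence in, findings out.
type Auditor<Input> = (input: Input) => Finding[];
```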

Gaps to Dimensions

The five engineering gaps map to dimensions that catch them.

| Gap Type | Primary Dimension | Secondary Dimension | Detection Method |
| --- | --- | --- | --- |
| Gate bypass | Template | Rule | Empty prdRef, missing bookends in plan.json |
| Template bloat | Generator | Template | Mechanical tasks consuming plan slots |
| Sequence violation | Generator | Template | E2E tests before UI, retrofitted testids |
| Interface drift | Generator | Pattern | Enum count mismatch across definition sites |
| Demand absence | Rule | Template | Plan created without prdRef or Tight Five ref |
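The mapping above can be kept as data so routing code and spec stay in sync. A small sketch, with the gap and dimension names taken from the table:

```typescript
type GapType =
  | "gate-bypass" | "template-bloat" | "sequence-violation"
  | "interface-drift" | "demand-absence";

// Which dimensions catch each engineering gap, per the spec table.
const gapDimensions: Record<GapType, { primary: string; secondary: string }> = {
  "gate-bypass": { primary: "template", secondary: "rule" },
  "template-bloat": { primary: "generator", secondary: "template" },
  "sequence-violation": { primary: "generator", secondary: "template" },
  "interface-drift": { primary: "generator", secondary: "pattern" },
  "demand-absence": { primary: "rule", secondary: "template" },
};
```

Keeping it as a `Record` means adding a sixth gap type without a dimension mapping becomes a compile error rather than a silent hole.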

Story Contract

Stories are test contracts. Tests must be RED before implementation starts. GREEN = value delivered.


S1 — Platform auditor reads real hook failures

Trigger: drmg audit --dry-run runs against a repo where .claude/hooks/failure-log.jsonl contains ≥1 failure entry

Checklist:

  • Output JSON has findings[] with ≥1 entry where dimension: "platform"
  • evidence.source references failure-log.jsonl (not a hardcoded stub)
  • severity is warning or critical — not info
  • Test seeds a real failure-log.jsonl and reads it — hardcoded stub does not pass

Forbidden: Stub returning hardcoded findings passes. Auditor produces findings without reading failure-log.jsonl.

Evidence: integration — drmg/__tests__/story-s1-platform.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)


S2 — Routing fields resolve to real artifacts

Trigger: An audit finding is produced with a routing field

Checklist:

  • finding.routing.target resolves to an existing file path on disk
  • finding.routing.owner matches a value in the AssignedTeam enum
  • Finding with routing.target: "" or routing.owner: "unknown" is rejected as invalid

Forbidden: Finding with empty target or "unknown" owner accepted as valid.

Evidence: unit — drmg/__tests__/story-s2-routing.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)
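The S2 checks above could be sketched as a single validator, using Node's `fs.existsSync` for the on-disk test. The `ASSIGNED_TEAMS` values here are placeholders; the real AssignedTeam enum lives in the schema:

```typescript
import { existsSync } from "node:fs";

// Placeholder values — the real AssignedTeam enum is defined elsewhere.
const ASSIGNED_TEAMS = ["platform", "agents", "tooling"] as const;

interface Routing {
  action: string;
  target: string;
  owner: string;
}

// A finding's routing is valid only if the target resolves on disk and the
// owner is a known team; empty targets and unknown owners are rejected.
function routingIsValid(routing: Routing): boolean {
  if (!routing.target || !existsSync(routing.target)) return false;
  if (!(ASSIGNED_TEAMS as readonly string[]).includes(routing.owner)) return false;
  return true;
}
```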


S3 — Template auditor flags null prdRef as critical

Trigger: A plan record exists in DB with prdRef: null

Checklist:

  • auditTemplate([plan]) returns ≥1 finding where severity: "critical"
  • evidence.field: "prdRef" named in the finding
  • evidence.actual: null present
  • Plan with null prdRef does not produce zero findings or info-only findings

Forbidden: Plan with null prdRef produces no finding, or only info severity.

Evidence: integration — drmg/__tests__/story-s3-template.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)


S4 — Generator auditor detects enum drift

Trigger: pgEnum in schema has 3 values; TypeScript union in generated types has 5 values

Checklist:

  • auditGenerator(input) returns ≥1 finding with evidence.pgEnumCount: 3
  • evidence.tsUnionCount: 5 present in the finding
  • evidence.enumName names the specific enum that drifted
  • Mismatch does not produce empty findings[]

Forbidden: Mismatch exists but auditor returns empty findings[].

Evidence: unit — drmg/__tests__/story-s4-generator.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)


S5 — Virtue auditor proves the loop is improving

Trigger: 3 prior audit result records exist in DB with increasing generator tier catches across runs 1→2→3

Checklist:

  • auditVirtue(runs) returns finding where evidence.trend: "improving"
  • evidence.periods contains data from all 3 runs
  • Auditor with 0 or 1 run in DB produces no non-empty finding — requires ≥3 runs
  • Flat or declining trend does not return "improving"

Forbidden: Virtue auditor runs with 0 or 1 run and returns a non-empty finding.

Evidence: integration — drmg/__tests__/story-s5-virtue.spec.ts

Commission Result: ⬜ PASS / ⬜ FAIL Notes: (findings)

Build Contract

Success Test = the test in the Story Contract that goes GREEN when this row is done. Safety Test = what must NOT happen.

| # | ID | Function | Artifact | Success Test (Story ref + specific assertion) | Safety Test (Forbidden Outcome) | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | AGNT-001 | Shared DB context (reuse plan-cli pattern) | db-context.ts | createDbContext() returns object with typed .db property — getDiagnostics: 0 errors | Each drmg module defines its own DB connection | No per-module DB setup | Gap |
| 2 | AGNT-001 | Thin router dispatches to handlers | drmg.ts entry point | drmg audit invokes commands/audit.ts; drmg plan invokes commands/plan.ts — verified via integration test with mock handlers | Unknown subcommand silently does nothing instead of printing help + exit 1 | One CLI, many commands | Gap |
| 3 | AGNT-002 | Generator auditor | auditors/generator.ts | S4: seed pgEnum=3 values, TS union=5 values → auditGenerator() returns finding with evidence.pgEnumCount:3, tsUnionCount:5 | Returns empty findings when enum mismatch exists | Catch generator bugs at source | Gap |
| 4 | AGNT-002 | Template auditor | auditors/template.ts | S3: plan with prdRef:null → auditTemplate() returns finding with severity:"critical", evidence.field:"prdRef" | Plan with null prdRef produces no finding or produces only info severity | Enforce plan discipline | Gap |
| 5 | AGNT-002 | Rule auditor | auditors/rule.ts | Incident matching a rule in .claude/rules/ → auditRules() returns finding with evidence.rule_file naming the covering rule | Incident in rule-covered area produces no finding | Measure rule effectiveness | Gap |
| 6 | AGNT-002 | Skill auditor | auditors/skill.ts | Session with trigger condition matched but skill not invoked → auditSkills() returns finding with evidence.trigger named | Returns empty findings when totalSessions: 0 — must produce info-level data gap | Know what skills aren't pulling weight | Gap |
| 7 | AGNT-002 | Agent auditor | auditors/agent.ts | Changed file outside declared blast radius → auditAgent() returns finding with evidence.file and evidence.scope both named | Changed files that violate scope produce no finding | Enforce agent boundaries | Gap |
| 8 | AGNT-002 | Platform auditor | auditors/platform.ts | S1: seed failure-log.jsonl with ≥1 entry → auditPlatform() returns finding where evidence.source references failure-log.jsonl | Returns empty findings when failure-log.jsonl has entries — stub not wired | Every CI failure = missing hook | Gap |
| 9 | AGNT-002 | Virtue auditor | auditors/virtue.ts | S5: 3 audit runs in DB with improving generator tier → auditVirtue() returns finding with evidence.trend:"improving" | Runs with <3 records produce a non-empty finding (must require ≥3 runs) | Prove the loop improves | Gap |
| 10 | AGNT-002 | Pattern auditor | auditors/pattern.ts | 3 incidents of same error class → auditPattern() returns finding with severity:"critical" and gap_type:"gate-bypass" | Same error class at 3x count produces no finding, or produces gap_type:"interface-drift" | No class recurs without structure | Gap |
| 11 | AGNT-002 | Audit command with --dry-run | commands/audit.ts | S1+S3+S4: with seeded test data, drmg audit --dry-run outputs JSON with findings from platform, template, and generator dimensions | Outputs valid JSON with zero findings when test data is seeded | One command, full health picture | Gap |
| 12 | PLAT-005 | DB-native plan template tables | schema migration + seed | SELECT COUNT(*) FROM planning_task_templates WHERE best_pattern_prompt IS NULL = 0 AND SELECT COUNT(*) FROM planning_plan_templates = 33 | Template created with best_pattern_prompt: null is accepted by DB insert | Schema enforces prompt quality — AGNT-007 writes to DB not JSON | Gap |
| 13 | PLAT-005 | plan-cli.ts create reads from DB | plan-cli.ts | plan-cli.ts create --template=a2a-api-intent-validation --dry-run succeeds with no JSON file on disk — tasks include bestPatternPrompt from DB | Falls back to JSON when DB record exists | One source of truth for templates | Gap |

Context

Questions

  • When the virtue dimension shows a flat trend, is the problem the retrospectives or the routing?

  • If a generator auditor finds zero issues, does that mean the generator is perfect — or that the auditor's input source is wrong?
  • At what point does an 8-dimension audit become overhead rather than prevention?
  • Which dimension catches the most findings in the first 5 runs — and does that reveal the weakest enforcement tier?