Agent Platform Phase 1 Spec

What do the 8 dimensions actually measure — and what does a healthy reading look like?

The 8 Dimensions

The VVFL Dashboard instrument reads enforcement health across 8 dimensions. Each dimension is an auditor — a function that takes evidence and returns findings.

| # | Dimension | Measures | Input Source | Healthy Signal | Unhealthy Signal |
| --- | --- | --- | --- | --- | --- |
| 1 | Generator | Generated code correctness | Generated files vs schema definitions | Zero incidents in generated code | Bugs trace to generator output |
| 2 | Template | Plan template sequence enforcement | plan.json ordering, prdRef, bookends | No phases skipped, prdRef populated | Tasks out of order, empty prdRef |
| 3 | Rule | Rules followed when loaded | .claude/rules/ coverage vs incidents | Rule-covered incidents near zero | Rules exist but violations recur |
| 4 | Skill | Skills invoked when relevant | Trigger conditions vs invocation count | Invocation matches triggers | Skills exist but never invoked |
| 5 | Agent | Agents stay within boundaries | Output vs declared autonomy scope | Zero out-of-scope changes | Edits outside blast radius |
| 6 | Platform | Infrastructure prevents violations | Hook fire count vs violations shipped | Hooks catch before commit | Violations reach CI |
| 7 | Virtue | Loop improves over time | Enforcement tier distribution trend | More caught by generators, less by expertise | Expertise catches staying flat |
| 8 | Pattern | Patterns extracted and codified | Repeated incidents vs new prevention | Every 2x pattern becomes prevention | Same error class appears 3+ times |
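The claim that each dimension is "an auditor — a function that takes evidence and returns findings" can be sketched as a shared function signature. This is an illustrative shape only (the `Auditor` type and `noopAuditor` are hypothetical names, not the actual implementation):

```typescript
// Hypothetical sketch: every dimension auditor shares one signature.
// "Evidence" is whatever raw input the dimension reads (diffs, plan.json,
// incident logs); the Finding shape follows the Audit Output Schema section.

type Dimension =
  | "generator" | "template" | "rule" | "skill"
  | "agent" | "platform" | "virtue" | "pattern";

interface Finding {
  dimension: Dimension;
  severity: "info" | "warning" | "critical";
  finding: string;
  evidence: Record<string, unknown>;
}

// An auditor takes evidence and returns findings. Keeping it pure makes
// each dimension independently testable.
type Auditor<E> = (evidence: E) => Finding[];

// Trivial example auditor: no evidence, no findings.
const noopAuditor: Auditor<unknown[]> = (evidence) =>
  evidence.length === 0
    ? []
    : [{ dimension: "rule", severity: "info", finding: "evidence present", evidence: {} }];

console.log(noopAuditor([]).length);        // 0
console.log(noopAuditor([1])[0].severity);  // "info"
```

Because all eight auditors share this signature, the audit command can run them as a simple list and concatenate their findings.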

Source: A&ID Instrument Registry — VVFL Dashboard row (line 151).

Dimension Detail

1. Generator

| Field | Value |
| --- | --- |
| Input | Diff of generated files against schema definitions |
| Measurement | Count of incidents where root cause is generator output |
| Output | `{ dimension: "generator", finding: "...", evidence: { file, line } }` |
| Routing | Fix the generator, not the generated code |
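The generator dimension ties directly to Story S4 (schema enum vs generated union drift). A minimal sketch of how such a check could work, assuming a pre-parsed input shape (`EnumPair`, `task_status`, and all field names here are illustrative):

```typescript
// Illustrative sketch of a generator auditor, not the real auditors/generator.ts:
// compare a pgEnum's declared values against the generated TypeScript union
// and report drift as a critical finding.

interface EnumPair {
  enumName: string;
  pgEnumValues: string[];   // values declared in the DB schema
  tsUnionValues: string[];  // values in the generated TS union
}

interface GeneratorFinding {
  dimension: "generator";
  severity: "critical";
  finding: string;
  evidence: { enumName: string; pgEnumCount: number; tsUnionCount: number };
}

function auditGenerator(pairs: EnumPair[]): GeneratorFinding[] {
  return pairs
    .filter((p) => p.pgEnumValues.length !== p.tsUnionValues.length)
    .map((p) => ({
      dimension: "generator",
      severity: "critical",
      finding: `Enum drift: ${p.enumName} has ${p.pgEnumValues.length} schema values but ${p.tsUnionValues.length} generated union members`,
      evidence: {
        enumName: p.enumName,
        pgEnumCount: p.pgEnumValues.length,
        tsUnionCount: p.tsUnionValues.length,
      },
    }));
}

// Mirrors Story S4: 3 schema values vs 5 generated union members.
const findings = auditGenerator([{
  enumName: "task_status",
  pgEnumValues: ["open", "active", "done"],
  tsUnionValues: ["open", "active", "done", "archived", "deleted"],
}]);
console.log(findings.length); // 1
```

The routing rule applies here too: a finding like this points at the generator, never at the generated file.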

2. Template

| Field | Value |
| --- | --- |
| Input | plan.json phase ordering, prdRef field, bookend presence |
| Measurement | Plans with skipped phases, empty prdRef, missing bookends |
| Output | `{ dimension: "template", finding: "...", evidence: { file, expected, actual } }` |
| Routing | Update plan template gates in template.json |
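The prdRef check from Story S3 is the simplest of these measurements and can be sketched in a few lines. The `Plan` shape below is an assumption for illustration; only the finding fields follow the spec:

```typescript
// Sketch of a template auditor (assumed shape, not the real auditors/template.ts):
// flag plans whose prdRef is missing, per Story S3.

interface Plan {
  file: string;
  prdRef: string | null;
}

interface TemplateFinding {
  dimension: "template";
  severity: "critical";
  finding: string;
  evidence: { file: string; field: "prdRef"; expected: string; actual: null };
}

function auditTemplate(plans: Plan[]): TemplateFinding[] {
  return plans
    .filter((p) => p.prdRef === null || p.prdRef === "")
    .map((p) => ({
      dimension: "template",
      severity: "critical", // a missing demand link is never merely info
      finding: `Plan ${p.file} has no prdRef; demand link is missing`,
      evidence: { file: p.file, field: "prdRef", expected: "a PRD reference", actual: null },
    }));
}

const found = auditTemplate([{ file: "plan.json", prdRef: null }]);
console.log(found[0].severity); // "critical"
```

Note the severity: the S3 FORBIDDEN clause rules out an info-only finding for a null prdRef.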

3. Rule

| Field | Value |
| --- | --- |
| Input | .claude/rules/ directory vs incidents in rule-covered areas |
| Measurement | Incidents where a rule exists but wasn't followed |
| Output | `{ dimension: "rule", finding: "...", evidence: { rule_file, incident } }` |
| Routing | If recurring: escalate to hook. If ambiguous: rewrite rule for clarity |

4. Skill

| Field | Value |
| --- | --- |
| Input | Skill trigger conditions vs actual invocation count |
| Measurement | Situations where a skill should have fired but didn't |
| Output | `{ dimension: "skill", finding: "...", evidence: { skill_name, trigger, missed } }` |
| Routing | Improve trigger visibility or convert to hook |

5. Agent

| Field | Value |
| --- | --- |
| Input | Agent output files vs declared autonomy scope in agent definition |
| Measurement | Edits to files outside the agent's declared blast radius |
| Output | `{ dimension: "agent", finding: "...", evidence: { agent, file, scope } }` |
| Routing | Tighten agent definition or expand scope with justification |

6. Platform

| Field | Value |
| --- | --- |
| Input | Hook fire count vs violations that reached commit or CI |
| Measurement | Ratio of caught-at-hook vs escaped-to-CI |
| Output | `{ dimension: "platform", finding: "...", evidence: { hook, violation } }` |
| Routing | Add or fix hook. Every CI-caught violation = missing hook |
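Story S1 requires the platform auditor to read real hook failure data rather than return stubs. A sketch of that step, assuming failure-log.jsonl entries have already been parsed into objects (the `HookFailure` fields are illustrative):

```typescript
// Sketch of a platform auditor (assumed shape, not the real auditors/platform.ts):
// surface each parsed entry from .claude/hooks/failure-log.jsonl as a finding,
// per Story S1. In the real test, the log file is seeded and read from disk.

interface HookFailure {
  hook: string;
  violation: string;
}

interface PlatformFinding {
  dimension: "platform";
  severity: "warning";
  finding: string;
  evidence: { source: string; hook: string; violation: string };
}

function auditPlatform(entries: HookFailure[]): PlatformFinding[] {
  return entries.map((e) => ({
    dimension: "platform",
    severity: "warning",
    finding: `Hook ${e.hook} logged a failure: ${e.violation}`,
    // evidence.source must reference the log, per the S1 assertion
    evidence: { source: ".claude/hooks/failure-log.jsonl", hook: e.hook, violation: e.violation },
  }));
}

const platformFindings = auditPlatform([
  { hook: "pre-commit-lint", violation: "unformatted file" },
]);
console.log(platformFindings.length); // 1
```

The S1 FORBIDDEN clause targets exactly the failure mode this sketch avoids: a stub that returns hardcoded findings without ever touching the log.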

7. Virtue

| Field | Value |
| --- | --- |
| Input | Historical enforcement tier distribution over time |
| Measurement | Trend: are more incidents caught by higher tiers (generator, template) vs lower (expertise)? |
| Output | `{ dimension: "virtue", finding: "...", evidence: { period, distribution } }` |
| Routing | If flat: enforcement push-up isn't working. Review retrospectives |

8. Pattern

| Field | Value |
| --- | --- |
| Input | Incident history — same error class appearing more than once |
| Measurement | Count of repeated error classes without structural prevention |
| Output | `{ dimension: "pattern", finding: "...", evidence: { error_class, count, prevention } }` |
| Routing | 2x = create prevention artifact. 3x = escalate to generator |
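The 2x/3x routing rule can be sketched as a counting pass over incident history. The `Incident` shape and the "enum-drift" class name are illustrative assumptions:

```typescript
// Sketch of a pattern auditor (assumed shape, not the real auditors/pattern.ts):
// count incidents per error class and route by repetition, per the 2x/3x rule.

interface Incident {
  errorClass: string;
}

interface PatternFinding {
  dimension: "pattern";
  severity: "warning" | "critical";
  finding: string;
  evidence: { error_class: string; count: number };
  routing: { action: "create" | "escalate" };
}

function auditPattern(incidents: Incident[]): PatternFinding[] {
  const counts = new Map<string, number>();
  for (const i of incidents) {
    counts.set(i.errorClass, (counts.get(i.errorClass) ?? 0) + 1);
  }
  const out: PatternFinding[] = [];
  counts.forEach((count, errorClass) => {
    if (count < 2) return; // a single occurrence is not yet a pattern
    out.push({
      dimension: "pattern",
      severity: count >= 3 ? "critical" : "warning",
      finding: `${errorClass} occurred ${count}x without structural prevention`,
      evidence: { error_class: errorClass, count },
      // 2x = create a prevention artifact; 3x = escalate to the generator tier
      routing: { action: count >= 3 ? "escalate" : "create" },
    });
  });
  return out;
}

const patternFindings = auditPattern([
  { errorClass: "enum-drift" },
  { errorClass: "enum-drift" },
  { errorClass: "enum-drift" },
]);
console.log(patternFindings[0].routing.action); // "escalate"
```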

Audit Output Schema

Every auditor produces findings in this shape. Forward-compatible with Phase 3 receipts.

{
  "dimension": "generator|template|rule|skill|agent|platform|virtue|pattern",
  "severity": "info|warning|critical",
  "finding": "Human-readable description",
  "evidence": {
    "file": "path/to/file",
    "line": 42,
    "expected": "what should be there",
    "actual": "what is there"
  },
  "routing": {
    "action": "fix|escalate|create",
    "target": "path/to/artifact",
    "owner": "role or team"
  },
  "gap_type": "gate-bypass|template-bloat|sequence-violation|interface-drift|demand-absence"
}
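The schema can be mirrored as TypeScript types for compile-time checking across auditors. This is a sketch; the field names follow the JSON above, but the type names, the `isFinding` helper, and the sample values are assumptions:

```typescript
// Sketch: the Audit Output Schema as TypeScript types. Field names mirror
// the JSON schema; nothing here is the actual implementation.

type Dimension =
  | "generator" | "template" | "rule" | "skill"
  | "agent" | "platform" | "virtue" | "pattern";

type GapType =
  | "gate-bypass" | "template-bloat" | "sequence-violation"
  | "interface-drift" | "demand-absence";

interface AuditFinding {
  dimension: Dimension;
  severity: "info" | "warning" | "critical";
  finding: string;
  evidence: {
    file?: string;
    line?: number;
    expected?: string;
    actual?: string;
    [key: string]: unknown; // dimensions attach extra evidence fields
  };
  routing: {
    action: "fix" | "escalate" | "create";
    target: string; // must resolve to a real artifact (Story S2)
    owner: string;  // must match a known team (Story S2)
  };
  gap_type?: GapType;
}

// Minimal structural check, in the spirit of the S2 FORBIDDEN clause:
// reject empty routing targets and unknown owners.
function isFinding(x: AuditFinding): boolean {
  return x.finding.length > 0 && x.routing.target.length > 0 && x.routing.owner !== "unknown";
}

const sample: AuditFinding = {
  dimension: "template",
  severity: "critical",
  finding: "plan.json missing prdRef",
  evidence: { file: "plan.json", expected: "a PRD reference", actual: "null" },
  routing: { action: "fix", target: "template.json", owner: "platform" },
  gap_type: "demand-absence",
};
console.log(isFinding(sample)); // true
```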

Gaps to Dimensions

The five engineering gaps map to dimensions that catch them.

| Gap Type | Primary Dimension | Secondary Dimension | Detection Method |
| --- | --- | --- | --- |
| Gate bypass | Template | Rule | Empty prdRef, missing bookends in plan.json |
| Template bloat | Generator | Template | Mechanical tasks consuming plan slots |
| Sequence violation | Generator | Template | E2E tests before UI, retrofitted testids |
| Interface drift | Generator | Pattern | Enum count mismatch across definition sites |
| Demand absence | Rule | Template | Plan created without prdRef or Tight Five ref |
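Because the mapping is static, it can live as a lookup table so an audit run routes each detected gap to its primary and secondary auditors. A sketch (the constant name `GAP_ROUTING` is an assumption):

```typescript
// Sketch: the gap-to-dimension mapping as a lookup table.

type GapType =
  | "gate-bypass" | "template-bloat" | "sequence-violation"
  | "interface-drift" | "demand-absence";

type Dimension =
  | "generator" | "template" | "rule" | "skill"
  | "agent" | "platform" | "virtue" | "pattern";

const GAP_ROUTING: Record<GapType, { primary: Dimension; secondary: Dimension }> = {
  "gate-bypass":        { primary: "template",  secondary: "rule" },
  "template-bloat":     { primary: "generator", secondary: "template" },
  "sequence-violation": { primary: "generator", secondary: "template" },
  "interface-drift":    { primary: "generator", secondary: "pattern" },
  "demand-absence":     { primary: "rule",      secondary: "template" },
};

console.log(GAP_ROUTING["interface-drift"].primary); // "generator"
```

Using `Record<GapType, ...>` means adding a sixth gap type without a routing entry becomes a compile error rather than a silent hole.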

Story Contract

Stories are test contracts. Each row is converted to ≥1 test file by engineering. Tests must be RED before implementation starts. Tests going GREEN = value delivered.

| # | WHEN (Trigger + Precondition) | THEN (Exact Assertion — names data source, field, threshold) | ARTIFACT (Test File) | Test Type | FORBIDDEN (Must not happen) | OUTCOME (Value Proven) |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | `drmg audit --dry-run` runs against a repo where `.claude/hooks/failure-log.jsonl` contains ≥1 failure entry | Output JSON has `findings[]` where ≥1 entry has `dimension: "platform"` AND `evidence.source` references failure-log.jsonl AND severity is warning or critical | `drmg/__tests__/story-s1-platform.spec.ts` | integration | A stub returning hardcoded findings passes — test must seed a real failure-log.jsonl and read it | Platform auditor reads actual hook failure data, not empty stubs |
| S2 | An audit finding is produced with a routing field | `finding.routing.target` resolves to an existing file path on disk AND `finding.routing.owner` matches a value in the `AssignedTeam` enum | `drmg/__tests__/story-s2-routing.spec.ts` | unit | Finding with `routing.target: ""` or `routing.owner: "unknown"` is accepted as valid | Findings route to real artifacts, not phantom paths |
| S3 | A plan record exists in DB with `prdRef: null` | `auditTemplate([plan])` returns ≥1 finding where `severity: "critical"` AND `evidence.field: "prdRef"` AND `evidence.actual: null` | `drmg/__tests__/story-s3-template.spec.ts` | integration | A plan with null prdRef produces no finding, or produces an info finding only | Template auditor enforces prdRef is populated before plan runs |
| S4 | A pgEnum in the schema has 3 values; the corresponding TypeScript union in generated types has 5 values | `auditGenerator(input)` returns ≥1 finding where `evidence.pgEnumCount: 3` AND `evidence.tsUnionCount: 5` AND `evidence.enumName` names the specific enum | `drmg/__tests__/story-s4-generator.spec.ts` | unit | Mismatch exists but auditor returns empty `findings[]` | Generator auditor detects drift between schema and generated types |
| S5 | 3 prior audit result records exist in DB with increasing generator tier catches across runs 1→2→3 | `auditVirtue(runs)` returns finding where `evidence.trend: "improving"` AND `evidence.periods` contains data from all 3 runs | `drmg/__tests__/story-s5-virtue.spec.ts` | integration | Virtue auditor runs with 0 or 1 run in DB and returns a non-empty finding (must require ≥3 runs) | Virtue auditor proves the enforcement loop is improving over time |

Build Contract

Success Test = the test in the Story Contract that goes GREEN when this row is done. Safety Test = what must NOT happen.

| # | ID | Function | Artifact | Success Test (Story ref + specific assertion) | Safety Test (Forbidden Outcome) | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | AGNT-001 | Shared DB context (reuse plan-cli pattern) | `db-context.ts` | `createDbContext()` returns object with typed `.db` property — getDiagnostics: 0 errors | Each drmg module defines its own DB connection | No per-module DB setup | Gap |
| 2 | AGNT-001 | Thin router dispatches to handlers | `drmg.ts` entry point | `drmg audit` invokes `commands/audit.ts`; `drmg plan` invokes `commands/plan.ts` — verified via integration test with mock handlers | Unknown subcommand silently does nothing instead of printing help + exit 1 | One CLI, many commands | Gap |
| 3 | AGNT-002 | Generator auditor | `auditors/generator.ts` | S4: seed pgEnum=3 values, TS union=5 values → `auditGenerator()` returns finding with `evidence.pgEnumCount: 3`, `tsUnionCount: 5` | Returns empty findings when enum mismatch exists | Catch generator bugs at source | Gap |
| 4 | AGNT-002 | Template auditor | `auditors/template.ts` | S3: plan with `prdRef: null` → `auditTemplate()` returns finding with `severity: "critical"`, `evidence.field: "prdRef"` | Plan with null prdRef produces no finding or produces only info severity | Enforce plan discipline | Gap |
| 5 | AGNT-002 | Rule auditor | `auditors/rule.ts` | Incident matching a rule in `.claude/rules/` → `auditRules()` returns finding with `evidence.rule_file` naming the covering rule | Incident in rule-covered area produces no finding | Measure rule effectiveness | Gap |
| 6 | AGNT-002 | Skill auditor | `auditors/skill.ts` | Session with trigger condition matched but skill not invoked → `auditSkills()` returns finding with `evidence.trigger` named | Returns empty findings when `totalSessions: 0` — must produce info-level data gap | Know what skills aren't pulling weight | Gap |
| 7 | AGNT-002 | Agent auditor | `auditors/agent.ts` | Changed file outside declared blast radius → `auditAgent()` returns finding with `evidence.file` and `evidence.scope` both named | Changed files that violate scope produce no finding | Enforce agent boundaries | Gap |
| 8 | AGNT-002 | Platform auditor | `auditors/platform.ts` | S1: seed failure-log.jsonl with ≥1 entry → `auditPlatform()` returns finding where `evidence.source` references failure-log.jsonl | Returns empty findings when failure-log.jsonl has entries — stub not wired | Every CI failure = missing hook | Gap |
| 9 | AGNT-002 | Virtue auditor | `auditors/virtue.ts` | S5: 3 audit runs in DB with improving generator tier → `auditVirtue()` returns finding with `evidence.trend: "improving"` | Runs with <3 records produce a non-empty finding (must require ≥3 runs) | Prove the loop improves | Gap |
| 10 | AGNT-002 | Pattern auditor | `auditors/pattern.ts` | 3 incidents of same error class → `auditPattern()` returns finding with `severity: "critical"` and `gap_type: "gate-bypass"` | Same error class at 3x count produces no finding, or produces `gap_type: "interface-drift"` | No class recurs without structure | Gap |
| 11 | AGNT-002 | Audit command with --dry-run | `commands/audit.ts` | S1+S3+S4: with seeded test data, `drmg audit --dry-run` outputs JSON with findings from platform, template, and generator dimensions | Outputs valid JSON with zero findings when test data is seeded | One command, full health picture | Gap |
| 12 | PLAT-005 | DB-native plan template tables | schema migration + seed | `SELECT COUNT(*) FROM planning_task_templates WHERE best_pattern_prompt IS NULL` = 0 AND `SELECT COUNT(*) FROM planning_plan_templates` = 33 | Template created with `best_pattern_prompt: null` is accepted by DB insert | Schema enforces prompt quality — AGNT-007 writes to DB not JSON | Gap |
| 13 | PLAT-005 | plan-cli.ts create reads from DB | `plan-cli.ts` | `plan-cli.ts create --template=a2a-api-intent-validation --dry-run` succeeds with no JSON file on disk — tasks include bestPatternPrompt from DB | Falls back to JSON when DB record exists | One source of truth for templates | Gap |

Context

Questions

  • When the virtue dimension shows a flat trend, is the problem the retrospectives or the routing?
  • If a generator auditor finds zero issues, does that mean the generator is perfect — or that the auditor's input source is wrong?
  • At what point does an 8-dimension audit become overhead rather than prevention?
  • Which dimension catches the most findings in the first 5 runs — and does that reveal the weakest enforcement tier?