Outer Loop Validation

How does the outer loop validate reality without touching the code?

The outer loop agent's job is not to build — it's to hold calibration between desired state (this repo) and actual state (engineering). That requires capability: instruments to read reality, protocols to interpret it, and enough situational wisdom to know when to trust the dashboard and when to check.

The Two Loops

DREAM REPO (desired state — this repo)
↕ error signal = the gap
VALIDATION LAYER (MCPs + commissioning API)
↕ reads actual state
ENGINEERING (builds, ships, operates)

The separation is the design. The builder cannot validate their own work — not because of distrust, but because proximity creates bias. The outer loop exists to close the VVFL: the inner loop builds, the outer loop reads truth, the error signal tells both where to go next.

Capability Map

CAPABILITY                              MATURITY   CATEGORY    GAP?
──────────────────────────────────────  ─────────  ──────────  ────────
Read deployed state (Vercel/GitHub)     ██░░░ (2)  Supporting
Browser-test features (Chrome MCP)      ██░░░ (2)  Core
Update commissioning dashboard          ██░░░ (2)  Core
Signal engineering (agent-comms)        ██░░░ (2)  Supporting
PRD spec + scoring quality              ███░░ (3)  Core
Read DB schema + data (Supabase MCP)    ░░░░░ (0)  Core        CRITICAL
Read error reality (Sentry MCP)         ░░░░░ (0)  Core        GAP
Read usage reality (PostHog MCP)        ░░░░░ (0)  Core        GAP
Cross-repo commissioning protocol       █░░░░ (1)  Core        GAP
Commissioning API (machine-readable)    ░░░░░ (0)  Core        CRITICAL

Capability = Resources (MCPs/APIs) + Know-How (commissioning vocabulary) + Character (bias separation)

The critical gaps are not knowledge gaps — the commissioning protocol is well-defined. They are instrument gaps: without Supabase, Sentry, and PostHog MCPs, the outer loop reads the dashboard but cannot verify it.

KB → Situational Wisdom Upgrade Path

Knowledge exists. Wisdom requires engineering it into systems that fire automatically.

Stage      What exists                                              What's missing                       Priority
─────────  ───────────────────────────────────────────────────────  ───────────────────────────────────  ────────
Knowledge  Commissioning L0-L4 protocol, PRD standards, vocabulary
Mantra     "The builder never validates their own work"             Embedded in commissioning doc
Rule       prd-maintenance.md: check after every PRD change         Relies on memory, not trigger
Hook       docs-post-edit.sh: catches format, not reality           No reality check on dashboard edits  Medium
System     Commissioning API returning machine-readable state       NOT BUILT                            High

The mantra page names the cascade: Mantra → Rule → Hook → System. Flow state is where none of it is conscious. The outer loop reaches flow when the commissioning API makes the gap visible automatically — no manual cross-referencing, no dashboard lag.

Instruments to Add

Already wired in this session:

MCP                       Reads                                    Validation signal
────────────────────────  ───────────────────────────────────────  ───────────────────────────────────
mcp__claude_ai_Vercel__*  Deployments, build logs, runtime errors  Is the app live? What's crashing?
mcp__github__*            Commits, PRs, code presence              Does code exist for what's specced?
mcp__claude-in-chrome__*  Live browser session                     Does the feature work end-to-end?

Install next (priority order):

MCP       What it reads      Commissioning signal
────────  ─────────────────  ──────────────────────────────────────────────────────────
Supabase  DB schema + data   L1 evidence: schema exists. L2 evidence: records populated
Sentry    Production errors  Reality check: what's actually failing right now
PostHog   Feature usage      L4 evidence: humans are using it, value is flowing

Install: add to ~/.claude/mcp_settings.json:

"supabase": { "command": "npx", "args": ["-y", "@supabase/mcp-server-supabase@latest", "--project-ref", "YOUR_REF"] },
"sentry": { "command": "npx", "args": ["-y", "@sentry/mcp-server@latest"] },
"posthog": { "command": "npx", "args": ["-y", "@posthog/mcp@latest"] }

Commission from Engineering

One table in Supabase, one endpoint. Both the outer loop agent and engineering read the same source of truth.

Write Rule

All writes go through Drizzle ORM + repository pattern. No exceptions.

The outer loop agent reads only — via Supabase MCP (direct DB query) or the REST endpoint. Engineering writes via the repo. This is not a trust constraint — it's a quality constraint. Every write through the repo is type-safe, auditable, and consistent. Writing directly via Supabase MCP would bypass every compound-quality guarantee the engineering team has built.

OUTER LOOP (this agent)          ENGINEERING TEAM
        │                               │
    READ only                     READ + WRITE
        │                               │
  Supabase MCP ──────────────── Drizzle ORM + Repo
  REST endpoint                         │
        └──────────────────────── Supabase DB
                                (source of truth)
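
One way to make the write rule structural rather than a convention is to hand the outer loop a type that has no write method at all. The sketch below is illustrative only: `CommissioningReader`, `CommissioningRepository`, `asReader`, and the in-memory class are hypothetical names, and the in-memory store stands in for the Drizzle-backed repository that engineering would actually own.

```typescript
// Hypothetical names throughout; the source specifies the rule, not this API.
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface Capability {
  slug: string;
  claimedLevel: Level;
  verifiedBy: string;
  lastVerified: Date;
}

// What the outer loop agent is allowed to see: reads only.
interface CommissioningReader {
  list(): readonly Capability[];
  get(slug: string): Capability | undefined;
}

// What engineering gets: the same reads plus writes.
interface CommissioningRepository extends CommissioningReader {
  upsert(capability: Capability): void;
}

// In-memory stand-in for the Drizzle-backed repository (assumption for the sketch).
class InMemoryCommissioningRepository implements CommissioningRepository {
  private readonly rows = new Map<string, Capability>();

  list(): readonly Capability[] {
    return [...this.rows.values()];
  }

  get(slug: string): Capability | undefined {
    return this.rows.get(slug);
  }

  upsert(capability: Capability): void {
    this.rows.set(capability.slug, { ...capability });
  }
}

// Build a view that exposes no write method, at compile time or at runtime.
function asReader(repo: CommissioningRepository): CommissioningReader {
  return {
    list: () => repo.list(),
    get: (slug) => repo.get(slug),
  };
}

const repo = new InMemoryCommissioningRepository();
repo.upsert({
  slug: "example-capability",
  claimedLevel: "L1",
  verifiedBy: "engineering",
  lastVerified: new Date(),
});

const outerLoop = asReader(repo);
console.log(outerLoop.get("example-capability")?.claimedLevel); // L1
```

With this shape, calling `outerLoop.upsert(...)` fails to compile, so an accidental write from the outer loop is caught before it reaches the database.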

Schema to Build

Engineering commissions this table:

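// Drizzle schema: commissioning_capabilities
import { jsonb, pgEnum, pgTable, text, timestamp, uuid } from "drizzle-orm/pg-core";

// The enum type name "commissioning_level" is an assumed placeholder.
export const commissioningLevel = pgEnum("commissioning_level", ["L0", "L1", "L2", "L3", "L4"]);

export const commissioningCapabilities = pgTable("commissioning_capabilities", {
  id: uuid("id").primaryKey(),
  slug: text("slug").unique(), // kebab-case, matches commisioning-status.md row
  featureName: text("feature_name"),
  capabilityEnabled: text("capability_enabled"),
  prdSlug: text("prd_slug"), // links back to PRD
  claimedLevel: commissioningLevel("claimed_level"), // L0 | L1 | L2 | L3 | L4
  evidence: jsonb("evidence"), // { receipts: [], test_results: [], usage: {} }
  lastVerified: timestamp("last_verified"),
  verifiedBy: text("verified_by"), // agent name or human
  updatedAt: timestamp("updated_at"),
});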

Endpoints to Expose

GET /api/commissioning/status
→ All capabilities: slug, claimed_level, evidence, last_verified

GET /api/commissioning/{slug}/evidence
→ One capability: receipts, test results, usage data

GET /api/health/features
→ Liveness check: which features respond right now

GET /api/vvfl/receipts?since=7d
→ Agent receipts: work done + artifacts + gate results
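
Since these endpoints do not exist yet, a consumer can only be sketched. The example below shows how the outer loop might read `GET /api/commissioning/status` and validate the payload before trusting it. The field names come from the endpoint description above; the response being a bare JSON array, `last_verified` being an ISO string, and the names `parseStatus`, `fetchStatus`, and `FetchLike` are all assumptions.

```typescript
// Field names follow the endpoint description; the array-shaped response is an assumption.
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface CapabilityStatus {
  slug: string;
  claimed_level: Level;
  evidence: unknown;
  last_verified: string | null; // assumed ISO 8601 string, null if never verified
}

const LEVELS: readonly string[] = ["L0", "L1", "L2", "L3", "L4"];

// Validate an untyped JSON payload and reject rows that do not match the schema.
function parseStatus(payload: unknown): CapabilityStatus[] {
  if (!Array.isArray(payload)) {
    throw new Error("expected an array of capabilities");
  }
  return payload.map((row: unknown, index: number): CapabilityStatus => {
    if (typeof row !== "object" || row === null) {
      throw new Error(`row ${index}: not an object`);
    }
    const r = row as Record<string, unknown>;
    const slug = r["slug"];
    const level = r["claimed_level"];
    const lastVerified = r["last_verified"];
    if (typeof slug !== "string") {
      throw new Error(`row ${index}: missing slug`);
    }
    if (typeof level !== "string" || LEVELS.indexOf(level) === -1) {
      throw new Error(`row ${index}: invalid claimed_level`);
    }
    return {
      slug,
      claimed_level: level as Level,
      evidence: r["evidence"],
      last_verified: typeof lastVerified === "string" ? lastVerified : null,
    };
  });
}

// Minimal shape of a fetch function, so a stub can stand in for the network.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchStatus(baseUrl: string, fetchImpl: FetchLike): Promise<CapabilityStatus[]> {
  const response = await fetchImpl(`${baseUrl}/api/commissioning/status`);
  if (!response.ok) {
    throw new Error(`status endpoint returned ${response.status}`);
  }
  return parseStatus(await response.json());
}

// Hypothetical payload, used only to exercise the parser.
const sample: unknown = [
  { slug: "example-capability", claimed_level: "L2", evidence: { receipts: [] }, last_verified: null },
];
console.log(parseStatus(sample)[0].claimed_level); // L2
```

Passing the fetch function in as a parameter keeps the reader testable without a live deployment; in a runtime with a global `fetch`, that function can be passed directly.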

A2A Path

These REST endpoints are the HTTP stub before the A2A protocol is wired. When A2A is live, commissioning/status becomes a first-class agent task — any agent can query what needs commissioning and post evidence through the same protocol as every other inter-agent workflow. The schema and repo pattern built now carry forward unchanged; only the transport layer upgrades.

The dream repo's commisioning-status.md is the spec. The DB is reality. The diff is the honest error signal — no judgment, just measurement.
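
The diff itself can be a small pure function. The sketch below assumes both sides have already been reduced to a map from slug to level (parsing the status file and querying the DB are out of scope here); `diffCommissioning`, the `Gap` shape, and the slugs in the example are hypothetical.

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface Gap {
  slug: string;
  desired: Level | null; // null: capability is in the DB but absent from the spec
  actual: Level | null; // null: capability is specced but has no DB row
}

// Compare spec (desired) against DB (actual) by slug and return only the mismatches.
function diffCommissioning(desired: Map<string, Level>, actual: Map<string, Level>): Gap[] {
  const slugs = new Set([...desired.keys(), ...actual.keys()]);
  const gaps: Gap[] = [];
  for (const slug of [...slugs].sort()) {
    const d = desired.get(slug) ?? null;
    const a = actual.get(slug) ?? null;
    if (d !== a) {
      gaps.push({ slug, desired: d, actual: a });
    }
  }
  return gaps;
}

// Hypothetical slugs and levels, for illustration only.
const spec = new Map<string, Level>([
  ["checkout", "L2"],
  ["search", "L1"],
  ["export", "L0"],
]);
const db = new Map<string, Level>([
  ["checkout", "L2"],
  ["search", "L3"],
  ["billing", "L1"],
]);

console.log(diffCommissioning(spec, db));
```

Here `checkout` matches and drops out, while `billing` (unspecced), `export` (no DB row), and `search` (level mismatch) are reported. The function records the disagreement without deciding which side is right, which keeps the signal a measurement and not a judgment.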

Situational Wisdom

When to think fast vs slow in commissioning decisions:

Decision                                                   Speed  Because
─────────────────────────────────────────────────────────  ─────  ─────────────────────────────────────────────────────────────
Apply vocabulary mapping (engineering "complete" → L1/L2)  Fast   Protocol is defined: run it
Update dashboard from engineering report                   Fast   Evidence provided: match to framework
Mark any capability L4 (commissioned)                      Slow   Requires independent evidence: usage, receipts, no open issues
Declare a feature broken                                   Slow   Browser-test before logging: reproduce before reporting
PRD priority reorder                                       Fast   Algorithm is deterministic: run it
Commissioning vocabulary dispute                           Slow   Both teams may be right: check the layer, not the claim

The classic failure: engineering says "complete," dashboard says L0. Both correct — different layers. Measurement consensus precedes direction consensus. The vocabulary is the instrument.
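
The "slow" L4 decision can be backed by an explicit gate that lists what is still missing. This is a minimal sketch under stated assumptions: the `Evidence` shape mirrors the `evidence` column described earlier, while `openIssues`, `builtBy`, `verifiedBy`, and the function name `blockersForL4` are hypothetical. Reading "independent" as "verifier differs from builder" is an interpretation of the mantra, not a rule the source spells out.

```typescript
// Mirrors the evidence column: { receipts: [], test_results: [], usage: {} }.
interface Evidence {
  receipts: string[];
  test_results: { name: string; passed: boolean }[]; // part of the shape, not checked here
  usage: Record<string, number>; // assumed: event name mapped to a count
}

interface L4Request {
  evidence: Evidence;
  openIssues: number; // assumed to come from the issue tracker
  builtBy: string;
  verifiedBy: string;
}

// Return every reason the capability cannot be marked L4; an empty list means the gate passes.
function blockersForL4(req: L4Request): string[] {
  const blockers: string[] = [];
  if (req.evidence.receipts.length === 0) {
    blockers.push("no receipts");
  }
  const totalUsage = Object.values(req.evidence.usage).reduce((sum, n) => sum + n, 0);
  if (totalUsage === 0) {
    blockers.push("no usage");
  }
  if (req.openIssues > 0) {
    blockers.push("open issues");
  }
  if (req.verifiedBy === req.builtBy) {
    blockers.push("builder cannot validate their own work");
  }
  return blockers;
}

// Hypothetical request: receipts exist, but there is no usage and the builder is self-verifying.
const request: L4Request = {
  evidence: { receipts: ["receipt-1"], test_results: [], usage: {} },
  openIssues: 0,
  builtBy: "engineering",
  verifiedBy: "engineering",
};

console.log(blockersForL4(request));
// blockers: "no usage", "builder cannot validate their own work"
```

Returning the list of blockers instead of a boolean gives engineering the specific evidence still owed, which fits the table's framing of L4 as a decision to make slowly.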

Questions

When the dashboard lags reality, is the problem the instrument or the protocol that updates it?

  • What is the minimum set of instruments the outer loop needs before its commissioning judgments are trustworthy?
  • If the commissioning API existed today, what would the first query reveal that the dashboard currently hides?
  • At what maturity level does the outer loop's validation capability need to be before L4 commissioning means anything?