Outer Loop Validation
How does the outer loop validate reality without touching the code?
The outer loop agent's job is not to build — it's to hold calibration between desired state (this repo) and actual state (engineering). That requires capability: instruments to read reality, protocols to interpret it, and enough situational wisdom to know when to trust the dashboard and when to check.
The Two Loops
DREAM REPO (desired state — this repo)
↕ error signal = the gap
VALIDATION LAYER (MCPs + commissioning API)
↕ reads actual state
ENGINEERING (builds, ships, operates)
The separation is the design. The builder cannot validate their own work — not because of distrust, but because proximity creates bias. The outer loop exists to close the VVFL: the inner loop builds, the outer loop reads truth, the error signal tells both where to go next.
Capability Map
CAPABILITY MATURITY CATEGORY GAP?
────────────────────────────────────── ──────── ──────── ─────
Read deployed state (Vercel/GitHub) ██░░░ (2) Supporting
Browser-test features (Chrome MCP) ██░░░ (2) Core
Update commissioning dashboard ██░░░ (2) Core
Signal engineering (agent-comms) ██░░░ (2) Supporting
PRD spec + scoring quality ███░░ (3) Core
Read DB schema + data (Supabase MCP) ░░░░░ (0) Core CRITICAL
Read error reality (Sentry MCP) ░░░░░ (0) Core GAP
Read usage reality (PostHog MCP) ░░░░░ (0) Core GAP
Cross-repo commissioning protocol █░░░░ (1) Core GAP
Commissioning API (machine-readable) ░░░░░ (0) Core CRITICAL
Capability = Resources (MCPs/APIs) + Know-How (commissioning vocabulary) + Character (bias separation)
The critical gaps are not knowledge gaps — the commissioning protocol is well-defined. They are instrument gaps: without Supabase, Sentry, and PostHog MCPs, the outer loop reads the dashboard but cannot verify it.
KB → Situational Wisdom Upgrade Path
Knowledge exists. Wisdom requires engineering it into systems that fire automatically.
| Stage | What exists | What's missing | Priority |
|---|---|---|---|
| Knowledge | Commissioning L0-L4 protocol, PRD standards, vocabulary | — | — |
| Mantra | "The builder never validates their own work" | Embedded in commissioning doc | — |
| Rule | prd-maintenance.md — check after every PRD change | Relies on memory, not trigger | — |
| Hook | docs-post-edit.sh — catches format, not reality | No reality check on dashboard edits | Medium |
| System | Commissioning API returning machine-readable state | NOT BUILT | High |
The mantra page names the cascade: Mantra → Rule → Hook → System. Flow state is where none of it is conscious. The outer loop reaches flow when the commissioning API makes the gap visible automatically — no manual cross-referencing, no dashboard lag.
Instruments to Add
Already wired in this session:
| MCP | Reads | Validation signal |
|---|---|---|
| mcp__claude_ai_Vercel__* | Deployments, build logs, runtime errors | Is the app live? What's crashing? |
| mcp__github__* | Commits, PRs, code presence | Does code exist for what's specced? |
| mcp__claude-in-chrome__* | Live browser session | Does the feature work end-to-end? |
Install next (priority order):
| MCP | What it reads | Commissioning signal |
|---|---|---|
| Supabase | DB schema + data | L1 evidence: schema exists. L2 evidence: records populated |
| Sentry | Production errors | Reality check: what's actually failing right now |
| PostHog | Feature usage | L4 evidence: humans are using it, value is flowing |
Install: add to ~/.claude/mcp_settings.json:
"supabase": { "command": "npx", "args": ["@supabase/mcp-server-supabase@latest", "--project-ref", "YOUR_REF"] },
"sentry": { "command": "npx", "args": ["-y", "@sentry/mcp-server@latest"] },
"posthog": { "command": "npx", "args": ["@posthog/mcp@latest"] }
Commission from Engineering
One table in Supabase, a small set of read-only endpoints. Both the outer loop agent and engineering read the same source of truth.
Write Rule
All writes go through Drizzle ORM + repository pattern. No exceptions.
The outer loop agent reads only — via Supabase MCP (direct DB query) or the REST endpoint. Engineering writes via the repo. This is not a trust constraint — it's a quality constraint. Every write through the repo is type-safe, auditable, and consistent. Writing directly via Supabase MCP would bypass every compound-quality guarantee the engineering team has built.
OUTER LOOP (this agent) ENGINEERING TEAM
│ │
READ only READ + WRITE
│ │
Supabase MCP ─────────────── Drizzle ORM + Repo
REST endpoint │
└───────────────────────── Supabase DB
(source of truth)
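The read/write split above can also be enforced at the type level, so a write from the validation side fails to compile rather than relying on discipline. A minimal sketch — the interface and type names are illustrative, not the actual engineering repository:

```typescript
// Illustrative sketch: the outer loop only ever receives the read-only
// interface, so a write from the validation side cannot typecheck.
type MaturityLevel = "L0" | "L1" | "L2" | "L3" | "L4";

interface Capability {
  slug: string;
  claimedLevel: MaturityLevel;
  lastVerified: string | null; // ISO timestamp, null = never verified
}

// What the outer loop agent gets: read methods only.
interface CommissioningReader {
  list(): Promise<Capability[]>;
  get(slug: string): Promise<Capability | undefined>;
}

// What engineering's repository layer implements: reads plus writes.
interface CommissioningRepo extends CommissioningReader {
  upsert(cap: Capability): Promise<void>;
}

// In-memory stand-in for the Drizzle-backed repo, for illustration only.
class InMemoryRepo implements CommissioningRepo {
  private rows = new Map<string, Capability>();
  async list() { return [...this.rows.values()]; }
  async get(slug: string) { return this.rows.get(slug); }
  async upsert(cap: Capability) { this.rows.set(cap.slug, cap); }
}

// The outer loop is handed the narrowed type — upsert is not visible.
function outerLoopView(repo: CommissioningRepo): CommissioningReader {
  return { list: () => repo.list(), get: (s) => repo.get(s) };
}
```

The narrowing is the point: the constraint lives in the type system, not in an agent's memory — the same Mantra → System move the cascade describes.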
Schema to Build
Engineering commissions this table:
// Drizzle schema — commissioning_capabilities
import { pgTable, pgEnum, uuid, text, jsonb, timestamp } from "drizzle-orm/pg-core";

export const maturityLevel = pgEnum("maturity_level", ["L0", "L1", "L2", "L3", "L4"]);

export const commissioningCapabilities = pgTable("commissioning_capabilities", {
  id: uuid("id").primaryKey().defaultRandom(),
  slug: text("slug").notNull().unique(),        // kebab-case, matches commisioning-status.md row
  featureName: text("feature_name").notNull(),
  capabilityEnabled: text("capability_enabled"),
  prdSlug: text("prd_slug"),                    // links back to PRD
  claimedLevel: maturityLevel("claimed_level").notNull(),
  evidence: jsonb("evidence"),                  // { receipts: [], test_results: [], usage: {} }
  lastVerified: timestamp("last_verified"),
  verifiedBy: text("verified_by"),              // agent name or human
  updatedAt: timestamp("updated_at").defaultNow(),
});
Endpoints to Expose
GET /api/commissioning/status
→ All capabilities: slug, claimed_level, evidence, last_verified
GET /api/commissioning/{slug}/evidence
→ One capability: receipts, test results, usage data
GET /api/health/features
→ Liveness check: which features respond right now
GET /api/vvfl/receipts?since=7d
→ Agent receipts: work done + artifacts + gate results
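A hedged sketch of what a consumer of GET /api/commissioning/status might do with the payload. The field names follow the endpoint list above; the 7-day staleness window mirrors the receipts query default and is an assumption, not a spec:

```typescript
// Shape of one row returned by /api/commissioning/status (assumed field names).
interface StatusRow {
  slug: string;
  claimed_level: "L0" | "L1" | "L2" | "L3" | "L4";
  evidence: Record<string, unknown>;
  last_verified: string | null; // ISO timestamp, null = never verified
}

// Flag capabilities whose evidence is stale: never verified, or verified
// longer ago than maxAgeDays. These are the rows the outer loop re-checks first.
function staleCapabilities(rows: StatusRow[], now: Date, maxAgeDays = 7): string[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return rows
    .filter(r => r.last_verified === null || Date.parse(r.last_verified) < cutoff)
    .map(r => r.slug);
}
```

This is the "no dashboard lag" payoff in miniature: staleness becomes a query result, not a thing someone has to remember to wonder about.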
A2A Path
These REST endpoints are the HTTP stub before the A2A protocol is wired. When A2A is live, commissioning/status becomes a first-class agent task — any agent can query what needs commissioning and post evidence through the same protocol as every other inter-agent workflow. The schema and repo pattern built now carry forward unchanged; only the transport layer upgrades.
The dream repo's commisioning-status.md is the spec. The DB is reality. The diff is the honest error signal — no judgment, just measurement.
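That diff can be computed mechanically. A minimal sketch, assuming both sources have already been reduced to slug → level maps (parsing commisioning-status.md is out of scope here):

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface Gap {
  slug: string;
  spec: Level | null;   // what the dream repo claims
  actual: Level | null; // what the DB has evidence for
}

// The error signal: every slug where desired state and measured state disagree.
// null on either side means the slug exists in only one source — also a gap.
function errorSignal(spec: Map<string, Level>, actual: Map<string, Level>): Gap[] {
  const slugs = new Set([...spec.keys(), ...actual.keys()]);
  const gaps: Gap[] = [];
  for (const slug of slugs) {
    const s = spec.get(slug) ?? null;
    const a = actual.get(slug) ?? null;
    if (s !== a) gaps.push({ slug, spec: s, actual: a });
  }
  return gaps;
}
```

Note the symmetry: a capability specced but absent from the DB is as much of a gap as one the DB knows about but the spec never named.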
Situational Wisdom
When to think fast vs slow in commissioning decisions:
| Decision | Speed | Because |
|---|---|---|
| Apply vocabulary mapping (engineering "complete" → L1/L2) | Fast | Protocol is defined — run it |
| Update dashboard from engineering report | Fast | Evidence provided — match to framework |
| Mark any capability L4 (commissioned) | Slow | Requires independent evidence: usage, receipts, no open issues |
| Declare a feature broken | Slow | Browser-test before logging — reproduce before reporting |
| PRD priority reorder | Fast | Algorithm is deterministic — run it |
| Commissioning vocabulary dispute | Slow | Both teams may be right — check the layer, not the claim |
The classic failure: engineering says "complete," dashboard says L0. Both correct — different layers. Measurement consensus precedes direction consensus. The vocabulary is the instrument.
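The fast path of that vocabulary mapping can be sketched as a function. This uses only the signals named in this document — schema exists → L1, records populated → L2, usage plus receipts plus no open issues → L4 — and deliberately never returns L3, whose criterion this section does not define; the evidence field names are illustrative:

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

// Illustrative evidence shape, assembled from the signals this doc names.
interface Evidence {
  schemaExists: boolean;      // L1 signal (per the Supabase row in the MCP table)
  recordsPopulated: boolean;  // L2 signal
  humansUsing: boolean;       // L4 signal: value is flowing
  receipts: string[];         // independent artifacts of verified work
  openIssues: number;
}

// Engineering "complete" never maps straight to L4. Build evidence earns
// L1/L2 on the fast path; L4 requires the slow path — usage, receipts,
// and no open issues, verified independently of the builder.
function mapEngineeringComplete(e: Evidence): Level {
  if (e.humansUsing && e.receipts.length > 0 && e.openIssues === 0) return "L4";
  if (e.recordsPopulated) return "L2";
  if (e.schemaExists) return "L1";
  return "L0";
}
```

The function encodes the table's asymmetry: the fast decisions are deterministic branches, and the only way to reach L4 is through evidence the builder did not produce alone.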
Context
- Flow Engineering — The five maps that precede this validation loop
- Commissioning Protocol — L0-L4 maturity model
- Commissioning State Machine PRD — Schema spec for commissioning_capabilities table + write rules
- Commissioning Dashboard — The spec this loop validates against
- Situational Wisdom — State of mind × state of play = wisdom
- Capabilities — What enables doing, not just knowing
- Control System — The PID model: setpoint, measurement, error, correction
- Mantra — The cascade: Mantra → Rule → Hook → System → Flow
Questions
- When the dashboard lags reality, is the problem the instrument or the protocol that updates it?
- What is the minimum set of instruments the outer loop needs before its commissioning judgments are trustworthy?
- If the commissioning API existed today, what would the first query reveal that the dashboard currently hides?
- At what maturity level does the outer loop's validation capability need to be before L4 commissioning means anything?