Outer Loop Validation
How does the outer loop validate reality without touching the code?
The outer loop agent's job is not to build — it's to hold calibration between desired state (this repo) and actual state (engineering). That requires capability: instruments to read reality, protocols to interpret it, and enough situational wisdom to know when to trust the dashboard and when to check.
The Two Loops
DREAM REPO (desired state — this repo)
↕ error signal = the gap
VALIDATION LAYER (MCPs + commissioning API)
↕ reads actual state
ENGINEERING (builds, ships, operates)
The separation is the design. The builder cannot validate their own work — not because of distrust, but because proximity creates bias. The outer loop exists to close the VVFL: the inner loop builds, the outer loop reads truth, the error signal tells both where to go next.
Capability Map
CAPABILITY MATURITY CATEGORY GAP?
────────────────────────────────────── ──────── ──────── ─────
Read deployed state (Vercel/GitHub) ██░░░ (2) Supporting
Browser-test features (Chrome MCP) ██░░░ (2) Core
Update commissioning dashboard ██░░░ (2) Core
Signal engineering (agent-comms) ██░░░ (2) Supporting
PRD spec + scoring quality ███░░ (3) Core
Read DB schema + data (Supabase MCP) ░░░░░ (0) Core CRITICAL
Read error reality (Sentry MCP) ░░░░░ (0) Core GAP
Read usage reality (PostHog MCP) ░░░░░ (0) Core GAP
Cross-repo commissioning protocol █░░░░ (1) Core GAP
Commissioning API (machine-readable) ░░░░░ (0) Core CRITICAL
Capability = Resources (MCPs/APIs) + Know-How (commissioning vocabulary) + Character (bias separation)
The critical gaps are not knowledge gaps — the commissioning protocol is well-defined. They are instrument gaps: without Supabase, Sentry, and PostHog MCPs, the outer loop reads the dashboard but cannot verify it.
KB → Situational Wisdom Upgrade Path
Knowledge exists. Wisdom requires engineering it into systems that fire automatically.
| Stage | What exists | What's missing | Priority |
|---|---|---|---|
| Knowledge | Commissioning L0-L4 protocol, PRD standards, vocabulary | — | — |
| Mantra | "The builder never validates their own work" | Embedded in commissioning doc | — |
| Rule | prd-maintenance.md — check after every PRD change | Relies on memory, not trigger | — |
| Hook | docs-post-edit.sh — catches format, not reality | No reality check on dashboard edits | Medium |
| System | Commissioning API returning machine-readable state | NOT BUILT | High |
The mantra page names the cascade: Mantra → Rule → Hook → System. Flow state is where none of it is conscious. The outer loop reaches flow when the commissioning API makes the gap visible automatically — no manual cross-referencing, no dashboard lag.
Instruments to Add
Already wired in this session:
| MCP | Reads | Validation signal |
|---|---|---|
| mcp__claude_ai_Vercel__* | Deployments, build logs, runtime errors | Is the app live? What's crashing? |
| mcp__github__* | Commits, PRs, code presence | Does code exist for what's specced? |
| mcp__claude-in-chrome__* | Live browser session | Does the feature work end-to-end? |
Install next (priority order):
| MCP | What it reads | Commissioning signal |
|---|---|---|
| Supabase | DB schema + data | L1 evidence: schema exists. L2 evidence: records populated |
| Sentry | Production errors | Reality check: what's actually failing right now |
| PostHog | Feature usage | L4 evidence: humans are using it, value is flowing |
Install: add to ~/.claude/mcp_settings.json:
"supabase": { "command": "npx", "args": ["@supabase/mcp-server-supabase@latest", "--project-ref", "YOUR_REF"] },
"sentry": { "command": "npx", "args": ["-y", "@sentry/mcp-server@latest"] },
"posthog": { "command": "npx", "args": ["@posthog/mcp@latest"] }
Commission from Engineering
One table in Supabase, a small set of read-only endpoints. Both the outer loop agent and engineering read the same source of truth.
Write Rule
All writes go through Drizzle ORM + repository pattern. No exceptions.
The outer loop agent reads only — via Supabase MCP (direct DB query) or the REST endpoint. Engineering writes via the repo. This is not a trust constraint — it's a quality constraint. Every write through the repo is type-safe, auditable, and consistent. Writing directly via Supabase MCP would bypass every compound-quality guarantee the engineering team has built.
OUTER LOOP (this agent) ENGINEERING TEAM
│ │
READ only READ + WRITE
│ │
Supabase MCP ─────────────── Drizzle ORM + Repo
REST endpoint │
└───────────────────────── Supabase DB
(source of truth)
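The read/write split above can also be enforced at the type level, so a write from the validation side fails to compile rather than relying on discipline. A minimal sketch — the interface and type names are illustrative, not the actual engineering repository:

```typescript
// Illustrative sketch: the outer loop only ever receives the read-only
// interface, so a write from the validation side cannot typecheck.
type MaturityLevel = "L0" | "L1" | "L2" | "L3" | "L4";

interface Capability {
  slug: string;
  claimedLevel: MaturityLevel;
  lastVerified: string | null; // ISO timestamp, null = never verified
}

// What the outer loop agent gets: read methods only.
interface CommissioningReader {
  list(): Promise<Capability[]>;
  get(slug: string): Promise<Capability | undefined>;
}

// What engineering's repository layer implements: reads plus writes.
interface CommissioningRepo extends CommissioningReader {
  upsert(cap: Capability): Promise<void>;
}

// In-memory stand-in for the Drizzle-backed repo, for illustration only.
class InMemoryRepo implements CommissioningRepo {
  private rows = new Map<string, Capability>();
  async list() { return [...this.rows.values()]; }
  async get(slug: string) { return this.rows.get(slug); }
  async upsert(cap: Capability) { this.rows.set(cap.slug, cap); }
}

// The outer loop is handed the narrowed type — upsert is not visible.
function outerLoopView(repo: CommissioningRepo): CommissioningReader {
  return { list: () => repo.list(), get: (s) => repo.get(s) };
}
```

The narrowing is the point: the constraint lives in the type system, not in an agent's memory — the same Mantra → System move the cascade describes.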
Schema to Build
Engineering commissions this table:
// Drizzle schema — commissioning_capabilities
import { pgTable, pgEnum, uuid, text, jsonb, timestamp } from "drizzle-orm/pg-core";

export const maturityLevel = pgEnum("maturity_level", ["L0", "L1", "L2", "L3", "L4"]);

export const commissioningCapabilities = pgTable("commissioning_capabilities", {
  id: uuid("id").primaryKey().defaultRandom(),
  slug: text("slug").notNull().unique(),        // kebab-case, matches commisioning-status.md row
  featureName: text("feature_name").notNull(),
  capabilityEnabled: text("capability_enabled"),
  prdSlug: text("prd_slug"),                    // links back to PRD
  claimedLevel: maturityLevel("claimed_level").notNull(),
  evidence: jsonb("evidence"),                  // { receipts: [], test_results: [], usage: {} }
  lastVerified: timestamp("last_verified"),
  verifiedBy: text("verified_by"),              // agent name or human
  updatedAt: timestamp("updated_at").defaultNow(),
});
Endpoints to Expose
GET /api/commissioning/status
→ All capabilities: slug, claimed_level, evidence, last_verified
GET /api/commissioning/{slug}/evidence
→ One capability: receipts, test results, usage data
GET /api/health/features
→ Liveness check: which features respond right now
GET /api/vvfl/receipts?since=7d
→ Agent receipts: work done + artifacts + gate results
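A hedged sketch of what a consumer of GET /api/commissioning/status might do with the payload. The field names follow the endpoint list above; the 7-day staleness window mirrors the receipts query default and is an assumption, not a spec:

```typescript
// Shape of one row returned by /api/commissioning/status (assumed field names).
interface StatusRow {
  slug: string;
  claimed_level: "L0" | "L1" | "L2" | "L3" | "L4";
  evidence: Record<string, unknown>;
  last_verified: string | null; // ISO timestamp, null = never verified
}

// Flag capabilities whose evidence is stale: never verified, or verified
// longer ago than maxAgeDays. These are the rows the outer loop re-checks first.
function staleCapabilities(rows: StatusRow[], now: Date, maxAgeDays = 7): string[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return rows
    .filter(r => r.last_verified === null || Date.parse(r.last_verified) < cutoff)
    .map(r => r.slug);
}
```

This is the "no dashboard lag" payoff in miniature: staleness becomes a query result, not a thing someone has to remember to wonder about.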
A2A Path
These REST endpoints are the HTTP stub before the A2A protocol is wired. When A2A is live, commissioning/status becomes a first-class agent task — any agent can query what needs commissioning and post evidence through the same protocol as every other inter-agent workflow. The schema and repo pattern built now carry forward unchanged; only the transport layer upgrades.
The dream repo's commisioning-status.md is the spec. The DB is reality. The diff is the honest error signal — no judgment, just measurement.
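That diff can be computed mechanically. A minimal sketch, assuming both sources have already been reduced to slug → level maps (parsing commisioning-status.md is out of scope here):

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface Gap {
  slug: string;
  spec: Level | null;   // what the dream repo claims
  actual: Level | null; // what the DB has evidence for
}

// The error signal: every slug where desired state and measured state disagree.
// null on either side means the slug exists in only one source — also a gap.
function errorSignal(spec: Map<string, Level>, actual: Map<string, Level>): Gap[] {
  const slugs = new Set([...spec.keys(), ...actual.keys()]);
  const gaps: Gap[] = [];
  for (const slug of slugs) {
    const s = spec.get(slug) ?? null;
    const a = actual.get(slug) ?? null;
    if (s !== a) gaps.push({ slug, spec: s, actual: a });
  }
  return gaps;
}
```

Note the symmetry: a capability specced but absent from the DB is as much of a gap as one the DB knows about but the spec never named.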
Situational Wisdom
When to think fast vs slow in commissioning decisions:
| Decision | Speed | Because |
|---|---|---|
| Apply vocabulary mapping (engineering "complete" → L1/L2) | Fast | Protocol is defined — run it |
| Update dashboard from engineering report | Fast | Evidence provided — match to framework |
| Mark any capability L4 (commissioned) | Slow | Requires independent evidence: usage, receipts, no open issues |
| Declare a feature broken | Slow | Browser-test before logging — reproduce before reporting |
| PRD priority reorder | Fast | Algorithm is deterministic — run it |
| Commissioning vocabulary dispute | Slow | Both teams may be right — check the layer, not the claim |
The classic failure: engineering says "complete," dashboard says L0. Both correct — different layers. Measurement consensus precedes direction consensus. The vocabulary is the instrument.
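The fast path of that vocabulary mapping can be sketched as a function. This uses only the signals named in this document — schema exists → L1, records populated → L2, usage plus receipts plus no open issues → L4 — and deliberately never returns L3, whose criterion this section does not define; the evidence field names are illustrative:

```typescript
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

// Illustrative evidence shape, assembled from the signals this doc names.
interface Evidence {
  schemaExists: boolean;      // L1 signal (per the Supabase row in the MCP table)
  recordsPopulated: boolean;  // L2 signal
  humansUsing: boolean;       // L4 signal: value is flowing
  receipts: string[];         // independent artifacts of verified work
  openIssues: number;
}

// Engineering "complete" never maps straight to L4. Build evidence earns
// L1/L2 on the fast path; L4 requires the slow path — usage, receipts,
// and no open issues, verified independently of the builder.
function mapEngineeringComplete(e: Evidence): Level {
  if (e.humansUsing && e.receipts.length > 0 && e.openIssues === 0) return "L4";
  if (e.recordsPopulated) return "L2";
  if (e.schemaExists) return "L1";
  return "L0";
}
```

The function encodes the table's asymmetry: the fast decisions are deterministic branches, and the only way to reach L4 is through evidence the builder did not produce alone.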
Context
- Flow Engineering — The five maps that precede this validation loop
- Commissioning Protocol — L0-L4 maturity model
- Commissioning State Machine PRD — Schema spec for commissioning_capabilities table + write rules
- Commissioning Dashboard — The spec this loop validates against
- Situational Wisdom — State of mind × state of play = wisdom
- Capabilities — What enables doing, not just knowing
- Control System — The PID model: setpoint, measurement, error, correction
- Mantra — The cascade: Mantra → Rule → Hook → System → Flow
Questions
- When the dashboard lags reality, is the problem the instrument or the protocol that updates it?
- What is the minimum set of instruments the outer loop needs before its commissioning judgments are trustworthy?
- If the commissioning API existed today, what would the first query reveal that the dashboard currently hides?
- At what maturity level does the outer loop's validation capability need to be before L4 commissioning means anything?