# Outer Loop Validation

How does the outer loop validate reality without touching the code?

The outer loop agent's job is not to build — it's to hold calibration between desired state (this repo) and actual state (engineering). That requires capability: instruments to read reality, protocols to interpret it, and enough situational wisdom to know when to trust the dashboard and when to check reality directly.
## The Two Loops

```
DREAM REPO (desired state — this repo)
        ↕  error signal = the gap
VALIDATION LAYER (MCPs + commissioning API)
        ↕  reads actual state
ENGINEERING (builds, ships, operates)
```
The separation is the design. The builder cannot validate their own work — not because of distrust, but because proximity creates bias. The outer loop exists to close the VVFL: the inner loop builds, the outer loop reads truth, the error signal tells both where to go next.
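One pass of this loop can be sketched as read, compare, signal. A minimal sketch in TypeScript — the `CapabilityState` and `ErrorSignal` shapes and the `errorSignals` name are illustrative assumptions, not an existing API:

```typescript
// Hypothetical shapes: desired state (dream repo) vs actual state (engineering).
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface CapabilityState {
  slug: string; // kebab-case capability id
  level: Level; // claimed or observed maturity
}

interface ErrorSignal {
  slug: string;
  desired: Level;
  actual: Level;
}

// One outer-loop pass: read both states, emit the gap. The builder never
// appears here; the comparison is the whole job.
function errorSignals(desired: CapabilityState[], actual: CapabilityState[]): ErrorSignal[] {
  const observed = new Map(actual.map((c) => [c.slug, c.level]));
  return desired
    .map((d) => ({ slug: d.slug, desired: d.level, actual: observed.get(d.slug) ?? ("L0" as Level) }))
    .filter((s) => s.desired !== s.actual); // no gap, no signal
}
```

Anything absent from actual state reads as L0: an unverified claim defaults to not-commissioned.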
## Capability Map

| Capability | Maturity | Category | Gap? |
|---|---|---|---|
| Read deployed state (Vercel/GitHub) | ██░░░ (2) | Supporting | |
| Browser-test features (Chrome MCP) | ██░░░ (2) | Core | |
| Update commissioning dashboard | ██░░░ (2) | Core | |
| Signal engineering (agent-comms) | ██░░░ (2) | Supporting | |
| PRD spec + scoring quality | ███░░ (3) | Core | |
| Read DB schema + data (Supabase MCP) | ░░░░░ (0) | Core | CRITICAL |
| Read error reality (Sentry MCP) | ░░░░░ (0) | Core | GAP |
| Read usage reality (PostHog MCP) | ░░░░░ (0) | Core | GAP |
| Cross-repo commissioning protocol | █░░░░ (1) | Core | GAP |
| Commissioning API (machine-readable) | ░░░░░ (0) | Core | CRITICAL |
Capability = Resources (MCPs/APIs) + Know-How (commissioning vocabulary) + Character (bias separation)
The critical gaps are not knowledge gaps — the commissioning protocol is well-defined. They are instrument gaps: without Supabase, Sentry, and PostHog MCPs, the outer loop reads the dashboard but cannot verify it.
## KB → Situational Wisdom Upgrade Path
Knowledge exists. Wisdom requires engineering it into systems that fire automatically.
| Stage | What exists | What's missing | Priority |
|---|---|---|---|
| Knowledge | Commissioning L0-L4 protocol, PRD standards, vocabulary | — | — |
| Mantra | "The builder never validates their own work" | Embedded in commissioning doc | — |
| Rule | prd-maintenance.md — check after every PRD change | Relies on memory, not trigger | — |
| Hook | docs-post-edit.sh — catches format, not reality | No reality check on dashboard edits | Medium |
| System | Commissioning API returning machine-readable state | NOT BUILT | High |
The mantra page names the cascade: Mantra → Rule → Hook → System. Flow state is where none of it is conscious. The outer loop reaches flow when the commissioning API makes the gap visible automatically — no manual cross-referencing, no dashboard lag.
## Instruments to Add

Already wired in this session:

| MCP | Reads | Validation signal |
|---|---|---|
| `mcp__claude_ai_Vercel__*` | Deployments, build logs, runtime errors | Is the app live? What's crashing? |
| `mcp__github__*` | Commits, PRs, code presence | Does code exist for what's specced? |
| `mcp__claude-in-chrome__*` | Live browser session | Does the feature work end-to-end? |
Install next (priority order):
| MCP | What it reads | Commissioning signal |
|---|---|---|
| Supabase | DB schema + data | L1 evidence: schema exists. L2 evidence: records populated |
| Sentry | Production errors | Reality check: what's actually failing right now |
| PostHog | Feature usage | L4 evidence: humans are using it, value is flowing |
Install: add to `~/.claude/mcp_settings.json`:

```json
"supabase": { "command": "npx", "args": ["@supabase/mcp-server-supabase@latest", "--project-ref", "YOUR_REF"] },
"sentry": { "command": "npx", "args": ["-y", "@sentry/mcp-server@latest"] },
"posthog": { "command": "npx", "args": ["@posthog/mcp@latest"] }
```
## Commission from Engineering

One table in Supabase, one endpoint. Both the outer loop agent and engineering read the same source of truth.

### Write Rule
All writes go through Drizzle ORM + repository pattern. No exceptions.
The outer loop agent reads only — via Supabase MCP (direct DB query) or the REST endpoint. Engineering writes via the repo. This is not a trust constraint — it's a quality constraint. Every write through the repo is type-safe, auditable, and consistent. Writing directly via Supabase MCP would bypass every compound-quality guarantee the engineering team has built.
```
OUTER LOOP (this agent)              ENGINEERING TEAM
        │                                   │
    READ only                        READ + WRITE
        │                                   │
Supabase MCP ─────────────────── Drizzle ORM + Repo
REST endpoint                               │
        └───────────────────────────  Supabase DB
                                   (source of truth)
```
### Schema to Build

Engineering commissions this table (Drizzle, TypeScript):

```typescript
// Drizzle schema — commissioning_capabilities
import { pgTable, uuid, text, jsonb, timestamp, pgEnum } from "drizzle-orm/pg-core";

// L0 | L1 | L2 | L3 | L4
export const maturityLevel = pgEnum("maturity_level", ["L0", "L1", "L2", "L3", "L4"]);

export const commissioningCapabilities = pgTable("commissioning_capabilities", {
  id: uuid("id").primaryKey().defaultRandom(),
  slug: text("slug").notNull().unique(),       // kebab-case, matches feature-matrix.md row
  featureName: text("feature_name").notNull(),
  capabilityEnabled: text("capability_enabled"),
  prdSlug: text("prd_slug"),                   // links back to PRD
  claimedLevel: maturityLevel("claimed_level").notNull(),
  evidence: jsonb("evidence"),                 // { receipts: [], test_results: [], usage: {} }
  lastVerified: timestamp("last_verified"),
  verifiedBy: text("verified_by"),             // agent name or human
  updatedAt: timestamp("updated_at").defaultNow(),
});
```
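The write rule above can be sketched as a thin repository that owns the single write path. A sketch only — the `CommissioningRepo` name and the in-memory store are assumptions standing in for the real Drizzle-backed implementation:

```typescript
// Hypothetical repository: the only write path to commissioning_capabilities.
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface CapabilityRow {
  slug: string;
  claimedLevel: Level;
  evidence: { receipts: string[]; test_results: string[]; usage: Record<string, unknown> };
  lastVerified: Date;
  verifiedBy: string; // agent name or human
}

class CommissioningRepo {
  // In-memory stand-in; real code wraps a Drizzle upsert keyed on slug.
  private rows = new Map<string, CapabilityRow>();

  // Every write lands here: type-checked, one shape, auditable.
  upsert(row: CapabilityRow): void {
    this.rows.set(row.slug, { ...row });
  }

  // Reads are open; the outer loop uses these (or the MCP/REST path).
  get(slug: string): CapabilityRow | undefined {
    return this.rows.get(slug);
  }
}
```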
### Endpoints to Expose

```
GET /api/commissioning/status
  → All capabilities: slug, claimed_level, evidence, last_verified

GET /api/commissioning/{slug}/evidence
  → One capability: receipts, test results, usage data

GET /api/health/features
  → Liveness check: which features respond right now

GET /api/vvfl/receipts?since=7d
  → Agent receipts: work done + artifacts + gate results
```
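The status endpoint's response can be sketched framework-agnostically. The `StatusRow` field names mirror the spec above; the seven-day staleness window and the `stale` flag are assumptions:

```typescript
// Hypothetical response builder for GET /api/commissioning/status.
type Level = "L0" | "L1" | "L2" | "L3" | "L4";

interface StatusRow {
  slug: string;
  claimed_level: Level;
  evidence: Record<string, unknown>;
  last_verified: string; // ISO timestamp
}

const STALE_AFTER_MS = 7 * 24 * 60 * 60 * 1000; // assumed 7-day window

// Flags rows whose evidence is older than the window: dashboard lag made visible.
function commissioningStatus(rows: StatusRow[], now: Date = new Date()) {
  return rows.map((r) => ({
    ...r,
    stale: now.getTime() - new Date(r.last_verified).getTime() > STALE_AFTER_MS,
  }));
}
```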
### A2A Path
These REST endpoints are the HTTP stub before the A2A protocol is wired. When A2A is live, commissioning/status becomes a first-class agent task — any agent can query what needs commissioning and post evidence through the same protocol as every other inter-agent workflow. The schema and repo pattern built now carry forward unchanged; only the transport layer upgrades.
The dream repo's feature-matrix.md is the spec. The DB is reality. The diff is the honest error signal — no judgment, just measurement.
## Situational Wisdom
When to think fast vs slow in commissioning decisions:
| Decision | Speed | Because |
|---|---|---|
| Apply vocabulary mapping (engineering "complete" → L1/L2) | Fast | Protocol is defined — run it |
| Update dashboard from engineering report | Fast | Evidence provided — match to framework |
| Mark any capability L4 (commissioned) | Slow | Requires independent evidence: usage, receipts, no open issues |
| Declare a feature broken | Slow | Browser-test before logging — reproduce before reporting |
| PRD priority reorder | Fast | Algorithm is deterministic — run it |
| Commissioning vocabulary dispute | Slow | Both teams may be right — check the layer, not the claim |
The classic failure: engineering says "complete," dashboard says L0. Both correct — different layers. Measurement consensus precedes direction consensus. The vocabulary is the instrument.
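The routing in the table above can be made mechanical, so the default is never decided under pressure. The decision keys here are illustrative labels, not an existing taxonomy:

```typescript
// Fast/slow routing for commissioning decisions, mirroring the table above.
type Speed = "fast" | "slow";

const DECISION_SPEED: Record<string, Speed> = {
  "vocabulary-mapping": "fast", // protocol is defined — run it
  "dashboard-update": "fast",   // evidence provided — match to framework
  "mark-l4": "slow",            // requires independent evidence
  "declare-broken": "slow",     // reproduce before reporting
  "prd-reorder": "fast",        // algorithm is deterministic
  "vocabulary-dispute": "slow", // check the layer, not the claim
};

// Unknown decision types default to slow: when in doubt, verify.
function speedFor(decision: string): Speed {
  return DECISION_SPEED[decision] ?? "slow";
}
```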
## Return Signal

When the outer loop detects a gap between spec and reality, it returns a signal to engineering via the Return Signals protocol. The outer loop measures. The inner loop corrects. The signal channel couples them. Dream Team owns the spec and the commissioning. Engineering owns the architecture and the build.
## Context

- Return Signals — The handoff contract between Dream Team and Engineering
- Flow Engineering — The five maps that precede this validation loop
- Commissioning Protocol — L0-L4 maturity model
- Commissioning State Machine PRD — Schema spec for the `commissioning_capabilities` table + write rules
- Commissioning Dashboard — The spec this loop validates against
- Situational Wisdom — State of mind × state of play = wisdom
- Capabilities — What enables doing, not just knowing
- Control System — The PID model: setpoint, measurement, error, correction
- Mantra — The cascade: Mantra → Rule → Hook → System → Flow
## Questions

- When the dashboard lags reality, is the problem the instrument or the protocol that updates it?
- What is the minimum set of instruments the outer loop needs before its commissioning judgments are trustworthy?
- If the commissioning API existed today, what would the first query reveal that the dashboard currently hides?
- At what maturity level does the outer loop's validation capability need to be before L4 commissioning means anything?