Testing Enforcement
How do you make the right testing decision structurally impossible to skip?
Documentation describes the trophy model. Agents read it and ignore it. One bug fix took seven hours and sixteen commits: every commit touched browser-level symptoms, and the server action was never called directly. The testing docs were correct. The enforcement was absent.
Docs are suggestions. Rules, hooks, generators, and plan templates are enforcement. This page specifies every mechanism the engineering repo needs to make layer violations structurally impossible.
Seven Layers
LAYER 7: Skill (debug-test-failure) ← Agent behavior
LAYER 6: Plan Template (investigation) ← Work structure
LAYER 5: Generator (scaffold-action) ← Code creation
LAYER 4: Hook (commit message) ← Commit gate
LAYER 3: Hook (E2E pre-write) ← File gate
LAYER 2: Rule (trophy-layer-gate) ← Decision gate
LAYER 1: Rule (walk-the-pipe) ← Investigation protocol
Each layer catches what the layer below missed. Together they make the failure mode — iterating on E2E symptoms without walking upstream — structurally impossible.
Layer 1: Rule
File: .claude/rules/walk-the-pipe.md
Auto-activates on: Any test failure investigation or E2E spec modification.
Content — the 4-step upstream protocol:
When an E2E test fails, do not touch the spec. Walk upstream. Test each joint in the supply chain, starting from the source.
| Step | Action | Stop condition |
|---|---|---|
| 1 | Call the server action directly | If it returns wrong data, fix the action |
| 2 | Check component hydration and render | If it doesn't render, fix the component |
| 3 | Check form wiring — does submit trigger the action? | If wiring is broken, fix the wiring |
| 4 | Now debug the E2E spec itself | Only after steps 1-3 pass |
Hard constraint: Never modify a spec until the pipe is proven working. If the pipe is broken, no spec fix will help. If the pipe works, the spec fix is obvious.
Cost signal: If you spend more than 30 minutes debugging an E2E test, the test is at the wrong layer. Stop. Walk the pipe.
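The protocol above can be sketched as an ordered runner that stops at the first broken joint. This is a minimal illustration, not the rule's actual implementation: `PipeCheck`, `PipeResult`, and `walkThePipe` are hypothetical names, and each `run` function stands in for whatever direct call or isolated render the repo really uses.

```typescript
// One joint in the pipe. Type and field names are illustrative, not from the repo.
type PipeCheck = {
  step: 1 | 2 | 3;
  fixTarget: "action" | "component" | "wiring";
  run: () => Promise<boolean>; // resolves true when this joint works
};

type PipeResult =
  | { status: "broken"; step: number; fixTarget: string }
  | { status: "proven" };

// Run the checks upstream-first and stop at the first broken joint.
async function walkThePipe(checks: PipeCheck[]): Promise<PipeResult> {
  const ordered = [...checks].sort((a, b) => a.step - b.step);
  for (const check of ordered) {
    if (!(await check.run())) {
      return { status: "broken", step: check.step, fixTarget: check.fixTarget };
    }
  }
  // Steps 1-3 passed, so step 4 (debugging the spec itself) is now allowed.
  return { status: "proven" };
}
```

A `broken` result names the code to fix and leaves the spec untouched. Only a `proven` result opens step 4.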
Layer 2: Rule
File: .claude/rules/trophy-layer-gate.md
Auto-activates on: Creating any new test file.
Content — the selection decision tree:
Before writing a test, identify the layer. If the claim can be proven at a lower layer, writing at a higher layer is blocked.
| I changed... | Test layer | File pattern |
|---|---|---|
| Zod schema, pure function, DTO mapping | L1 Unit | *.schema.spec.ts |
| Server action, repository, adapter | L2 Integration | *.integration.spec.ts |
| Client component with state | L2 Browser | *.browser.test.tsx |
| Multi-step browser flow, auth redirect, layout | L3 E2E | *.spec.ts (Playwright) |
The gate question: Can this test's core claim be proven without a browser?
- YES → Write the cheaper test. L3 is blocked.
- NO → Proceed with E2E. Document why the browser is required.
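As a rough sketch, the selection table and the gate question reduce to one function. The `ChangeKind` categories mirror the table rows. `selectLayer` and its `browserReason` parameter are assumptions made for illustration, with `browserReason` standing in for the documented justification the gate asks for.

```typescript
type Layer = "L1 Unit" | "L2 Integration" | "L2 Browser" | "L3 E2E";

// What was changed; the categories mirror the rows of the selection table.
type ChangeKind =
  | "schema" | "pure-function" | "dto-mapping"
  | "server-action" | "repository" | "adapter"
  | "client-component"
  | "browser-flow";

type GateDecision =
  | { allowed: true; layer: Layer; filePattern: string }
  | { allowed: false; reason: string };

// browserReason is hypothetical: the written justification for needing a browser.
function selectLayer(kind: ChangeKind, browserReason?: string): GateDecision {
  switch (kind) {
    case "schema":
    case "pure-function":
    case "dto-mapping":
      return { allowed: true, layer: "L1 Unit", filePattern: "*.schema.spec.ts" };
    case "server-action":
    case "repository":
    case "adapter":
      return { allowed: true, layer: "L2 Integration", filePattern: "*.integration.spec.ts" };
    case "client-component":
      return { allowed: true, layer: "L2 Browser", filePattern: "*.browser.test.tsx" };
    case "browser-flow":
      // The claim needs a browser, so L3 is allowed only with a documented reason.
      if (!browserReason || browserReason.trim() === "") {
        return { allowed: false, reason: "Document why the browser is required." };
      }
      return { allowed: true, layer: "L3 E2E", filePattern: "*.spec.ts" };
  }
}
```

Anything provable without a browser never reaches the L3 branch. The only path to an E2E spec is a browser-dependent claim with its reason written down.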
Decomposition test: If an E2E test proves more than one claim, split it. Each claim gets tested at its cheapest viable layer.
| Combined E2E claim | Decomposed |
|---|---|
| "Form renders AND action creates entity AND redirect works" | L2: action returns valid ID. L3: form visible, button clicks, redirect lands. |
| "Validation rejects bad input AND shows error message" | L1: schema rejects invalid. L3: error message visible after submit. |
| "Auth blocks access AND redirects to login" | L2: action throws without auth. L3: page redirects to login URL. |
Layer 3: Hook
File: .claude/hooks/e2e-pre-write.sh
Triggers: Before any edit to *.spec.ts in an E2E project.
Action:
- Parse the spec file to identify which server actions it imports or references
- For each server action, check if a corresponding *.integration.spec.ts exists
- If no L2 integration test exists for a referenced action:
⚠️ No L2 integration test found for: createVenture
Expected: libs/ventures/server/src/__tests__/create-venture.integration.spec.ts
Write L2 first. E2E tests assume the action works — they test browser wiring only.
Severity: Warning, not block. The agent must acknowledge the warning and either write the L2 test first or justify why E2E-only is correct (browser-dependent claim).
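The core check can be sketched as a pure function over the spec source and a list of repo files. The hook itself is a shell script, so this is only one possible shape for the logic. It rests on two loudly labeled assumptions: server actions are named imports from modules whose path ends in `.action`, and the integration test file is named after the kebab-cased action (as in the createVenture example above). All function names are hypothetical.

```typescript
// createVenture -> create-venture
function toKebabCase(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// ASSUMPTION: actions are named imports from modules whose path ends in ".action".
function referencedActions(specSource: string): string[] {
  const importPattern = /import\s*\{([^}]+)\}\s*from\s*["'][^"']*\.action["']/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(specSource)) !== null) {
    for (const part of (match[1] ?? "").split(",")) {
      // Drop any "as alias" suffix and keep the exported action name.
      const name = part.trim().split(/\s+as\s+/)[0];
      if (name && names.indexOf(name) === -1) names.push(name);
    }
  }
  return names;
}

type MissingTest = { action: string; expectedFile: string };

// Report every referenced action that has no matching integration test file.
function missingIntegrationTests(specSource: string, repoFiles: string[]): MissingTest[] {
  const missing: MissingTest[] = [];
  for (const action of referencedActions(specSource)) {
    const expectedFile = `${toKebabCase(action)}.integration.spec.ts`;
    const found = repoFiles.some((f) => f === expectedFile || f.endsWith(`/${expectedFile}`));
    if (!found) missing.push({ action, expectedFile });
  }
  return missing;
}
```

A non-empty result maps directly to the warning text: one entry per action that lacks an L2 test.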
Layer 4: Hook
File: .claude/hooks/test-commit-message.sh
Triggers: Before any commit with fix(e2e) in the message.
Action:
- Check the staged files in the commit
- If the commit ONLY touches E2E spec files (*.spec.ts in E2E projects) and does NOT touch any integration or unit test file:
⚠️ E2E-only fix detected.
This commit modifies E2E specs without touching L1/L2 tests.
Did you walk the pipe? Common pattern:
- If the server action was broken → fix should be in action code + L2 test
- If the component was broken → fix should be in component code + L2 test
- If only E2E wiring was wrong → this commit is correct, proceed
Add --e2e-only-justified to bypass this check.
Severity: Warning with bypass flag. Tracks how often the bypass is used — high bypass rate signals the hook is miscalibrated or agents are ignoring the protocol.
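The decision the hook makes can be sketched as a small classifier over the commit message and staged paths. Several details here are assumptions, not documented behavior: E2E projects are taken to live in directories ending in `-e2e`, the bypass flag is taken to appear in the commit message, and `checkCommit` and its result names are hypothetical.

```typescript
type CommitCheck = "not-applicable" | "ok" | "bypassed" | "warn";

// ASSUMPTION: E2E projects live in directories whose name ends in "-e2e".
function isE2eSpec(path: string): boolean {
  return /(^|\/)[^/]*-e2e\/.*\.spec\.ts$/.test(path);
}

// L1/L2 test files, using the file patterns from the layer table.
function isLowerLayerTest(path: string): boolean {
  return /\.(schema|integration)\.spec\.ts$/.test(path) || /\.browser\.test\.tsx$/.test(path);
}

function checkCommit(message: string, stagedFiles: string[]): CommitCheck {
  if (!message.includes("fix(e2e)")) return "not-applicable";
  const e2eOnly =
    stagedFiles.length > 0 &&
    stagedFiles.every(isE2eSpec) &&
    !stagedFiles.some(isLowerLayerTest);
  if (!e2eOnly) return "ok";
  // ASSUMPTION: the bypass flag is written into the commit message.
  return message.includes("--e2e-only-justified") ? "bypassed" : "warn";
}
```

Counting `bypassed` results against `warn` results over time gives the bypass rate the severity note refers to.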
Layer 5: Generator
Command: nx generate @stackmates/generators:scaffold-action
What it generates:
| File | Layer | Purpose |
|---|---|---|
| src/actions/{name}.action.ts | n/a | The server action (thin wrapper) |
| src/actions/{name}.schema.ts | n/a | Zod input/output schemas |
| src/__tests__/{name}.schema.spec.ts | L1 | Schema validation tests |
| src/__tests__/{name}.integration.spec.ts | L2 | Server action integration test with real DB |
What it does NOT generate:
- E2E specs. E2E tests are written only after L1 and L2 pass. The generator enforces the cascade by omitting the expensive layer.
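One way to picture "enforcement by omission" is a file manifest whose type cannot express an E2E file at all. This sketch does not use the Nx generator API, and `ScaffoldFile` and `scaffoldActionFiles` are hypothetical names. It only mirrors the table of generated files.

```typescript
// "L3" is deliberately absent from the union: the manifest cannot describe an E2E spec.
type ScaffoldFile = { path: string; layer: "L1" | "L2" | null; purpose: string };

function scaffoldActionFiles(name: string): ScaffoldFile[] {
  return [
    { path: `src/actions/${name}.action.ts`, layer: null, purpose: "The server action (thin wrapper)" },
    { path: `src/actions/${name}.schema.ts`, layer: null, purpose: "Zod input/output schemas" },
    { path: `src/__tests__/${name}.schema.spec.ts`, layer: "L1", purpose: "Schema validation tests" },
    { path: `src/__tests__/${name}.integration.spec.ts`, layer: "L2", purpose: "Server action integration test with real DB" },
  ];
}
```

Adding an E2E entry would mean widening the `layer` union, which makes the exception a visible change.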
L2 test scaffold template:
import { describe, it, expect, vi } from "vitest";

describe("{ActionName} integration", () => {
  it("returns valid result with correct input", async () => {
    // Arrange: set up test data in real DB
    // Act: call the server action directly
    // Assert: check return value and DB state
  });

  it("rejects invalid input via schema", async () => {
    // Arrange: prepare invalid input
    // Act: call the server action
    // Assert: expect validation error, no DB mutation
  });

  it("rejects unauthorized access", async () => {
    // Arrange: no auth context
    // Act: call the server action
    // Assert: expect auth error
  });
});
Layer 6: Plan Template
Template name: test-failure-investigation
Purpose: Structures test failure investigation so the agent cannot skip upstream checks.
Tasks:
| # | Task | Quality gate |
|---|---|---|
| 1 | Identify the failing assertion | Exact assertion text quoted. Layer identified (L1/L2/L3). |
| 2 | Call the server action directly (L2 check) | Action called with same inputs. Return value logged. Pass/fail determined. |
| 3 | Check component hydration (if L2 passes) | Component renders in isolation. Correct props received. |
| 4 | Check form wiring (if component passes) | Submit triggers the action. Form data matches schema. |
| 5 | Modify E2E spec (only if pipe is proven) | Steps 2-4 all pass. Root cause identified at specific layer. Fix targets that layer. |
Hard constraint: Task 5 cannot be started until tasks 2-4 are marked complete or skipped-with-evidence. The plan template enforces the walk-the-pipe protocol through task dependencies.
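The dependency rule can be sketched as a start check over task statuses. Only the task 5 dependencies come from the hard constraint above. The chain 1 → 2 → 3 → 4 is inferred from the "if L2 passes" and "if component passes" wording, and `PlanTask`, `investigationPlan`, and `canStart` are hypothetical names.

```typescript
type TaskStatus = "pending" | "complete" | "skipped-with-evidence";
type PlanTask = { id: number; title: string; dependsOn: number[] };

// Task 5's dependencies are the documented hard constraint; the rest are inferred.
const investigationPlan: PlanTask[] = [
  { id: 1, title: "Identify the failing assertion", dependsOn: [] },
  { id: 2, title: "Call the server action directly", dependsOn: [1] },
  { id: 3, title: "Check component hydration", dependsOn: [2] },
  { id: 4, title: "Check form wiring", dependsOn: [3] },
  { id: 5, title: "Modify E2E spec", dependsOn: [2, 3, 4] },
];

// A task may start only when every dependency is complete or skipped with evidence.
function canStart(taskId: number, statuses: Partial<Record<number, TaskStatus>>): boolean {
  const task = investigationPlan.find((t) => t.id === taskId);
  if (!task) throw new Error(`Unknown task ${taskId}`);
  return task.dependsOn.every((dep) => {
    const status = statuses[dep];
    return status === "complete" || status === "skipped-with-evidence";
  });
}
```

A missing status counts as not done, so task 5 stays locked until tasks 2-4 each carry an explicit outcome.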
When to use: Engineering must create a plan from this template before investigating any E2E failure that has persisted beyond the first attempt. One attempt is allowed without the template. Two or more attempts require structured investigation.
Layer 7: Skill
File: .agents/skills/debug-test-failure/SKILL.md
Input: Failing test file path.
Procedure:
| Step | Action | Output |
|---|---|---|
| 1 | Parse the spec to identify which server actions it touches | List of action function names and import paths |
| 2 | Check if L2 integration tests exist for those actions | For each action: L2 exists (path) or MISSING |
| 3 | If MISSING: scaffold L2 tests and run them | L2 test results — pass or fail with error |
| 4 | If L2 passes: investigate component render | Component renders correctly — yes/no |
| 5 | If component passes: investigate E2E spec | Now modify the spec — the pipe is proven |
Quality gate: Agent must show L2 pass evidence before modifying any E2E spec. Evidence means: test output showing the server action returns correct data when called directly. Screenshots of browser behavior do not count as L2 evidence.
Anti-pattern detection: If the agent's commit history shows 3+ commits modifying only E2E spec files without touching L2 tests, the skill flags a layer violation:
🚨 Layer violation detected.
3 commits modified E2E specs without L2 evidence.
Walk the pipe. Call the server action directly.
See: /docs/software/platform/testing-platform/#walk-the-pipe
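A possible shape for the anti-pattern check is sketched below. Whether the three commits must be consecutive is not stated. This sketch assumes they are and counts the most recent unbroken run. It also assumes E2E specs live under a directory ending in `-e2e`, and all names are hypothetical.

```typescript
type CommitSummary = { sha: string; files: string[] };

// ASSUMPTION: E2E specs live under a directory whose name ends in "-e2e".
function isE2eSpecPath(path: string): boolean {
  return /(^|\/)[^/]*-e2e\/.*\.spec\.ts$/.test(path);
}

// Length of the most recent run of commits that touched only E2E specs.
// history is ordered oldest first.
function e2eOnlyStreak(history: CommitSummary[]): number {
  let streak = 0;
  for (const commit of [...history].reverse()) {
    const e2eOnly = commit.files.length > 0 && commit.files.every(isE2eSpecPath);
    if (!e2eOnly) break;
    streak += 1;
  }
  return streak;
}

// Returns the violation message, or null when the streak is below the threshold.
function layerViolation(history: CommitSummary[], threshold = 3): string | null {
  const streak = e2eOnlyStreak(history);
  if (streak < threshold) return null;
  return `Layer violation detected. ${streak} commits modified E2E specs without L2 evidence.`;
}
```

Any commit that touches action code or an L2 test resets the streak, so the flag fires only on sustained spec-only iteration.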
Process Benchmark
Metric: Commits per bug fix.
Target: 3 or fewer commits per bug fix. A fix that takes 16 commits is a process failure — the debugging was at the wrong layer, and each commit was a symptom-level guess.
Track this as a trailing 30-day average. Spikes above 5 trigger a retrospective asking: "Was the pipe walked before the first spec change?"
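The benchmark arithmetic can be sketched as follows. The text does not say whether a "spike above 5" refers to a single fix or to the average. This sketch assumes a single fix, and the `BugFix` record and function name are hypothetical.

```typescript
type BugFix = { id: string; mergedAt: Date; commits: number };

type BenchmarkReport = {
  averageCommits: number | null; // null when no fixes fall inside the window
  meetsTarget: boolean;
  retrospectives: string[]; // ids of fixes that exceeded the spike threshold
};

const DAY_MS = 24 * 60 * 60 * 1000;

function commitsPerFixReport(
  fixes: BugFix[],
  now: Date,
  windowDays = 30,
  target = 3,
  spike = 5,
): BenchmarkReport {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const recent = fixes.filter(
    (f) => f.mergedAt.getTime() >= cutoff && f.mergedAt.getTime() <= now.getTime(),
  );
  const total = recent.reduce((sum, f) => sum + f.commits, 0);
  const averageCommits = recent.length > 0 ? total / recent.length : null;
  return {
    averageCommits,
    meetsTarget: averageCommits === null || averageCommits <= target,
    // ASSUMPTION: a spike is any single fix above the threshold.
    retrospectives: recent.filter((f) => f.commits > spike).map((f) => f.id),
  };
}
```

With fixes of 2, 16, and 3 commits in the window, the average is 7, the target is missed, and only the 16-commit fix is queued for a retrospective.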
See Engineering Quality Benchmarks for the full threshold table.
Dig Deeper
- Testing Platform — Trophy strategy, E2E admission gate, walk-the-pipe protocol
- Testing Strategy — Layer model, selection rules, browser gate decomposition
Context
- Flow Engineering — Enforcement hierarchy and cost of quality
- Engineering Quality Benchmarks — Target thresholds including commits-per-fix
- CI Testing Infrastructure — Merge loop and health loop
Questions
If seven enforcement layers exist, which single layer would have prevented the 16-commit fiasco?
- When the walk-the-pipe rule is a suggestion and the generator enforces L2-first by omission, which enforcement style wins under cognitive load?
- If the commit-message hook tracks bypass frequency, at what bypass rate should the hook be promoted from warning to block?
- When an agent creates a plan from the investigation template but skips task 2 (call the action directly), does the template enforce anything or just document intent?