Testing Enforcement

How do you make the right testing decision structurally impossible to skip?

Documentation describes the trophy model. Agents read it and ignore it. One bug fix took seven hours and sixteen commits: every commit touched browser-level symptoms, and the server action was never called directly. The testing docs were correct. The enforcement was absent.

Docs are suggestions. Rules, hooks, generators, and plan templates are enforcement. This page specifies every mechanism the engineering repo needs to make layer violations structurally impossible.

Seven Layers

LAYER 7: Skill (debug-test-failure)     ← Agent behavior
LAYER 6: Plan Template (investigation)  ← Work structure
LAYER 5: Generator (scaffold-action)    ← Code creation
LAYER 4: Hook (commit message)          ← Commit gate
LAYER 3: Hook (E2E pre-write)           ← File gate
LAYER 2: Rule (trophy-layer-gate)       ← Decision gate
LAYER 1: Rule (walk-the-pipe)           ← Investigation protocol

Each layer catches what the layer below missed. Together they make the failure mode — iterating on E2E symptoms without walking upstream — structurally impossible.

Layer 1: Rule

File: .claude/rules/walk-the-pipe.md

Auto-activates on: Any test failure investigation or E2E spec modification.

Content — the 4-step upstream protocol:

When an E2E test fails, do not touch the spec. Walk upstream. Test each joint in the supply chain, starting from the source.

Step	Action	Stop condition
1	Call the server action directly	If it returns wrong data, fix the action
2	Check component hydration and render	If it doesn't render, fix the component
3	Check form wiring — does submit trigger the action?	If wiring is broken, fix the wiring
4	Now debug the E2E spec itself	Only after steps 1-3 pass

Hard constraint: Never modify a spec until the pipe is proven working. If the pipe is broken, no spec fix will help. If the pipe works, the spec fix is obvious.

Cost signal: If you spend more than 30 minutes debugging an E2E test, the test is at the wrong layer. Stop. Walk the pipe.
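The upstream protocol can be pictured as an ordered runner that stops at the first broken joint. This is only a sketch: `PipeCheck`, `PipeResult`, and `walkThePipe` are hypothetical names, and each `run` callback stands in for whatever direct call or render check the project actually uses.

```typescript
// Hypothetical types and names; not part of walk-the-pipe.md.
type PipeCheck = {
  step: number; // 1 = server action, 2 = component render, 3 = form wiring
  name: string;
  run: () => Promise<boolean>; // resolves true when this joint works
};

type PipeResult =
  | { status: "broken"; step: number; name: string }
  | { status: "pipe-proven" };

// Runs the checks strictly in step order and stops at the first failing joint.
// Later checks never run once an upstream joint is known to be broken.
async function walkThePipe(checks: PipeCheck[]): Promise<PipeResult> {
  const ordered = checks.slice().sort((a, b) => a.step - b.step);
  for (const check of ordered) {
    const ok = await check.run();
    if (!ok) {
      return { status: "broken", step: check.step, name: check.name };
    }
  }
  // Only now is step 4 (debugging the E2E spec itself) permitted.
  return { status: "pipe-proven" };
}
```

A `broken` result names the joint to fix. Only a `pipe-proven` result unlocks edits to the spec.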

Layer 2: Rule

File: .claude/rules/trophy-layer-gate.md

Auto-activates on: Creating any new test file.

Content — the selection decision tree:

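Before writing a test, identify the layer. If the claim can be proven at a lower layer, writing at a higher layer is blocked. (L1, L2, and L3 here are test layers in the trophy, not the seven enforcement layers listed above.)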

I changed...	Test layer	File pattern
Zod schema, pure function, DTO mapping	L1 Unit	`*.schema.spec.ts`
Server action, repository, adapter	L2 Integration	`*.integration.spec.ts`
Client component with state	L2 Browser	`*.browser.test.tsx`
Multi-step browser flow, auth redirect, layout	L3 E2E	`*.spec.ts` (Playwright)

The gate question: Can this test's core claim be proven without a browser?

YES → Write the cheaper test. L3 is blocked.
NO → Proceed with E2E. Document why the browser is required.
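The table and gate question above can be expressed as a lookup plus a cost comparison. The sketch below is illustrative only: `ChangeKind`, `CHEAPEST_LAYER`, and `checkLayerGate` are hypothetical names, and the numeric costs are an assumption used to order the layers.

```typescript
type TestLayer = "L1 Unit" | "L2 Integration" | "L2 Browser" | "L3 E2E";

// Hypothetical labels for the "I changed..." column.
type ChangeKind =
  | "zod-schema" | "pure-function" | "dto-mapping"
  | "server-action" | "repository" | "adapter"
  | "client-component"
  | "browser-flow" | "auth-redirect" | "layout";

interface LayerChoice { layer: TestLayer; filePattern: string; }

const L1: LayerChoice = { layer: "L1 Unit", filePattern: "*.schema.spec.ts" };
const L2: LayerChoice = { layer: "L2 Integration", filePattern: "*.integration.spec.ts" };
const L2B: LayerChoice = { layer: "L2 Browser", filePattern: "*.browser.test.tsx" };
const L3: LayerChoice = { layer: "L3 E2E", filePattern: "*.spec.ts" };

const CHEAPEST_LAYER: Record<ChangeKind, LayerChoice> = {
  "zod-schema": L1, "pure-function": L1, "dto-mapping": L1,
  "server-action": L2, "repository": L2, "adapter": L2,
  "client-component": L2B,
  "browser-flow": L3, "auth-redirect": L3, "layout": L3,
};

// Assumed relative cost; only the ordering matters.
const COST: Record<TestLayer, number> = {
  "L1 Unit": 1, "L2 Integration": 2, "L2 Browser": 2, "L3 E2E": 3,
};

interface GateDecision { allowed: boolean; reason: string; }

function checkLayerGate(change: ChangeKind, requested: TestLayer, browserReason = ""): GateDecision {
  const cheapest = CHEAPEST_LAYER[change];
  if (COST[requested] > COST[cheapest.layer]) {
    return { allowed: false, reason: `Write the cheaper test: ${cheapest.layer} (${cheapest.filePattern}).` };
  }
  if (requested === "L3 E2E" && browserReason.trim() === "") {
    return { allowed: false, reason: "Document why the browser is required." };
  }
  return { allowed: true, reason: "Requested layer is the cheapest viable one." };
}
```

For example, `checkLayerGate("server-action", "L3 E2E")` is blocked, while `checkLayerGate("auth-redirect", "L3 E2E", "redirect only happens in a real browser")` is allowed.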

Decomposition test: If an E2E test proves more than one claim, split it. Each claim gets tested at its cheapest viable layer.

Combined E2E claim	Decomposed
"Form renders AND action creates entity AND redirect works"	L2: action returns valid ID. L3: form visible, button clicks, redirect lands.
"Validation rejects bad input AND shows error message"	L1: schema rejects invalid. L3: error message visible after submit.
"Auth blocks access AND redirects to login"	L2: action throws without auth. L3: page redirects to login URL.

Layer 3: Hook

File: .claude/hooks/e2e-pre-write.sh

Triggers: Before any edit to *.spec.ts in an E2E project.

Action:

1. Parse the spec file to identify which server actions it imports or references.
2. For each server action, check whether a corresponding *.integration.spec.ts exists.
3. If no L2 integration test exists for a referenced action, print a warning:

⚠️  No L2 integration test found for: createVenture
    Expected: libs/ventures/server/src/__tests__/create-venture.integration.spec.ts
    Write L2 first. E2E tests assume the action works — they test browser wiring only.

Severity: Warning, not a block. The agent must acknowledge the warning and either write the L2 test first or justify why E2E-only is correct (browser-dependent claim).
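The hook itself is a shell script, but its core check is easier to read as pure functions. The TypeScript sketch below rests on loud assumptions: that actions are imported by name from modules whose path ends in `.action`, and that the test file name is the kebab-case form of the action name, as in the createVenture example. All function names are hypothetical.

```typescript
// createVenture -> create-venture
function toKebabCase(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Assumption: named imports from a module path ending in ".action".
function extractActionNames(specSource: string): string[] {
  const importRe = /import\s*\{([^}]+)\}\s*from\s*["'][^"']*\.action["']/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importRe.exec(specSource)) !== null) {
    for (const part of match[1].split(",")) {
      const name = part.trim().split(/\s+as\s+/)[0]; // drop "as alias"
      if (name !== "" && names.indexOf(name) === -1) names.push(name);
    }
  }
  return names;
}

interface MissingL2 { action: string; expectedPath: string; }

// testDir and existingPaths are supplied by the caller (for example from a file listing).
function findMissingIntegrationTests(
  actions: string[],
  testDir: string,
  existingPaths: string[],
): MissingL2[] {
  return actions
    .map((action) => ({
      action,
      expectedPath: `${testDir}/${toKebabCase(action)}.integration.spec.ts`,
    }))
    .filter((entry) => existingPaths.indexOf(entry.expectedPath) === -1);
}
```

Each returned entry carries the two values the warning prints: the action name and the expected test path.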

Layer 4: Hook

File: .claude/hooks/test-commit-message.sh

Triggers: Before any commit with fix(e2e) in the message.

Action:

1. Check the staged files in the commit.
2. If the commit ONLY touches E2E spec files (*.spec.ts in E2E projects) and does NOT touch any integration or unit test file, print a warning:

⚠️  E2E-only fix detected.
    This commit modifies E2E specs without touching L1/L2 tests.
    Did you walk the pipe? Common pattern:
    - If the server action was broken → fix should be in action code + L2 test
    - If the component was broken → fix should be in component code + L2 test
    - If only E2E wiring was wrong → this commit is correct, proceed

    Add --e2e-only-justified to bypass this check.

Severity: Warning with bypass flag. Tracks how often the bypass is used — high bypass rate signals the hook is miscalibrated or agents are ignoring the protocol.
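A minimal sketch of the classification and bypass tracking follows, written in TypeScript for readability although the hook is a shell script. Two assumptions are not stated in the source: E2E projects are directories whose names end in `-e2e`, and the `--e2e-only-justified` flag appears as a token in the commit message. `checkCommit` and `bypassRate` are hypothetical names.

```typescript
type HookOutcome = "not-applicable" | "pass" | "warn" | "bypassed";

// Assumption: E2E specs live under a directory named "<something>-e2e".
const E2E_SPEC = /(^|\/)[^/]*-e2e\/.+\.spec\.ts$/;

function checkCommit(message: string, stagedFiles: string[]): HookOutcome {
  // The hook only triggers on commits with fix(e2e) in the message.
  if (message.indexOf("fix(e2e)") === -1) return "not-applicable";

  // "E2E-only" means every staged file is an E2E spec.
  const e2eOnly = stagedFiles.length > 0 && stagedFiles.every((f) => E2E_SPEC.test(f));
  if (!e2eOnly) return "pass";

  // Assumption: the bypass flag is carried in the commit message.
  return message.indexOf("--e2e-only-justified") !== -1 ? "bypassed" : "warn";
}

// Share of flagged commits that used the bypass. A high value suggests the hook
// is miscalibrated or the protocol is being ignored.
function bypassRate(outcomes: HookOutcome[]): number {
  const flagged = outcomes.filter((o) => o === "warn" || o === "bypassed").length;
  const bypassed = outcomes.filter((o) => o === "bypassed").length;
  return flagged === 0 ? 0 : bypassed / flagged;
}
```

Recording every `HookOutcome` rather than only the bypasses keeps the denominator available, so the rate stays meaningful.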

Layer 5: Generator

Command: nx generate @stackmates/generators:scaffold-action

What it generates:

File	Layer	Purpose
`src/actions/{name}.action.ts`	—	The server action (thin wrapper)
`src/actions/{name}.schema.ts`	—	Zod input/output schemas
`src/__tests__/{name}.schema.spec.ts`	L1	Schema validation tests
`src/__tests__/{name}.integration.spec.ts`	L2	Server action integration test with real DB

What it does NOT generate:

E2E specs. E2E tests are written only after L1 and L2 pass. The generator enforces the cascade by omitting the expensive layer.
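Enforcement by omission is visible in the file list alone. The sketch below is not the generator's implementation; `scaffoldActionFiles` is a hypothetical helper that only computes the paths from the table above, assuming `name` is already kebab-case.

```typescript
interface ScaffoldFile {
  path: string;
  layer: "L1" | "L2" | null; // null for non-test source files
  purpose: string;
}

// Returns the four files from the table. There is deliberately no branch,
// option, or flag that yields an E2E spec.
function scaffoldActionFiles(name: string): ScaffoldFile[] {
  return [
    { path: `src/actions/${name}.action.ts`, layer: null, purpose: "The server action (thin wrapper)" },
    { path: `src/actions/${name}.schema.ts`, layer: null, purpose: "Zod input/output schemas" },
    { path: `src/__tests__/${name}.schema.spec.ts`, layer: "L1", purpose: "Schema validation tests" },
    { path: `src/__tests__/${name}.integration.spec.ts`, layer: "L2", purpose: "Server action integration test with real DB" },
  ];
}
```

Because the function has no E2E branch, scaffolding a browser-level test first takes a deliberate manual step.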

L2 test scaffold template:

import { describe, it, expect } from "vitest";

describe("{ActionName} integration", () => {
  it("returns valid result with correct input", async () => {
    // Arrange: set up test data in real DB
    // Act: call the server action directly
    // Assert: check return value and DB state
  });

  it("rejects invalid input via schema", async () => {
    // Arrange: prepare invalid input
    // Act: call the server action
    // Assert: expect validation error, no DB mutation
  });

  it("rejects unauthorized access", async () => {
    // Arrange: no auth context
    // Act: call the server action
    // Assert: expect auth error
  });
});

Layer 6: Plan Template

Template name: test-failure-investigation

Purpose: Structures test failure investigation so the agent cannot skip upstream checks.

Tasks:

#	Task	Quality gate
1	Identify the failing assertion	Exact assertion text quoted. Layer identified (L1/L2/L3).
2	Call the server action directly (L2 check)	Action called with same inputs. Return value logged. Pass/fail determined.
3	Check component hydration (if L2 passes)	Component renders in isolation. Correct props received.
4	Check form wiring (if component passes)	Submit triggers the action. Form data matches schema.
5	Modify E2E spec (only if pipe is proven)	Steps 2-4 all pass. Root cause identified at specific layer. Fix targets that layer.

Hard constraint: Task 5 cannot be started until tasks 2-4 are marked complete or skipped-with-evidence. The plan template enforces the walk-the-pipe protocol through task dependencies.
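The dependency rule can be sketched as a small predicate. `PlanTask`, `unmetUpstreamTasks`, and `canStartTask5` are hypothetical names, and the requirement that a skipped task carries non-empty evidence text is an assumption about how "skipped-with-evidence" would be recorded.

```typescript
type TaskStatus = "pending" | "in-progress" | "complete" | "skipped-with-evidence";

interface PlanTask {
  id: number; // matches the # column of the template
  status: TaskStatus;
  evidence?: string; // assumed field: why a skip was justified
}

// Returns the ids of tasks 2-4 that still block task 5.
function unmetUpstreamTasks(tasks: PlanTask[]): number[] {
  const upstreamIds = [2, 3, 4];
  return upstreamIds.filter((id) => {
    const task = tasks.find((t) => t.id === id);
    if (!task) return true; // a missing task blocks
    if (task.status === "complete") return false;
    if (task.status === "skipped-with-evidence") {
      return !task.evidence || task.evidence.trim() === "";
    }
    return true; // pending or in-progress
  });
}

function canStartTask5(tasks: PlanTask[]): boolean {
  return unmetUpstreamTasks(tasks).length === 0;
}
```

Returning the blocking ids, not just a boolean, lets the plan tooling say which upstream check is still outstanding.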

When to use: Engineering must create a plan from this template before investigating any E2E failure that has persisted beyond the first attempt. One attempt is allowed without the template. Two or more attempts require structured investigation.

Layer 7: Skill

File: .agents/skills/debug-test-failure/SKILL.md

Input: Failing test file path.

Procedure:

Step	Action	Output
1	Parse the spec to identify which server actions it touches	List of action function names and import paths
2	Check if L2 integration tests exist for those actions	For each action: L2 exists (path) or MISSING
3	If MISSING: scaffold L2 tests and run them	L2 test results — pass or fail with error
4	If L2 passes: investigate component render	Component renders correctly — yes/no
5	If component passes: investigate E2E spec	Now modify the spec — the pipe is proven

Quality gate: Agent must show L2 pass evidence before modifying any E2E spec. Evidence means: test output showing the server action returns correct data when called directly. Screenshots of browser behavior do not count as L2 evidence.

Anti-pattern detection: If the agent's commit history shows 3+ commits modifying only E2E spec files without touching L2 tests, the skill flags a layer violation:

🚨 Layer violation detected.
   3 commits modified E2E specs without L2 evidence.
   Walk the pipe. Call the server action directly.
   See: /docs/software/platform/testing-platform/#walk-the-pipe
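One way to read the 3+ commit rule is as a streak counted back from the most recent commit. The source does not say whether the commits must be consecutive, so the streak is an assumption, as are the `-e2e` directory convention and the names `CommitSummary`, `e2eOnlyStreak`, and `hasLayerViolation`.

```typescript
interface CommitSummary {
  sha: string;
  files: string[]; // paths changed by the commit
}

// Assumption: E2E specs live under a directory named "<something>-e2e".
const E2E_SPEC_PATH = /(^|\/)[^/]*-e2e\/.+\.spec\.ts$/;

// Counts how many of the most recent commits touched nothing but E2E specs.
// The streak ends at the first commit that touches any other file, for
// example an L2 integration test or the action code.
function e2eOnlyStreak(commitsNewestFirst: CommitSummary[]): number {
  let streak = 0;
  for (const commit of commitsNewestFirst) {
    const onlyE2e =
      commit.files.length > 0 && commit.files.every((f) => E2E_SPEC_PATH.test(f));
    if (!onlyE2e) break;
    streak += 1;
  }
  return streak;
}

function hasLayerViolation(commitsNewestFirst: CommitSummary[], threshold = 3): boolean {
  return e2eOnlyStreak(commitsNewestFirst) >= threshold;
}
```

With this reading, a single commit that adds or edits an L2 test resets the count, which rewards walking the pipe instead of merely pausing.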

Process Benchmark

Metric: Commits per bug fix.

Target: 3 or fewer commits per bug fix. A fix that takes 16 commits is a process failure — the debugging was at the wrong layer, and each commit was a symptom-level guess.

Track this as a trailing 30-day average. Spikes above 5 trigger a retrospective asking: "Was the pipe walked before the first spec change?"
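The metric could be computed as sketched below. The `BugFix` record shape is hypothetical, and the sketch assumes that a "spike" means an individual fix with more than 5 commits. The source does not say whether the threshold applies to single fixes or to the average.

```typescript
interface BugFix {
  id: string;
  closedAt: Date;
  commits: number; // commits spent on this fix
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Fixes closed within the trailing window ending at `now`.
function fixesInWindow(fixes: BugFix[], now: Date, windowDays = 30): BugFix[] {
  const end = now.getTime();
  const start = end - windowDays * DAY_MS;
  return fixes.filter((f) => f.closedAt.getTime() > start && f.closedAt.getTime() <= end);
}

// Mean commits per fix over the window, or null when the window has no fixes.
function trailingAverageCommits(fixes: BugFix[], now: Date, windowDays = 30): number | null {
  const recent = fixesInWindow(fixes, now, windowDays);
  if (recent.length === 0) return null;
  const total = recent.reduce((sum, f) => sum + f.commits, 0);
  return total / recent.length;
}

// Assumed reading of "spikes above 5": individual fixes over the threshold.
function fixesNeedingRetrospective(fixes: BugFix[], spikeThreshold = 5): BugFix[] {
  return fixes.filter((f) => f.commits > spikeThreshold);
}
```

For three fixes in the window with 2, 3, and 16 commits, the average is 7 and only the 16-commit fix is flagged for a retrospective.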

See Engineering Quality Benchmarks for the full threshold table.

Dig Deeper

Testing Platform — Trophy strategy, E2E admission gate, walk-the-pipe protocol
Testing Strategy — Layer model, selection rules, browser gate decomposition

Context

Flow Engineering — Enforcement hierarchy and cost of quality
Engineering Quality Benchmarks — Target thresholds including commits-per-fix
CI Testing Infrastructure — Merge loop and health loop

Questions

If seven enforcement layers exist, which single layer would have prevented the 16-commit fiasco?

When the walk-the-pipe rule is a suggestion and the generator enforces L2-first by omission, which enforcement style wins under cognitive load?
If the commit-message hook tracks bypass frequency, at what bypass rate should the hook be promoted from warning to block?
When an agent creates a plan from the investigation template but skips task 2 (call the action directly), does the template enforce anything or just document intent?
