Testing Enforcement

How do you make the right testing decision structurally impossible to skip?

Documentation describes the trophy model. Agents read it and ignore it. Seven hours and sixteen commits on a single bug fix — every commit touching browser-level symptoms while the server action was never called directly. The testing docs were correct. The enforcement was absent.

Docs are suggestions. Rules, hooks, generators, and plan templates are enforcement. This page specifies every mechanism the engineering repo needs to make layer violations structurally impossible.

Seven Layers

LAYER 7: Skill (debug-test-failure)        ← Agent behavior
LAYER 6: Plan Template (investigation)     ← Work structure
LAYER 5: Generator (scaffold-action)       ← Code creation
LAYER 4: Hook (commit message)             ← Commit gate
LAYER 3: Hook (E2E pre-write)              ← File gate
LAYER 2: Rule (trophy-layer-gate)          ← Decision gate
LAYER 1: Rule (walk-the-pipe)              ← Investigation protocol

Each layer catches what the layer below missed. Together they make the failure mode — iterating on E2E symptoms without walking upstream — structurally impossible.


Layer 1: Rule

File: .claude/rules/walk-the-pipe.md

Auto-activates on: Any test failure investigation or E2E spec modification.

Content — the 4-step upstream protocol:

When an E2E test fails, do not touch the spec. Walk upstream. Test each joint in the supply chain, starting from the source.

Step | Action | Stop condition
1 | Call the server action directly | If it returns wrong data, fix the action
2 | Check component hydration and render | If it doesn't render, fix the component
3 | Check form wiring: does submit trigger the action? | If wiring is broken, fix the wiring
4 | Now debug the E2E spec itself | Only after steps 1-3 pass

Hard constraint: Never modify a spec until the pipe is proven working. If the pipe is broken, no spec fix will help. If the pipe works, the spec fix is obvious.

Cost signal: If you spend more than 30 minutes debugging an E2E test, the test is at the wrong layer. Stop. Walk the pipe.
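
The ordering in the protocol can be sketched as a small decision function. This is a hypothetical illustration, not part of the rule file: the names Joint, Outcome, Verdict, and nextStep are assumptions introduced here, and the sketch takes already-recorded check outcomes rather than running the checks itself.

```typescript
// Hypothetical sketch of the walk-the-pipe ordering. All names are illustrative.
type Joint = "action" | "component" | "wiring";
type Outcome = "pass" | "fail" | "unchecked";

// Upstream-first order, matching steps 1 to 3 of the protocol table.
const PIPE_ORDER: Joint[] = ["action", "component", "wiring"];

type Verdict =
  | { kind: "fix"; joint: Joint }    // a joint is broken: fix it, leave the spec alone
  | { kind: "check"; joint: Joint }  // the next joint that still needs a direct check
  | { kind: "debug-spec" };          // steps 1-3 passed, so step 4 is allowed

function nextStep(outcomes: Record<Joint, Outcome>): Verdict {
  for (const joint of PIPE_ORDER) {
    const outcome = outcomes[joint];
    if (outcome === "fail") return { kind: "fix", joint };
    if (outcome === "unchecked") return { kind: "check", joint };
  }
  // Only reached when every upstream joint has passed.
  return { kind: "debug-spec" };
}
```

An unchecked joint is treated the same as an unproven pipe: the function never returns "debug-spec" while any upstream joint lacks a passing result.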


Layer 2: Rule

File: .claude/rules/trophy-layer-gate.md

Auto-activates on: Creating any new test file.

Content — the selection decision tree:

Before writing a test, identify the layer. If the claim can be proven at a lower layer, writing at a higher layer is blocked.

I changed... | Test layer | File pattern
Zod schema, pure function, DTO mapping | L1 Unit | *.schema.spec.ts
Server action, repository, adapter | L2 Integration | *.integration.spec.ts
Client component with state | L2 Browser | *.browser.test.tsx
Multi-step browser flow, auth redirect, layout | L3 E2E | *.spec.ts (Playwright)

The gate question: Can this test's core claim be proven without a browser?

  • YES → Write the cheaper test. L3 is blocked.
  • NO → Proceed with E2E. Document why the browser is required.
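
The selection table and the gate question can be expressed as two small functions. This is a minimal sketch, assuming the change kinds listed in the table are the only inputs; ChangeKind, selectLayer, and e2eGate are hypothetical names, and treating a missing justification as a blocked E2E test is an assumption about how "document why" would be enforced.

```typescript
// Hypothetical sketch of the trophy-layer gate. Names are illustrative.
type ChangeKind =
  | "zod-schema" | "pure-function" | "dto-mapping"
  | "server-action" | "repository" | "adapter"
  | "client-component-with-state"
  | "multi-step-browser-flow" | "auth-redirect" | "layout";

type LayerChoice = {
  layer: "L1 Unit" | "L2 Integration" | "L2 Browser" | "L3 E2E";
  filePattern: string;
};

// Mirrors the "I changed..." table row by row.
function selectLayer(change: ChangeKind): LayerChoice {
  switch (change) {
    case "zod-schema":
    case "pure-function":
    case "dto-mapping":
      return { layer: "L1 Unit", filePattern: "*.schema.spec.ts" };
    case "server-action":
    case "repository":
    case "adapter":
      return { layer: "L2 Integration", filePattern: "*.integration.spec.ts" };
    case "client-component-with-state":
      return { layer: "L2 Browser", filePattern: "*.browser.test.tsx" };
    case "multi-step-browser-flow":
    case "auth-redirect":
    case "layout":
      return { layer: "L3 E2E", filePattern: "*.spec.ts" };
  }
}

type GateDecision = { e2eAllowed: boolean; reason: string };

// The gate question: can the core claim be proven without a browser?
function e2eGate(provableWithoutBrowser: boolean, browserJustification = ""): GateDecision {
  if (provableWithoutBrowser) {
    return { e2eAllowed: false, reason: "Write the cheaper test. L3 is blocked." };
  }
  // Assumption: an empty justification blocks the E2E test until one is written.
  if (browserJustification.trim() === "") {
    return { e2eAllowed: false, reason: "Document why the browser is required." };
  }
  return { e2eAllowed: true, reason: browserJustification };
}
```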

Decomposition test: If an E2E test proves more than one claim, split it. Each claim gets tested at its cheapest viable layer.

Combined E2E claim | Decomposed
"Form renders AND action creates entity AND redirect works" | L2: action returns valid ID. L3: form visible, button clicks, redirect lands.
"Validation rejects bad input AND shows error message" | L1: schema rejects invalid. L3: error message visible after submit.
"Auth blocks access AND redirects to login" | L2: action throws without auth. L3: page redirects to login URL.

Layer 3: Hook

File: .claude/hooks/e2e-pre-write.sh

Triggers: Before any edit to *.spec.ts in an E2E project.

Action:

  1. Parse the spec file to identify which server actions it imports or references
  2. For each server action, check if a corresponding *.integration.spec.ts exists
  3. If no L2 integration test exists for a referenced action, print this warning:
⚠️  No L2 integration test found for: createVenture
Expected: libs/ventures/server/src/__tests__/create-venture.integration.spec.ts
Write L2 first. E2E tests assume the action works — they test browser wiring only.

Severity: Warning, not block. The agent must acknowledge the warning and either write the L2 test first or justify why E2E-only is correct (browser-dependent claim).
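
The three steps of the hook can be sketched as a pure function. The hook itself is a shell script, so this TypeScript version only illustrates the logic. Everything named here is hypothetical: KnownAction, findMissingIntegrationTests, and the injected fileExists callback are assumptions, as is the idea that the caller supplies the list of known actions and their test directories. The kebab-case file name follows the createVenture example in the warning above.

```typescript
// Hypothetical sketch of the E2E pre-write check. Names are illustrative.
type KnownAction = { name: string; testDir: string };
type MissingTest = { action: string; expectedPath: string };

// camelCase action name to kebab-case file stem.
function toKebabCase(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

function findMissingIntegrationTests(
  specSource: string,
  knownActions: KnownAction[],
  fileExists: (path: string) => boolean, // injected so the sketch needs no file system
): MissingTest[] {
  const missing: MissingTest[] = [];
  for (const action of knownActions) {
    // Assumption: action names are plain alphanumeric identifiers, safe inside a RegExp.
    const referenced = new RegExp(`\\b${action.name}\\b`).test(specSource);
    if (!referenced) continue;
    const expectedPath = `${action.testDir}/${toKebabCase(action.name)}.integration.spec.ts`;
    if (!fileExists(expectedPath)) {
      missing.push({ action: action.name, expectedPath });
    }
  }
  return missing;
}

// Formats one warning in the shape shown in the hook output.
function formatWarning(m: MissingTest): string {
  return [
    `No L2 integration test found for: ${m.action}`,
    `Expected: ${m.expectedPath}`,
    "Write L2 first.",
  ].join("\n");
}
```

Because the result is a list of warnings rather than an exit code, the caller decides whether to warn or block, which matches the warning-level severity described above.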


Layer 4: Hook

File: .claude/hooks/test-commit-message.sh

Triggers: Before any commit with fix(e2e) in the message.

Action:

  1. Check the staged files in the commit
  2. If the commit ONLY touches E2E spec files (*.spec.ts in E2E projects) and does NOT touch any integration or unit test file, print this warning:
⚠️  E2E-only fix detected.
This commit modifies E2E specs without touching L1/L2 tests.
Did you walk the pipe? Common pattern:
- If the server action was broken → fix should be in action code + L2 test
- If the component was broken → fix should be in component code + L2 test
- If only E2E wiring was wrong → this commit is correct, proceed

Add --e2e-only-justified to bypass this check.

Severity: Warning with bypass flag. The hook tracks how often the bypass is used. A high bypass rate signals that the hook is miscalibrated or that agents are ignoring the protocol.
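
The classification the hook performs can be sketched as follows. This is an illustration under stated assumptions, not the hook's actual script: the "-e2e" directory suffix used to recognise E2E projects is a guess, the bypass is modelled as a boolean because the page does not say where --e2e-only-justified is passed, and HookResult, checkCommit, and bypassRate are hypothetical names.

```typescript
// Hypothetical sketch of the commit-message hook logic. Names are illustrative.
type HookResult = "skip" | "ok" | "warn" | "bypassed";

// L1/L2 test files, using the file patterns from the layer table.
const isLowerLayerTest = (file: string): boolean =>
  /\.(integration|schema)\.spec\.ts$/.test(file) || /\.browser\.test\.tsx$/.test(file);

// Assumption: E2E projects live in directories whose name ends with "-e2e".
const isE2ESpec = (file: string): boolean =>
  /(^|\/)[^/]*-e2e\/.*\.spec\.ts$/.test(file) && !isLowerLayerTest(file);

function checkCommit(message: string, stagedFiles: string[], bypassRequested: boolean): HookResult {
  // The hook only triggers on fix(e2e) commits.
  if (!message.includes("fix(e2e)")) return "skip";
  const onlyE2E = stagedFiles.length > 0 && stagedFiles.every(isE2ESpec);
  if (!onlyE2E) return "ok";
  return bypassRequested ? "bypassed" : "warn";
}

// Share of flagged commits that used the bypass. 0 when nothing was flagged.
function bypassRate(results: HookResult[]): number {
  const flagged = results.filter((r) => r === "warn" || r === "bypassed");
  if (flagged.length === 0) return 0;
  return flagged.filter((r) => r === "bypassed").length / flagged.length;
}
```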


Layer 5: Generator

Command: nx generate @stackmates/generators:scaffold-action

What it generates:

File | Layer | Purpose
src/actions/{name}.action.ts | - | The server action (thin wrapper)
src/actions/{name}.schema.ts | - | Zod input/output schemas
src/__tests__/{name}.schema.spec.ts | L1 | Schema validation tests
src/__tests__/{name}.integration.spec.ts | L2 | Server action integration test with real DB

What it does NOT generate:

  • E2E specs. E2E tests are written only after L1 and L2 pass. The generator enforces the cascade by omitting the expensive layer.
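
The file plan above can be written down as data, which makes the omission of E2E specs easy to assert. This sketch is not the generator's implementation and does not use the Nx generator API. PlannedFile and planScaffold are hypothetical names, and the paths are copied from the table.

```typescript
// Hypothetical sketch of the scaffold-action file plan. Names are illustrative.
type PlannedFile = { path: string; layer: "L1" | "L2" | null; purpose: string };

function planScaffold(name: string): PlannedFile[] {
  return [
    { path: `src/actions/${name}.action.ts`, layer: null, purpose: "The server action (thin wrapper)" },
    { path: `src/actions/${name}.schema.ts`, layer: null, purpose: "Zod input/output schemas" },
    { path: `src/__tests__/${name}.schema.spec.ts`, layer: "L1", purpose: "Schema validation tests" },
    { path: `src/__tests__/${name}.integration.spec.ts`, layer: "L2", purpose: "Server action integration test with real DB" },
  ];
  // No L3 entry on purpose: the E2E spec is written by hand after L1 and L2 pass.
}
```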

L2 test scaffold template:

import { describe, it, expect, vi } from "vitest";

describe("{ActionName} integration", () => {
  it("returns valid result with correct input", async () => {
    // Arrange: set up test data in real DB
    // Act: call the server action directly
    // Assert: check return value and DB state
  });

  it("rejects invalid input via schema", async () => {
    // Arrange: prepare invalid input
    // Act: call the server action
    // Assert: expect validation error, no DB mutation
  });

  it("rejects unauthorized access", async () => {
    // Arrange: no auth context
    // Act: call the server action
    // Assert: expect auth error
  });
});

Layer 6: Plan Template

Template name: test-failure-investigation

Purpose: Structures test failure investigation so the agent cannot skip upstream checks.

Tasks:

# | Task | Quality gate
1 | Identify the failing assertion | Exact assertion text quoted. Layer identified (L1/L2/L3).
2 | Call the server action directly (L2 check) | Action called with same inputs. Return value logged. Pass/fail determined.
3 | Check component hydration (if L2 passes) | Component renders in isolation. Correct props received.
4 | Check form wiring (if component passes) | Submit triggers the action. Form data matches schema.
5 | Modify E2E spec (only if pipe is proven) | Tasks 2-4 all pass. Root cause identified at specific layer. Fix targets that layer.

Hard constraint: Task 5 cannot be started until tasks 2-4 are marked complete or skipped-with-evidence. The plan template enforces the walk-the-pipe protocol through task dependencies.

When to use: Engineering must create a plan from this template before investigating any E2E failure that has persisted beyond the first attempt. One attempt is allowed without the template. Two or more attempts require structured investigation.
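
The task dependency and the attempt threshold can be sketched as two predicates. This is a hypothetical illustration: PlanTask, canStartSpecChange, and templateRequired are names introduced here, and reading "skipped-with-evidence" as a skipped task with a non-empty evidence note is an assumption.

```typescript
// Hypothetical sketch of the plan template's dependency rule. Names are illustrative.
type TaskStatus = "pending" | "complete" | "skipped";
type PlanTask = { id: number; status: TaskStatus; evidence?: string };

// Task 5 (modify the E2E spec) depends on tasks 2, 3, and 4.
const UPSTREAM_TASK_IDS = [2, 3, 4];

function canStartSpecChange(tasks: PlanTask[]): boolean {
  return UPSTREAM_TASK_IDS.every((id) => {
    const task = tasks.find((t) => t.id === id);
    if (!task) return false; // a missing task counts as not done
    if (task.status === "complete") return true;
    // Assumption: a skip only counts when it carries a non-empty evidence note.
    return task.status === "skipped" && (task.evidence ?? "").trim().length > 0;
  });
}

// One attempt is allowed without the template; a second attempt requires it.
function templateRequired(attemptsSoFar: number): boolean {
  return attemptsSoFar >= 1;
}
```

Here attemptsSoFar counts attempts already made, so templateRequired(1) is true when the second attempt is about to begin.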


Layer 7: Skill

File: .agents/skills/debug-test-failure/SKILL.md

Input: Failing test file path.

Procedure:

Step | Action | Output
1 | Parse the spec to identify which server actions it touches | List of action function names and import paths
2 | Check if L2 integration tests exist for those actions | For each action: L2 exists (path) or MISSING
3 | If MISSING: scaffold L2 tests and run them | L2 test results: pass or fail with error
4 | If L2 passes: investigate component render | Component renders correctly: yes/no
5 | If component passes: investigate E2E spec | Now modify the spec: the pipe is proven

Quality gate: Agent must show L2 pass evidence before modifying any E2E spec. Evidence means: test output showing the server action returns correct data when called directly. Screenshots of browser behavior do not count as L2 evidence.

Anti-pattern detection: If the agent's commit history shows 3+ commits modifying only E2E spec files without touching L2 tests, the skill flags a layer violation:

🚨 Layer violation detected.
3 commits modified E2E specs without L2 evidence.
Walk the pipe. Call the server action directly.
See: /docs/software/platform/testing-platform/#walk-the-pipe
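
The anti-pattern check can be sketched as a scan over commit history. This is a minimal sketch under assumptions: CommitInfo and detectLayerViolation are hypothetical names, the caller supplies the predicate that recognises E2E spec files, and the history passed in is assumed to be scoped to the current investigation. The page does not say whether the 3+ commits must be consecutive, so the sketch simply counts them.

```typescript
// Hypothetical sketch of the skill's anti-pattern detection. Names are illustrative.
type CommitInfo = { sha: string; files: string[] };

function detectLayerViolation(
  history: CommitInfo[],
  isE2ESpecFile: (file: string) => boolean, // supplied by the caller
  threshold = 3,
): string | null {
  // A commit counts when every file it touches is an E2E spec,
  // which implies it touched no L2 test.
  const e2eOnly = history.filter(
    (commit) => commit.files.length > 0 && commit.files.every(isE2ESpecFile),
  );
  if (e2eOnly.length < threshold) return null;
  return [
    "Layer violation detected.",
    `${e2eOnly.length} commits modified E2E specs without L2 evidence.`,
    "Walk the pipe. Call the server action directly.",
  ].join("\n");
}
```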

Process Benchmark

Metric: Commits per bug fix.

Target: 3 or fewer commits per bug fix. A fix that takes 16 commits is a process failure — the debugging was at the wrong layer, and each commit was a symptom-level guess.

Track this as a trailing 30-day average. Spikes above 5 trigger a retrospective asking: "Was the pipe walked before the first spec change?"
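
The benchmark can be computed from a list of closed bug fixes. This is a sketch under assumptions: BugFix, trailingAverageCommits, and needsRetrospective are hypothetical names, the window is measured back from a caller-supplied "now", and "spikes above 5" is read as the trailing average exceeding 5.

```typescript
// Hypothetical sketch of the commits-per-bug-fix benchmark. Names are illustrative.
type BugFix = { id: string; closedAt: Date; commits: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// Average commits per fix over the trailing window. Null when the window is empty.
function trailingAverageCommits(fixes: BugFix[], now: Date, windowDays = 30): number | null {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const recent = fixes.filter((fix) => {
    const closed = fix.closedAt.getTime();
    return closed >= cutoff && closed <= now.getTime();
  });
  if (recent.length === 0) return null;
  const total = recent.reduce((sum, fix) => sum + fix.commits, 0);
  return total / recent.length;
}

// Assumption: a retrospective triggers when the trailing average exceeds 5.
function needsRetrospective(average: number | null, spikeThreshold = 5): boolean {
  return average !== null && average > spikeThreshold;
}
```

With this reading, a single 16-commit fix among a few ordinary ones is enough to push the average over the threshold.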

See Engineering Quality Benchmarks for the full threshold table.

Dig Deeper

  • Testing Platform — Trophy strategy, E2E admission gate, walk-the-pipe protocol
  • Testing Strategy — Layer model, selection rules, browser gate decomposition

Questions

If seven enforcement layers exist, which single layer would have prevented the 16-commit fiasco?

  • When the walk-the-pipe rule is a suggestion and the generator enforces L2-first by omission, which enforcement style wins under cognitive load?
  • If the commit-message hook tracks bypass frequency, at what bypass rate should the hook be promoted from warning to block?
  • When an agent creates a plan from the investigation template but skips task 2 (call the action directly), does the template enforce anything or just document intent?