Testing Platform
What is the cheapest test that gives sufficient confidence?
That question — Kent C. Dodds' core insight — drives the entire strategy. Not "how many tests" or "what percentage coverage." The question is economic: where does each dollar of testing effort buy the most confidence?
The Trophy
The Testing Trophy replaces the Testing Pyramid. The pyramid assumed UI was expensive and unit tests were king. Server-side rendering inverted that assumption. When most logic lives in server actions and services, integration tests deliver the highest confidence per dollar.
```
╔═══════════════════╗
║     E2E (L3)      ║ ← Browser-dependent flows only
╠═══════════════════╣
║                   ║
║    Integration    ║ ← The bulk: server actions, services, repos
║       (L2)        ║
║                   ║
╠═══════════════════╣
║     Unit (L1)     ║ ← Pure transforms, Zod schemas
╠═══════════════════╣
║    Static (L0)    ║ ← TypeScript compiler — free
╚═══════════════════╝
```
The 2025 update: for SSR-heavy apps (Next.js with Server Components), the trophy grows top-heavy. More logic is server-side, so L2 integration tests cover more surface area. E2E remains expensive — use it only when the browser itself is part of the proof.
Confidence Coefficient
Each layer answers a different question. Stop at the first layer that covers the change.
| Layer | Question | Tool | Speed | Confidence |
|---|---|---|---|---|
| L0 Static | Does it compile? | tsc --noEmit | <1s | Contracts match |
| L1 Unit | Does the transform work? | Vitest | <1s/test | Logic correct |
| L2 Integration | Does the data flow work? | Vitest + real DB | 5-30s/test | Wiring correct |
| L3 E2E | Can a human complete the journey? | Playwright | 30-120s/test | Experience correct |
L0 runs on every change. The compiler is free verification. A broken refactor lights up every consumer instantly.
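The L1 row above can be made concrete: a pure transform with zero dependencies, verifiable in well under a second. A minimal sketch, where the `toSlug` helper is a hypothetical example, not a function from the codebase:

```typescript
// Hypothetical pure transform — the kind of zero-dependency logic
// that belongs at L1. No framework, no mocks, no I/O.
export function toSlug(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-|-$/g, "");      // strip leading/trailing dashes
}
```

Because the function is pure, the unit test is a one-line assertion per case, and a failure pinpoints the exact transform rather than a wiring problem.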
Platform Constraints
Next.js with React Server Components changes what's testable at each layer.
| Code Type | Testable At | Why |
|---|---|---|
| Zod schemas, DTOs, pure functions | L1 Unit | Zero dependencies |
| Server actions ("use server") | L2 Integration | Mock next/headers, next/cache — test the function directly |
| Repository/adapter layer | L2 Integration | Real database, real queries |
| API routes (route.ts) | L2 Integration | NTARH for real Next.js routing |
| Client components with state | L2 Browser Mode | Vitest Browser Mode + MSW |
| Async Server Components | L3 E2E only | Cannot render outside Next.js server |
| Multi-step browser flows | L3 E2E only | Browser IS the proof |
E2E Admission Gate
Before writing any E2E test, the claim must pass this gate. If it fails, write a cheaper test.
The Gate
| Question | Yes → E2E | No → lower layer |
|---|---|---|
| Does this test require a real browser to prove its claim? | Auth redirects, layout, a11y, keyboard nav | Server action results, validation logic, data transforms |
| Is the browser the only environment where this behavior exists? | Client-side hydration state, CSS rendering | Server action return values, DB writes, redirect URLs |
| Has the underlying logic already been proven at L2? | E2E confirms wiring only | Write L2 first, then decide if E2E adds signal |
E2E tests that fail this gate are integration tests running in an expensive container. They carry the full weight of SSR, hydration, JS execution, network, and CI environment timing — and any variance in that chain breaks the test.
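The gate can be encoded as a small decision helper. This is an illustrative sketch, not an existing API; the type names and fields are assumptions that mirror the three gate questions:

```typescript
// Illustrative encoding of the E2E admission gate. A claim earns L3 only
// if every gate question passes; otherwise a cheaper layer covers it.
type Layer = "L2" | "L3";

interface Claim {
  needsRealBrowser: boolean;    // layout, a11y, keyboard nav, auth redirects
  browserOnlyBehavior: boolean; // hydration state, CSS rendering
  logicProvenAtL2: boolean;     // the underlying action is already covered
}

export function admitToE2E(claim: Claim): Layer {
  // Fail either of the first two questions → the claim never needed a browser.
  if (!claim.needsRealBrowser || !claim.browserOnlyBehavior) return "L2";
  // Even browser-bound claims should land on logic already proven at L2.
  return claim.logicProvenAtL2 ? "L3" : "L2";
}
```

The point of the helper is the ordering: L2 is the default, and L3 must be earned.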
Critical Functions
These server actions MUST have L2 integration tests before any E2E test references them. E2E tests for these flows should assume the action works and test only the browser wiring.
| Function | L2 proves | E2E proves (if needed) |
|---|---|---|
| Create entity (venture, project, RFP) | Action returns valid ID, DB row exists | Form visible, button clickable, redirect lands |
| Update entity | Action returns updated fields, DB reflects change | Inline edit saves, optimistic UI resolves |
| Delete entity | Action removes row, returns confirmation | Confirmation dialog works, item disappears |
| Auth-gated action | Action rejects without auth, succeeds with auth | Redirect to login, return to original page |
| File upload | Action stores file, returns URL | File picker opens, progress shown, preview renders |
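The "create entity" row can be sketched in test form. All names here (`InMemoryVentureRepo`, `createVenture`) are hypothetical, and the database is replaced by an in-memory stub so the sketch is self-contained; a real L2 test would assert against a real database, as the table above requires:

```typescript
// Sketch of the L2 shape for "create entity": call the underlying logic
// directly and assert on both the returned ID and the persisted row.
interface Venture { id: string; name: string }

class InMemoryVentureRepo {
  rows = new Map<string, Venture>();
  insert(name: string): Venture {
    const row = { id: `v_${this.rows.size + 1}`, name };
    this.rows.set(row.id, row);
    return row;
  }
  findById(id: string): Venture | undefined { return this.rows.get(id); }
}

export function createVenture(repo: InMemoryVentureRepo, name: string): string {
  if (!name.trim()) throw new Error("name required"); // stands in for Zod validation
  return repo.insert(name).id;
}

// L2 assertions: the action returns a valid ID and the row exists.
const repo = new InMemoryVentureRepo();
const id = createVenture(repo, "Acme");
```

Once this passes, the matching E2E test only needs to prove the form is visible, the button is clickable, and the redirect lands.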
Layer Violation Symptoms
If you see these patterns, the test is at the wrong layer:
| Symptom | Diagnosis | Fix |
|---|---|---|
| Debugging hydration timing in E2E | Testing server action correctness through browser | Write L2 integration test for the action |
| waitForSelector with 30s+ timeout | Waiting for JS execution, not testing behavior | The claim doesn't need a browser |
| Test passes locally, fails in CI | CI environment timing differs from local | The claim is environment-dependent, not behavior-dependent |
| Fixing one E2E uncovers a different E2E failure | Cascading state dependencies | Tests aren't isolated — split into L2 concerns |
| Spending hours on a test that takes 120s to run | Feedback loop is too slow for debugging | The inner logic should be debuggable at L2 in <5s |
Walk the Pipe
When an E2E test fails, do not touch the spec. Walk upstream. Test each joint in the supply chain, starting from the source.
E2E FAILS
↓
1. Server action — call it directly. Does it return the right data?
→ NO → Fix the action. Stop. You found the bug.
→ YES ↓
2. Component — does it hydrate and render correctly?
→ NO → Fix the component. Stop.
→ YES ↓
3. Form wiring — does submit trigger the action?
→ NO → Fix the wiring. Stop.
→ YES ↓
4. NOW debug the E2E spec itself.
Each step takes seconds at L2. The entire pipe check takes under 5 minutes. Skipping to step 4 and iterating on selectors, timeouts, and wait strategies is symptom-chasing.
The rule: Never modify a test spec until you have proven the underlying pipe works. If the pipe is broken, no spec fix will help. If the pipe works, the spec fix is obvious.
This is first principles applied to debugging. Decompose the system. Test each joint. The failure is always at the first joint that breaks — everything downstream is noise.
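The pipe walk is a first-failing-joint search. A minimal sketch, where the joint names and `ok` probes are illustrative (in practice each probe is a fast L2 check, not an inline closure):

```typescript
// Walk the pipe upstream-first: return the first joint that fails,
// because everything downstream of a broken joint is noise.
type Joint = { name: string; ok: () => boolean };

export function firstBrokenJoint(pipe: Joint[]): string | null {
  for (const joint of pipe) {
    if (!joint.ok()) return joint.name; // stop here — you found the bug
  }
  return null; // pipe works: now (and only now) debug the E2E spec itself
}

// Simulated run: the server action passes, the component render fails.
const pipe: Joint[] = [
  { name: "server action", ok: () => true },
  { name: "component render", ok: () => false },
  { name: "form wiring", ok: () => true },
];
```

Note that the search never evaluates joints past the first failure, which is exactly the "stop, you found the bug" rule above.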
The Symptom Trap
E2E tests fail with browser-level symptoms — hydration attributes missing, timeouts expiring, selectors not matching. The instinct is to fix the spec: adjust selectors, extend timeouts, add wait strategies.
These are all symptom-level fixes. Common anti-patterns:
- Extending waitForSelector timeouts
- Switching selector strategies (CSS → JS evaluation)
- Adding diagnostic logging to the spec
- Coordinating click + navigation with Promise.all
- Reverting and retrying the same approach
None of these address the source. If a server action returns invalid data, no amount of browser-level debugging will surface it. An L2 integration test calling the action directly isolates the real bug in seconds.
Trophy rule violated: "Stop at the first layer that covers the change." First principles violated: "Start from the source, not the symptom." Cost difference: Hours debugging at L3 vs minutes proving at L2.
Thin Wrappers
The architectural prerequisite that makes testing cheap. In the existing codebase: server actions are thin wrappers that delegate to the composition root (createAppServices). The industry term is "Thin Action / Fat Service" — same pattern, different vocabulary.
```
Server Action (thin wrapper)            Composition Root (testable)
┌─────────────────────┐                 ┌──────────────────────────┐
│ validateOrError()   │ ──────────────→ │ createAppServices()      │
│ assertPermission()  │                 │   useCase.execute()      │
│ revalidateTag()     │ ←────────────── │   return typed result    │
└─────────────────────┘                 └──────────────────────────┘
```
The action validates input (Zod via validateOrError), checks auth (assertPermission), delegates to the composition root, and handles cache revalidation. Business logic lives in use cases — testable at L1 or L2 without mocking Next.js internals.
Without this pattern, every test needs to mock next/headers, next/cache, next/navigation. The mocking cost compounds. With it, use cases are plain TypeScript — the cheapest thing to test.
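The pattern can be sketched in plain TypeScript. The helper names mirror the diagram above, but these implementations are stand-in stubs (the real `revalidateTag` lives in next/cache, for example), and the use case and input types are hypothetical:

```typescript
// Thin Action / Fat Service, shape only. The Next.js pieces are stubbed
// so the wiring is visible without mocking next/* modules.
interface RenameInput { id: string; name: string }

// Fat service: plain TypeScript, testable at L1/L2 with no framework mocks.
class RenameVentureUseCase {
  execute(input: RenameInput): { id: string; name: string } {
    return { id: input.id, name: input.name.trim() };
  }
}

// Stubs standing in for validateOrError / assertPermission / revalidateTag.
const validateOrError = (input: RenameInput): RenameInput => {
  if (!input.id) throw new Error("id required");
  return input;
};
const assertPermission = (_user: string): void => {};
const revalidateTag = (_tag: string): void => {};

// Thin action: validate → authorize → delegate → revalidate. No logic here.
export function renameVentureAction(user: string, input: RenameInput) {
  const valid = validateOrError(input);
  assertPermission(user);
  const result = new RenameVentureUseCase().execute(valid);
  revalidateTag(`venture:${result.id}`);
  return result;
}
```

Everything worth testing lives in the use case; the action is four lines of glue that an E2E test can confirm once.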
Nx Target Structure
Three Nx targets per server-action project. File naming conventions route tests to the correct target.
```json
{
  "targets": {
    "test-schema": {
      "executor": "@nx/vite:test",
      "options": {
        "include": ["{projectRoot}/src/**/*.schema.{test,spec}.ts"]
      }
    },
    "test-integration": {
      "executor": "@nx/vite:test",
      "dependsOn": ["test-schema"],
      "options": {
        "include": ["{projectRoot}/src/**/*.integration.{test,spec}.ts"],
        "environment": "node"
      }
    },
    "e2e": {
      "executor": "@nx/playwright:playwright",
      "dependsOn": ["test-integration"]
    }
  }
}
```
The dependsOn chain enforces the cascade: schema tests must pass before integration tests run. Integration tests must pass before E2E runs. Nx's task graph handles the ordering — no CI YAML job chains needed.
```shell
# CI runs this — Nx handles ordering via dependsOn
nx affected -t test-schema test-integration e2e
```
nx affected tests only what changed, not run-many across the whole repo.
Economics
Testing infrastructure investment is a cost-benefit algorithm.
Cost Model
| Cost Layer | Formula | Driving Factors |
|---|---|---|
| Build Cost | Hours to build framework + (hours per test x # tests) | Developer time, framework complexity, test volume |
| Maintenance Cost | Hours per failing test x % failed x # tests x runs/year | Flakiness, suite brittleness, environment instability |
| Execution Cost | Runs/year x (suite runtime x devs waiting x hourly rate) | CI runner costs, developer wait time, context switching |
Total annual cost per layer = Build + Maintenance + Execution.
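The three formulas above reduce to arithmetic. A sketch with the cost expressed in dollars (the table's formulas are in hours, so a blended hourly rate is added here as an assumption), and all input numbers purely illustrative:

```typescript
// Annual cost per test layer = build + maintenance + execution,
// converted from hours to dollars via a blended hourly rate.
interface LayerCosts {
  buildHours: number;        // framework setup
  hoursPerTest: number;      // authoring cost per test
  tests: number;             // test volume
  hoursPerFailure: number;   // triage cost per failing test
  failRate: number;          // fraction of tests failing per run
  runsPerYear: number;       // CI runs
  suiteRuntimeHours: number; // wall-clock suite time
  devsWaiting: number;       // developers blocked per run
  hourlyRate: number;        // blended rate (assumption, not in the table)
}

export function annualCost(c: LayerCosts): number {
  const build = (c.buildHours + c.hoursPerTest * c.tests) * c.hourlyRate;
  const maintenance =
    c.hoursPerFailure * c.failRate * c.tests * c.runsPerYear * c.hourlyRate;
  const execution =
    c.runsPerYear * c.suiteRuntimeHours * c.devsWaiting * c.hourlyRate;
  return build + maintenance + execution;
}
```

Plugging in even rough numbers makes the E2E penalty visible: runtime and flakiness both multiply by runs per year, so slow, brittle layers dominate the total.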
Benefit Model
| Benefit Category | Formula | Primary Value |
|---|---|---|
| Bug-Catch Saving | Bugs caught/year x (Cost if caught later - Cost at this layer) | Late bugs are 5-30x more expensive |
| Throughput Saving | Cycle-time hours saved/year x Blended hourly rate | Automation removes manual verification blocks |
Optimal ROI: filter the vast majority of bugs at L1/L2. Leave only critical user journeys for expensive L3.
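The benefit side reduces the same way. A sketch of the two table formulas, with all example figures illustrative rather than benchmarks:

```typescript
// Bug-catch saving: each bug caught at this layer avoids the (much larger)
// cost of catching it later — the table cites a 5-30x multiplier.
export function bugCatchSaving(
  bugsPerYear: number,
  costIfCaughtLater: number,
  costAtThisLayer: number,
): number {
  return bugsPerYear * (costIfCaughtLater - costAtThisLayer);
}

// Net position for a layer: invest while this stays positive.
export function netBenefit(annualBenefit: number, annualCost: number): number {
  return annualBenefit - annualCost;
}
```

The filtering strategy falls out of the arithmetic: cheap layers with high catch rates dominate the saving term, which is why the vast majority of bugs should be filtered at L1/L2.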
Investment Heuristics
| Heuristic | When | Why |
|---|---|---|
| Prefer Integration | API + DB assertion proves correctness | Bypasses brittle browser layers |
| Unit Test Logic | Hot paths and pure logic domains | Reduces debugging cost when L2/L3 fail |
| Restrict E2E | Sign-in, core happy paths, critical regressions | Add more only if repeated escapes from lower layers |
| Prune Low-Signal | Tests that never catch bugs or double-cover logic | They drive up maintenance without matching benefit |
Story Contract Connection
Story Contract rows in PRD specs drive test layer selection. The SPEC-MAP bridges Dream (what to prove) to Engineering (how to prove it).
| Story Contract Column | Testing Decision |
|---|---|
| WHEN (trigger + precondition) | Determines test setup / arrange phase |
| THEN (data source + threshold) | Becomes the assertion — the exact thing the test proves |
| ARTIFACT (test file path) | File name routes to Nx target via naming convention |
| Test Type (unit/integration/e2e) | Maps directly to trophy layer AND Nx target |
| FORBIDDEN (counterfeit success) | Drives Safety Test — negative test case at same layer |
The naming convention is the routing mechanism:
| File Pattern | Nx Target | Trophy Layer |
|---|---|---|
| *.schema.test.ts | test-schema | L1 Unit |
| *.integration.spec.ts | test-integration | L2 Integration |
| *.browser.test.tsx | test-integration (browser mode) | L2 Browser |
| *.spec.ts (in e2e project) | e2e | L3 E2E |
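The routing in the table above can be expressed as a function. This is an illustration of the convention, not part of the Nx config (Nx does the real routing via the include globs), and it cannot see project boundaries, so the e2e case is approximated by a catch-all:

```typescript
// Route a test file to its Nx target by naming convention.
// Order matters: *.integration.spec.ts must match before the
// generic *.spec.ts catch-all used inside e2e projects.
export function routeToTarget(file: string): string | null {
  if (/\.schema\.(test|spec)\.ts$/.test(file)) return "test-schema";
  if (/\.integration\.(test|spec)\.ts$/.test(file)) return "test-integration";
  if (/\.browser\.(test|spec)\.tsx$/.test(file)) return "test-integration"; // browser mode
  if (/\.spec\.ts$/.test(file)) return "e2e"; // assumes the file lives in an e2e project
  return null;
}
```

A file that routes to null is a naming-convention violation worth catching in lint or CI before it silently escapes every target.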
Benchmark Thresholds
The economics tell you WHERE to invest. The Engineering Quality Benchmarks tell you WHAT the targets are.
| Test Layer | Benchmark | Key Threshold |
|---|---|---|
| L0 Types | Type Safety | 0 errors on tsc --build |
| L1/L2 | Repository Quality SLOs | Runtime SLOs per service |
| L3 E2E | Core Web Vitals + CI Health | LCP <= 2.5s, CI green rate >= 95% |
| All Layers | CI Pipeline Health | Merge loop <= 10 min, flakiness <= 5% |
Dig Deeper
- Vitest — Primary test runner: unit, integration, and browser mode setup with Nx
- Testing Stack — All testing tools: Vitest, Playwright, RTL, MSW
- Testing Strategy — Layer model, selection rules, recovery backlog
- Jest — Legacy runner, migration path to Vitest
Context
- SPEC-MAP — Story Contract rows become test files via naming conventions
- Flow Engineering — Cost of quality, enforcement hierarchy
- Type-First Development — L0 types as free confidence base
- Engineering Quality Benchmarks — Target thresholds per layer
- Dev Workflow — Build stream and fix stream
Questions
- What is the cheapest test that gives sufficient confidence for each change in your codebase?
- If the trophy is growing top-heavy for SSR apps, at what point does the cost of E2E justify building @epic-web/app-launcher-style isolated server instances?
- When a Story Contract row says "integration" but the test file only asserts on mocked data, is it an integration test or a unit test wearing a costume?
- If nx affected only tests what changed, what's the cost of a dependency graph that's wrong — tests that should run but don't?
- At what point does the Thin Action / Fat Service pattern become over-extraction — where the service layer adds indirection without testability gain?