Testing Platform
What is the cheapest test that gives sufficient confidence?
That question — Kent C. Dodds' core insight — drives the entire strategy. Not "how many tests" or "what percentage coverage." The question is economic: where does each dollar of testing effort buy the most confidence?
The Trophy
The Testing Trophy replaces the Testing Pyramid. The pyramid assumed UI was expensive and unit tests were king. Server-side rendering inverted that assumption. When most logic lives in server actions and services, integration tests deliver the highest confidence per dollar.
```
╔═══════════════════╗
║     E2E (L3)      ║ ← Browser-dependent flows only
╠═══════════════════╣
║                   ║
║    Integration    ║ ← The bulk: server actions, services, repos
║       (L2)        ║
║                   ║
╠═══════════════════╣
║     Unit (L1)     ║ ← Pure transforms, Zod schemas
╠═══════════════════╣
║    Static (L0)    ║ ← TypeScript compiler — free
╚═══════════════════╝
```
The 2025 update: for SSR-heavy apps (Next.js with Server Components), the trophy grows top-heavy. More logic is server-side, so L2 integration tests cover more surface area. E2E remains expensive — use it only when the browser itself is part of the proof.
Confidence Coefficient
Each layer answers a different question. Stop at the first layer that covers the change.
| Layer | Question | Tool | Speed | Confidence |
|---|---|---|---|---|
| L0 Static | Does it compile? | tsc --noEmit | <1s | Contracts match |
| L1 Unit | Does the transform work? | Vitest | <1s/test | Logic correct |
| L2 Integration | Does the data flow work? | Vitest + real DB | 5-30s/test | Wiring correct |
| L3 E2E | Can a human complete the journey? | Playwright | 30-120s/test | Experience correct |
L0 runs on every change. The compiler is free verification. A broken refactor lights up every consumer instantly.
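The L1 row above can be made concrete: a pure transform with zero dependencies, verifiable in well under a second. A minimal sketch, where the `toSlug` helper is a hypothetical example, not a function from the codebase:

```typescript
// Hypothetical pure transform — the kind of zero-dependency logic
// that belongs at L1. No framework, no mocks, no I/O.
export function toSlug(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-|-$/g, "");      // strip leading/trailing dashes
}
```

Because the function is pure, the unit test is a one-line assertion per case, and a failure pinpoints the exact transform rather than a wiring problem.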
Platform Constraints
Next.js with React Server Components changes what's testable at each layer.
| Code Type | Testable At | Why |
|---|---|---|
| Zod schemas, DTOs, pure functions | L1 Unit | Zero dependencies |
| Server actions ("use server") | L2 Integration | Mock next/headers, next/cache — test the function directly |
| Repository/adapter layer | L2 Integration | Real database, real queries |
| API routes (route.ts) | L2 Integration | NTARH for real Next.js routing |
| Client components with state | L2 Browser Mode | Vitest Browser Mode + MSW |
| Async Server Components | L3 E2E only | Cannot render outside Next.js server |
| Multi-step browser flows | L3 E2E only | Browser IS the proof |
E2E Admission Gate
Before writing any E2E test, the claim must pass this gate. If it fails, write a cheaper test.
The Gate
| Question | Yes → E2E | No → lower layer |
|---|---|---|
| Does this test require a real browser to prove its claim? | Auth redirects, layout, a11y, keyboard nav | Server action results, validation logic, data transforms |
| Is the browser the only environment where this behavior exists? | Client-side hydration state, CSS rendering | Server action return values, DB writes, redirect URLs |
| Has the underlying logic already been proven at L2? | E2E confirms wiring only | Write L2 first, then decide if E2E adds signal |
E2E tests that fail this gate are integration tests running in an expensive container. They carry the full weight of SSR, hydration, JS execution, network, and CI environment timing — and any variance in that chain breaks the test.
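The gate can be encoded as a small decision helper. This is an illustrative sketch, not an existing API; the type names and fields are assumptions that mirror the three gate questions:

```typescript
// Illustrative encoding of the E2E admission gate. A claim earns L3 only
// if every gate question passes; otherwise a cheaper layer covers it.
type Layer = "L2" | "L3";

interface Claim {
  needsRealBrowser: boolean;    // layout, a11y, keyboard nav, auth redirects
  browserOnlyBehavior: boolean; // hydration state, CSS rendering
  logicProvenAtL2: boolean;     // the underlying action is already covered
}

export function admitToE2E(claim: Claim): Layer {
  // Fail either of the first two questions → the claim never needed a browser.
  if (!claim.needsRealBrowser || !claim.browserOnlyBehavior) return "L2";
  // Even browser-bound claims should land on logic already proven at L2.
  return claim.logicProvenAtL2 ? "L3" : "L2";
}
```

The point of the helper is the ordering: L2 is the default, and L3 must be earned.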
Critical Functions
These server actions MUST have L2 integration tests before any E2E test references them. E2E tests for these flows should assume the action works and test only the browser wiring.
| Function | L2 proves | E2E proves (if needed) |
|---|---|---|
| Create entity (venture, project, RFP) | Action returns valid ID, DB row exists | Form visible, button clickable, redirect lands |
| Update entity | Action returns updated fields, DB reflects change | Inline edit saves, optimistic UI resolves |
| Delete entity | Action removes row, returns confirmation | Confirmation dialog works, item disappears |
| Auth-gated action | Action rejects without auth, succeeds with auth | Redirect to login, return to original page |
| File upload | Action stores file, returns URL | File picker opens, progress shown, preview renders |
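The "create entity" row can be sketched in test form. All names here (`InMemoryVentureRepo`, `createVenture`) are hypothetical, and the database is replaced by an in-memory stub so the sketch is self-contained; a real L2 test would assert against a real database, as the table above requires:

```typescript
// Sketch of the L2 shape for "create entity": call the underlying logic
// directly and assert on both the returned ID and the persisted row.
interface Venture { id: string; name: string }

class InMemoryVentureRepo {
  rows = new Map<string, Venture>();
  insert(name: string): Venture {
    const row = { id: `v_${this.rows.size + 1}`, name };
    this.rows.set(row.id, row);
    return row;
  }
  findById(id: string): Venture | undefined { return this.rows.get(id); }
}

export function createVenture(repo: InMemoryVentureRepo, name: string): string {
  if (!name.trim()) throw new Error("name required"); // stands in for Zod validation
  return repo.insert(name).id;
}

// L2 assertions: the action returns a valid ID and the row exists.
const repo = new InMemoryVentureRepo();
const id = createVenture(repo, "Acme");
```

Once this passes, the matching E2E test only needs to prove the form is visible, the button is clickable, and the redirect lands.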
Layer Violation Symptoms
If you see these patterns, the test is at the wrong layer:
| Symptom | Diagnosis | Fix |
|---|---|---|
| Debugging hydration timing in E2E | Testing server action correctness through browser | Write L2 integration test for the action |
| waitForSelector with 30s+ timeout | Waiting for JS execution, not testing behavior | The claim doesn't need a browser |
| Test passes locally, fails in CI | CI environment timing differs from local | The claim is environment-dependent, not behavior-dependent |
| Fixing one E2E uncovers a different E2E failure | Cascading state dependencies | Tests aren't isolated — split into L2 concerns |
| Spending hours on a test that takes 120s to run | Feedback loop is too slow for debugging | The inner logic should be debuggable at L2 in <5s |
Walk the Pipe
When an E2E test fails, do not touch the spec. Walk upstream. Test each joint in the supply chain, starting from the source.
E2E FAILS
↓
1. Server action — call it directly. Does it return the right data?
→ NO → Fix the action. Stop. You found the bug.
→ YES ↓
2. Component — does it hydrate and render correctly?
→ NO → Fix the component. Stop.
→ YES ↓
3. Form wiring — does submit trigger the action?
→ NO → Fix the wiring. Stop.
→ YES ↓
4. NOW debug the E2E spec itself.
Each step takes seconds at L2. The entire pipe check takes under 5 minutes. Skipping to step 4 and iterating on selectors, timeouts, and wait strategies is symptom-chasing.
The rule: Never modify a test spec until you have proven the underlying pipe works. If the pipe is broken, no spec fix will help. If the pipe works, the spec fix is obvious.
This is first principles applied to debugging. Decompose the system. Test each joint. The failure is always at the first joint that breaks — everything downstream is noise.
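The pipe walk is a first-failing-joint search. A minimal sketch, where the joint names and `ok` probes are illustrative (in practice each probe is a fast L2 check, not an inline closure):

```typescript
// Walk the pipe upstream-first: return the first joint that fails,
// because everything downstream of a broken joint is noise.
type Joint = { name: string; ok: () => boolean };

export function firstBrokenJoint(pipe: Joint[]): string | null {
  for (const joint of pipe) {
    if (!joint.ok()) return joint.name; // stop here — you found the bug
  }
  return null; // pipe works: now (and only now) debug the E2E spec itself
}

// Simulated run: the server action passes, the component render fails.
const pipe: Joint[] = [
  { name: "server action", ok: () => true },
  { name: "component render", ok: () => false },
  { name: "form wiring", ok: () => true },
];
```

Note that the search never evaluates joints past the first failure, which is exactly the "stop, you found the bug" rule above.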
The Symptom Trap
E2E tests fail with browser-level symptoms — hydration attributes missing, timeouts expiring, selectors not matching. The instinct is to fix the spec: adjust selectors, extend timeouts, add wait strategies.
These are all symptom-level fixes. Common anti-patterns:
- Extending waitForSelector timeouts
- Switching selector strategies (CSS → JS evaluation)
- Adding diagnostic logging to the spec
- Coordinating click + navigation with Promise.all
- Reverting and retrying the same approach
None of these address the source. If a server action returns invalid data, no amount of browser-level debugging will surface it. An L2 integration test calling the action directly isolates the real bug in seconds.
Trophy rule violated: "Stop at the first layer that covers the change." First principles violated: "Start from the source, not the symptom." Cost difference: Hours debugging at L3 vs minutes proving at L2.
Thin Wrappers
The architectural prerequisite that makes testing cheap. In the existing codebase: server actions are thin wrappers that delegate to the composition root (createAppServices). The industry term is "Thin Action / Fat Service" — same pattern, different vocabulary.
```
Server Action (thin wrapper)            Composition Root (testable)
┌─────────────────────┐                 ┌──────────────────────────┐
│ validateOrError()   │ ──────────────→ │ createAppServices()      │
│ assertPermission()  │                 │   useCase.execute()      │
│ revalidateTag()     │ ←────────────── │   return typed result    │
└─────────────────────┘                 └──────────────────────────┘
```
The action validates input (Zod via validateOrError), checks auth (assertPermission), delegates to the composition root, and handles cache revalidation. Business logic lives in use cases — testable at L1 or L2 without mocking Next.js internals.
Without this pattern, every test needs to mock next/headers, next/cache, next/navigation. The mocking cost compounds. With it, use cases are plain TypeScript — the cheapest thing to test.
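The pattern can be sketched in plain TypeScript. The helper names mirror the diagram above, but these implementations are stand-in stubs (the real `revalidateTag` lives in next/cache, for example), and the use case and input types are hypothetical:

```typescript
// Thin Action / Fat Service, shape only. The Next.js pieces are stubbed
// so the wiring is visible without mocking next/* modules.
interface RenameInput { id: string; name: string }

// Fat service: plain TypeScript, testable at L1/L2 with no framework mocks.
class RenameVentureUseCase {
  execute(input: RenameInput): { id: string; name: string } {
    return { id: input.id, name: input.name.trim() };
  }
}

// Stubs standing in for validateOrError / assertPermission / revalidateTag.
const validateOrError = (input: RenameInput): RenameInput => {
  if (!input.id) throw new Error("id required");
  return input;
};
const assertPermission = (_user: string): void => {};
const revalidateTag = (_tag: string): void => {};

// Thin action: validate → authorize → delegate → revalidate. No logic here.
export function renameVentureAction(user: string, input: RenameInput) {
  const valid = validateOrError(input);
  assertPermission(user);
  const result = new RenameVentureUseCase().execute(valid);
  revalidateTag(`venture:${result.id}`);
  return result;
}
```

Everything worth testing lives in the use case; the action is four lines of glue that an E2E test can confirm once.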
Nx Target Structure
Three Nx targets per server-action project. File naming conventions route tests to the correct target.
```json
{
  "targets": {
    "test-schema": {
      "executor": "@nx/vite:test",
      "options": {
        "include": ["{projectRoot}/src/**/*.schema.{test,spec}.ts"]
      }
    },
    "test-integration": {
      "executor": "@nx/vite:test",
      "dependsOn": ["test-schema"],
      "options": {
        "include": ["{projectRoot}/src/**/*.integration.{test,spec}.ts"],
        "environment": "node"
      }
    },
    "e2e": {
      "executor": "@nx/playwright:playwright",
      "dependsOn": ["test-integration"]
    }
  }
}
```
The dependsOn chain enforces the cascade: schema tests must pass before integration tests run. Integration tests must pass before E2E runs. Nx's task graph handles the ordering — no CI YAML job chains needed.
```shell
# CI runs this — Nx handles ordering via dependsOn
nx affected -t test-schema test-integration e2e
```
nx affected tests only what changed, not run-many across the whole repo.
Economics
Testing infrastructure investment is a cost-benefit algorithm.
Cost Model
| Cost Layer | Formula | Driving Factors |
|---|---|---|
| Build Cost | Hours to build framework + (hours per test x # tests) | Developer time, framework complexity, test volume |
| Maintenance Cost | Hours per failing test x % failed x # tests x runs/year | Flakiness, suite brittleness, environment instability |
| Execution Cost | Runs/year x (suite runtime x devs waiting x hourly rate) | CI runner costs, developer wait time, context switching |
Total annual cost per layer = Build + Maintenance + Execution.
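The three formulas above reduce to arithmetic. A sketch with the cost expressed in dollars (the table's formulas are in hours, so a blended hourly rate is added here as an assumption), and all input numbers purely illustrative:

```typescript
// Annual cost per test layer = build + maintenance + execution,
// converted from hours to dollars via a blended hourly rate.
interface LayerCosts {
  buildHours: number;        // framework setup
  hoursPerTest: number;      // authoring cost per test
  tests: number;             // test volume
  hoursPerFailure: number;   // triage cost per failing test
  failRate: number;          // fraction of tests failing per run
  runsPerYear: number;       // CI runs
  suiteRuntimeHours: number; // wall-clock suite time
  devsWaiting: number;       // developers blocked per run
  hourlyRate: number;        // blended rate (assumption, not in the table)
}

export function annualCost(c: LayerCosts): number {
  const build = (c.buildHours + c.hoursPerTest * c.tests) * c.hourlyRate;
  const maintenance =
    c.hoursPerFailure * c.failRate * c.tests * c.runsPerYear * c.hourlyRate;
  const execution =
    c.runsPerYear * c.suiteRuntimeHours * c.devsWaiting * c.hourlyRate;
  return build + maintenance + execution;
}
```

Plugging in even rough numbers makes the E2E penalty visible: runtime and flakiness both multiply by runs per year, so slow, brittle layers dominate the total.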
Benefit Model
| Benefit Category | Formula | Primary Value |
|---|---|---|
| Bug-Catch Saving | Bugs caught/year x (Cost if caught later - Cost at this layer) | Late bugs are 5-30x more expensive |
| Throughput Saving | Cycle-time hours saved/year x Blended hourly rate | Automation removes manual verification blocks |
Optimal ROI: filter the vast majority of bugs at L1/L2. Leave only critical user journeys for expensive L3.
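The benefit side reduces the same way. A sketch of the two table formulas, with all example figures illustrative rather than benchmarks:

```typescript
// Bug-catch saving: each bug caught at this layer avoids the (much larger)
// cost of catching it later — the table cites a 5-30x multiplier.
export function bugCatchSaving(
  bugsPerYear: number,
  costIfCaughtLater: number,
  costAtThisLayer: number,
): number {
  return bugsPerYear * (costIfCaughtLater - costAtThisLayer);
}

// Net position for a layer: invest while this stays positive.
export function netBenefit(annualBenefit: number, annualCost: number): number {
  return annualBenefit - annualCost;
}
```

The filtering strategy falls out of the arithmetic: cheap layers with high catch rates dominate the saving term, which is why the vast majority of bugs should be filtered at L1/L2.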
Investment Heuristics
| Heuristic | When | Why |
|---|---|---|
| Prefer Integration | API + DB assertion proves correctness | Bypasses brittle browser layers |
| Unit Test Logic | Hot paths and pure logic domains | Reduces debugging cost when L2/L3 fail |
| Restrict E2E | Sign-in, core happy paths, critical regressions | Add more only if repeated escapes from lower layers |
| Prune Low-Signal | Tests that never catch bugs or double-cover logic | They drive up maintenance without matching benefit |
Story Contract Connection
Story Contract rows in PRD specs drive test layer selection. The SPEC-MAP bridges Dream (what to prove) to Engineering (how to prove it).
| Story Contract Column | Testing Decision |
|---|---|
| WHEN (trigger + precondition) | Determines test setup / arrange phase |
| THEN (data source + threshold) | Becomes the assertion — the exact thing the test proves |
| ARTIFACT (test file path) | File name routes to Nx target via naming convention |
| Test Type (unit/integration/e2e) | Maps directly to trophy layer AND Nx target |
| FORBIDDEN (counterfeit success) | Drives Safety Test — negative test case at same layer |
The naming convention is the routing mechanism:
| File Pattern | Nx Target | Trophy Layer |
|---|---|---|
| *.schema.test.ts | test-schema | L1 Unit |
| *.integration.spec.ts | test-integration | L2 Integration |
| *.browser.test.tsx | test-integration (browser mode) | L2 Browser |
| *.spec.ts (in e2e project) | e2e | L3 E2E |
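The routing in the table above can be expressed as a function. This is an illustration of the convention, not part of the Nx config (Nx does the real routing via the include globs), and it cannot see project boundaries, so the e2e case is approximated by a catch-all:

```typescript
// Route a test file to its Nx target by naming convention.
// Order matters: *.integration.spec.ts must match before the
// generic *.spec.ts catch-all used inside e2e projects.
export function routeToTarget(file: string): string | null {
  if (/\.schema\.(test|spec)\.ts$/.test(file)) return "test-schema";
  if (/\.integration\.(test|spec)\.ts$/.test(file)) return "test-integration";
  if (/\.browser\.(test|spec)\.tsx$/.test(file)) return "test-integration"; // browser mode
  if (/\.spec\.ts$/.test(file)) return "e2e"; // assumes the file lives in an e2e project
  return null;
}
```

A file that routes to null is a naming-convention violation worth catching in lint or CI before it silently escapes every target.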
Benchmark Thresholds
The economics tell you WHERE to invest. The Engineering Quality Benchmarks tell you WHAT the targets are.
| Test Layer | Benchmark | Key Threshold |
|---|---|---|
| L0 Types | Type Safety | 0 errors on tsc --build |
| L1/L2 | Repository Quality SLOs | Runtime SLOs per service |
| L3 E2E | Core Web Vitals + CI Health | LCP <= 2.5s, CI green rate >= 95% |
| All Layers | CI Pipeline Health | Merge loop <= 10 min, flakiness <= 5% |
Dig Deeper
- Vitest — Primary test runner: unit, integration, and browser mode setup with Nx
- Testing Stack — All testing tools: Vitest, Playwright, RTL, MSW
- Testing Strategy — Layer model, selection rules, recovery backlog
- Jest — Legacy runner, migration path to Vitest
Context
- SPEC-MAP — Story Contract rows become test files via naming conventions
- Flow Engineering — Cost of quality, enforcement hierarchy
- Type-First Development — L0 types as free confidence base
- Engineering Quality Benchmarks — Target thresholds per layer
- Dev Workflow — Build stream and fix stream
Questions
- What is the cheapest test that gives sufficient confidence for each change in your codebase?
- If the trophy is growing top-heavy for SSR apps, at what point does the cost of E2E justify building @epic-web/app-launcher-style isolated server instances?
- When a Story Contract row says "integration" but the test file only asserts on mocked data, is it an integration test or a unit test wearing a costume?
- If nx affected only tests what changed, what's the cost of a dependency graph that's wrong — tests that should run but don't?
- At what point does the Thin Action / Fat Service pattern become over-extraction — where the service layer adds indirection without testability gain?