Testing Platform

What is the cheapest test that gives sufficient confidence?

That question — Kent C. Dodds' core insight — drives the entire strategy. Not "how many tests" or "what percentage coverage." The question is economic: where does each dollar of testing effort buy the most confidence?

The Trophy

The Testing Trophy replaces the Testing Pyramid. The pyramid assumed UI was expensive and unit tests were king. Server-side rendering inverted that assumption. When most logic lives in server actions and services, integration tests deliver the highest confidence per dollar.

╔═══════════════╗
║   E2E (L3)    ║ ← Browser-dependent flows only
╠═══════════════╣
║               ║
║  Integration  ║ ← The bulk: server actions, services, repos
║     (L2)      ║
║               ║
╠═══════════════╣
║   Unit (L1)   ║ ← Pure transforms, Zod schemas
╠═══════════════╣
║  Static (L0)  ║ ← TypeScript compiler — free
╚═══════════════╝

The 2025 update: for SSR-heavy apps (Next.js with Server Components), the trophy grows top-heavy. More logic is server-side, so L2 integration tests cover more surface area. E2E remains expensive — use it only when the browser itself is part of the proof.

Confidence Coefficient

Each layer answers a different question. Stop at the first layer that covers the change.

| Layer | Question | Tool | Speed | Confidence |
| --- | --- | --- | --- | --- |
| L0 Static | Does it compile? | tsc --noEmit | <1s | Contracts match |
| L1 Unit | Does the transform work? | Vitest | <1s/test | Logic correct |
| L2 Integration | Does the data flow work? | Vitest + real DB | 5-30s/test | Wiring correct |
| L3 E2E | Can a human complete the journey? | Playwright | 30-120s/test | Experience correct |

L0 runs on every change. The compiler is free verification. A broken refactor lights up every consumer instantly.
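As a sketch of the L1 layer, consider a hypothetical pure transform (slugify is illustrative, not from this codebase): no I/O, no framework imports, assertable in well under a second.

```typescript
// Hypothetical pure transform: the kind of logic that belongs at L1.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, "");    // strip leading/trailing dashes
}

// In a Vitest unit file this would read:
//   expect(slugify("  Hello, World!  ")).toBe("hello-world")
const slug = slugify("  Hello, World!  ");
```

Because the function has zero dependencies, a failing case reproduces instantly in isolation, which is exactly the "logic correct" confidence the L1 row claims.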

Platform Constraints

Next.js with React Server Components changes what's testable at each layer.

| Code Type | Testable At | Why |
| --- | --- | --- |
| Zod schemas, DTOs, pure functions | L1 Unit | Zero dependencies |
| Server actions ("use server") | L2 Integration | Mock next/headers, next/cache — test the function directly |
| Repository/adapter layer | L2 Integration | Real database, real queries |
| API routes (route.ts) | L2 Integration | NTARH for real Next.js routing |
| Client components with state | L2 Browser Mode | Vitest Browser Mode + MSW |
| Async Server Components | L3 E2E only | Cannot render outside Next.js server |
| Multi-step browser flows | L3 E2E only | Browser IS the proof |

E2E Admission Gate

Before writing any E2E test, the claim must pass this gate. If it fails, write a cheaper test.

The Gate

| Question | Yes → E2E | No → lower layer |
| --- | --- | --- |
| Does this test require a real browser to prove its claim? | Auth redirects, layout, a11y, keyboard nav | Server action results, validation logic, data transforms |
| Is the browser the only environment where this behavior exists? | Client-side hydration state, CSS rendering | Server action return values, DB writes, redirect URLs |
| Has the underlying logic already been proven at L2? | E2E confirms wiring only | Write L2 first, then decide if E2E adds signal |

E2E tests that fail this gate are integration tests running in an expensive container. They carry the full weight of SSR, hydration, JS execution, network, and CI environment timing — and any variance in that chain breaks the test.

Critical Functions

These server actions MUST have L2 integration tests before any E2E test references them. E2E tests for these flows should assume the action works and test only the browser wiring.

| Function | L2 proves | E2E proves (if needed) |
| --- | --- | --- |
| Create entity (venture, project, RFP) | Action returns valid ID, DB row exists | Form visible, button clickable, redirect lands |
| Update entity | Action returns updated fields, DB reflects change | Inline edit saves, optimistic UI resolves |
| Delete entity | Action removes row, returns confirmation | Confirmation dialog works, item disappears |
| Auth-gated action | Action rejects without auth, succeeds with auth | Redirect to login, return to original page |
| File upload | Action stores file, returns URL | File picker opens, progress shown, preview renders |
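A minimal sketch of what the "create entity" L2 assertion pair looks like, using an in-memory Map as a stand-in for the real database. All names here (VentureRepo, the v_ ID scheme) are illustrative, not from the codebase.

```typescript
interface Venture { id: string; name: string }

// Stand-in repository: in a real L2 test this would hit the actual database.
class VentureRepo {
  private rows = new Map<string, Venture>();
  insert(name: string): Venture {
    const row = { id: `v_${this.rows.size + 1}`, name };
    this.rows.set(row.id, row);
    return row;
  }
  findById(id: string): Venture | undefined {
    return this.rows.get(id);
  }
}

// The L2 assertion pair: the action returns a valid ID, and the row exists.
const repo = new VentureRepo();
const created = repo.insert("Acme RFP");
const persisted = repo.findById(created.id);
```

Once both assertions pass at L2, the E2E test for the same flow only needs to prove the browser wiring: form visible, button clickable, redirect lands.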

Layer Violation Symptoms

If you see these patterns, the test is at the wrong layer:

| Symptom | Diagnosis | Fix |
| --- | --- | --- |
| Debugging hydration timing in E2E | Testing server action correctness through browser | Write L2 integration test for the action |
| waitForSelector with 30s+ timeout | Waiting for JS execution, not testing behavior | The claim doesn't need a browser |
| Test passes locally, fails in CI | CI environment timing differs from local | The claim is environment-dependent, not behavior-dependent |
| Fixing one E2E uncovers a different E2E failure | Cascading state dependencies | Tests aren't isolated — split into L2 concerns |
| Spending hours on a test that takes 120s to run | Feedback loop is too slow for debugging | The inner logic should be debuggable at L2 in <5s |

Walk the Pipe

When an E2E test fails, do not touch the spec. Walk upstream. Test each joint in the supply chain, starting from the source.

E2E FAILS
  ↓
1. Server action — call it directly. Does it return the right data?
   → NO → Fix the action. Stop. You found the bug.
   → YES ↓
2. Component — does it hydrate and render correctly?
   → NO → Fix the component. Stop.
   → YES ↓
3. Form wiring — does submit trigger the action?
   → NO → Fix the wiring. Stop.
   → YES ↓
4. NOW debug the E2E spec itself.

Each step takes seconds at L2. The entire pipe check takes under 5 minutes. Skipping to step 4 and iterating on selectors, timeouts, and wait strategies is symptom-chasing.

The rule: Never modify a test spec until you have proven the underlying pipe works. If the pipe is broken, no spec fix will help. If the pipe works, the spec fix is obvious.

This is first principles applied to debugging. Decompose the system. Test each joint. The failure is always at the first joint that breaks — everything downstream is noise.
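The joint-walking discipline above can be sketched as a tiny function: probe each joint in order and stop at the first failure. The joint names and checks are illustrative stand-ins for "run the targeted L2 test for that joint".

```typescript
// Minimal sketch of "walk the pipe": report the first joint that fails.
type Joint = { name: string; check: () => boolean };

function firstBrokenJoint(joints: Joint[]): string | null {
  for (const joint of joints) {
    // Stop at the source — every downstream failure is noise.
    if (!joint.check()) return joint.name;
  }
  return null; // pipe is healthy: now (and only now) debug the E2E spec
}

// Example pipe where the component joint is the one that breaks:
const broken = firstBrokenJoint([
  { name: "server action", check: () => true },
  { name: "component render", check: () => false },
  { name: "form wiring", check: () => true },
]);
```

The ordering matters: each check is a fast L2-level probe, so the loop terminates at the real source in seconds rather than hours of selector tuning at L3.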

The Symptom Trap

E2E tests fail with browser-level symptoms — hydration attributes missing, timeouts expiring, selectors not matching. The instinct is to fix the spec: adjust selectors, extend timeouts, add wait strategies.

These are all symptom-level fixes. Common anti-patterns:

  • Extending waitForSelector timeouts
  • Switching selector strategies (CSS → JS evaluation)
  • Adding diagnostic logging to the spec
  • Coordinating click + navigation with Promise.all
  • Reverting and retrying the same approach

None of these address the source. If a server action returns invalid data, no amount of browser-level debugging will surface it. An L2 integration test calling the action directly isolates the real bug in seconds.

Trophy rule violated: "Stop at the first layer that covers the change." First principles violated: "Start from the source, not the symptom." Cost difference: Hours debugging at L3 vs minutes proving at L2.

Thin Wrappers

The architectural prerequisite that makes testing cheap: in this codebase, server actions are thin wrappers that delegate to the composition root (createAppServices). The industry term is "Thin Action / Fat Service" — same pattern, different vocabulary.

Server Action (thin wrapper)            Composition Root (testable)
┌─────────────────────┐                 ┌──────────────────────────┐
│ validateOrError()   │ ─────────────→  │ createAppServices()      │
│ assertPermission()  │                 │ useCase.execute()        │
│ revalidateTag()     │ ←─────────────  │ return typed result      │
└─────────────────────┘                 └──────────────────────────┘

The action validates input (Zod via validateOrError), checks auth (assertPermission), delegates to the composition root, and handles cache revalidation. Business logic lives in use cases — testable at L1 or L2 without mocking Next.js internals.

Without this pattern, every test needs to mock next/headers, next/cache, next/navigation. The mocking cost compounds. With it, use cases are plain TypeScript — the cheapest thing to test.
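A compressed sketch of the pattern with the Next.js dependencies stubbed out. validateOrError, assertPermission, and revalidateTag mirror the names in the text; the bodies here are illustrative stand-ins, not the real implementations.

```typescript
type Input = { name: string };

// "Fat service": plain TypeScript, testable at L1/L2 with no Next.js mocks.
function createVentureUseCase(input: Input): { id: string; name: string } {
  return { id: "v_1", name: input.name.trim() };
}

// "Thin action": validate → authorize → delegate → revalidate.
function createVentureAction(raw: unknown) {
  const input = raw as Input;                   // stand-in for validateOrError (Zod)
  if (!input?.name) throw new Error("invalid"); // stand-in for assertPermission + validation
  const result = createVentureUseCase(input);   // the only line with business value
  // revalidateTag("ventures")                  // Next.js cache call — mocked in tests
  return result;
}

const out = createVentureAction({ name: "  Acme  " });
```

Everything worth asserting lives in createVentureUseCase, which imports nothing from Next.js — so its tests pay zero mocking cost.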

Nx Target Structure

Three Nx targets per server-action project. File naming conventions route tests to the correct target.

{
  "targets": {
    "test-schema": {
      "executor": "@nx/vite:test",
      "options": {
        "include": ["{projectRoot}/src/**/*.schema.{test,spec}.ts"]
      }
    },
    "test-integration": {
      "executor": "@nx/vite:test",
      "dependsOn": ["test-schema"],
      "options": {
        "include": ["{projectRoot}/src/**/*.integration.{test,spec}.ts"],
        "environment": "node"
      }
    },
    "e2e": {
      "executor": "@nx/playwright:playwright",
      "dependsOn": ["test-integration"]
    }
  }
}

The dependsOn chain enforces the cascade: schema tests must pass before integration tests run. Integration tests must pass before E2E runs. Nx's task graph handles the ordering — no CI YAML job chains needed.

# CI runs this — Nx handles ordering via dependsOn
nx affected -t test-schema test-integration e2e

nx affected tests only what changed, rather than run-many across the whole repo.

Economics

Testing infrastructure investment is a cost-benefit algorithm.

Cost Model

| Cost Layer | Formula | Driving Factors |
| --- | --- | --- |
| Build Cost | Hours to build framework + (hours per test x # tests) | Developer time, framework complexity, test volume |
| Maintenance Cost | Hours per failing test x % failed x # tests x runs/year | Flakiness, suite brittleness, environment instability |
| Execution Cost | Runs/year x (suite runtime x devs waiting x hourly rate) | CI runner costs, developer wait time, context switching |

Total annual cost per layer = Build + Maintenance + Execution.
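The three formulas above, worked with illustrative numbers (every figure here is made up for the example; build and maintenance hours are converted to dollars at the blended hourly rate so the total is in one unit):

```typescript
// Illustrative annual-cost calculation per layer, following the cost model above.
function annualCost(opts: {
  buildFrameworkHours: number; hoursPerTest: number; tests: number;
  hoursPerFailingTest: number; failRate: number; runsPerYear: number;
  suiteRuntimeHours: number; devsWaiting: number; hourlyRate: number;
}) {
  const toDollars = (hours: number) => hours * opts.hourlyRate;
  const build = toDollars(opts.buildFrameworkHours + opts.hoursPerTest * opts.tests);
  const maintenance = toDollars(
    opts.hoursPerFailingTest * opts.failRate * opts.tests * opts.runsPerYear
  );
  const execution =
    opts.runsPerYear * opts.suiteRuntimeHours * opts.devsWaiting * opts.hourlyRate;
  return { build, maintenance, execution, total: build + maintenance + execution };
}

// Hypothetical E2E suite: 20 tests, 5% flake rate, 500 CI runs/year.
const e2e = annualCost({
  buildFrameworkHours: 40, hoursPerTest: 4, tests: 20,
  hoursPerFailingTest: 1, failRate: 0.05, runsPerYear: 500,
  suiteRuntimeHours: 0.5, devsWaiting: 1, hourlyRate: 100,
});
// build = (40 + 80)h → $12,000; maintenance = 500h → $50,000; execution = $25,000
```

Even in this rough sketch, maintenance and execution dwarf build cost — which is why flakiness and suite runtime, not authoring effort, dominate the E2E bill.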

Benefit Model

| Benefit Category | Formula | Primary Value |
| --- | --- | --- |
| Bug-Catch Saving | Bugs caught/year x (Cost if caught later - Cost at this layer) | Late bugs are 5-30x more expensive |
| Throughput Saving | Cycle-time hours saved/year x Blended hourly rate | Automation removes manual verification blocks |

Optimal ROI: filter the vast majority of bugs at L1/L2. Leave only critical user journeys for expensive L3.

Investment Heuristics

| Heuristic | When | Why |
| --- | --- | --- |
| Prefer Integration | API + DB assertion proves correctness | Bypasses brittle browser layers |
| Unit Test Logic | Hot paths and pure logic domains | Reduces debugging cost when L2/L3 fail |
| Restrict E2E | Sign-in, core happy paths, critical regressions | Add more only if repeated escapes from lower layers |
| Prune Low-Signal | Tests that never catch bugs or double-cover logic | They drive up maintenance without matching benefit |

Story Contract Connection

Story Contract rows in PRD specs drive test layer selection. The SPEC-MAP bridges Dream (what to prove) to Engineering (how to prove it).

| Story Contract Column | Testing Decision |
| --- | --- |
| WHEN (trigger + precondition) | Determines test setup / arrange phase |
| THEN (data source + threshold) | Becomes the assertion — the exact thing the test proves |
| ARTIFACT (test file path) | File name routes to Nx target via naming convention |
| Test Type (unit/integration/e2e) | Maps directly to trophy layer AND Nx target |
| FORBIDDEN (counterfeit success) | Drives Safety Test — negative test case at same layer |

The naming convention is the routing mechanism:

| File Pattern | Nx Target | Trophy Layer |
| --- | --- | --- |
| *.schema.test.ts | test-schema | L1 Unit |
| *.integration.spec.ts | test-integration | L2 Integration |
| *.browser.test.tsx | test-integration (browser mode) | L2 Browser |
| *.spec.ts (in e2e project) | e2e | L3 E2E |
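The routing mechanism above can be sketched as a suffix matcher. This is an illustrative function, not part of Nx — Nx does the equivalent via the include globs in project.json; the regexes accept both .test. and .spec. to mirror the {test,spec} globs in the config, and the final case assumes the file lives in an e2e project.

```typescript
// Map a test file name to its Nx target by naming convention.
function routeToTarget(fileName: string): string {
  if (/\.schema\.(test|spec)\.ts$/.test(fileName)) return "test-schema";        // L1
  if (/\.integration\.(test|spec)\.ts$/.test(fileName)) return "test-integration"; // L2
  if (/\.browser\.(test|spec)\.tsx$/.test(fileName)) return "test-integration";    // L2 browser mode
  if (/\.spec\.ts$/.test(fileName)) return "e2e";                               // L3, in an e2e project
  return "unrouted";
}
```

Order matters: the integration pattern must be checked before the bare .spec.ts fallback, or every integration spec would route to e2e.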

Benchmark Thresholds

The economics tell you WHERE to invest. The Engineering Quality Benchmarks tell you WHAT the targets are.

| Test Layer | Benchmark | Key Threshold |
| --- | --- | --- |
| L0 Types | Type Safety | 0 errors on tsc --build |
| L1/L2 | Repository Quality SLOs | Runtime SLOs per service |
| L3 E2E | Core Web Vitals + CI Health | LCP <= 2.5s, CI green rate >= 95% |
| All Layers | CI Pipeline Health | Merge loop <= 10 min, flakiness <= 5% |

Dig Deeper

  • Vitest — Primary test runner: unit, integration, and browser mode setup with Nx
  • Testing Stack — All testing tools: Vitest, Playwright, RTL, MSW
  • Testing Strategy — Layer model, selection rules, recovery backlog
  • Jest — Legacy runner, migration path to Vitest

Context

Questions

What is the cheapest test that gives sufficient confidence for each change in your codebase?

  • If the trophy is growing top-heavy for SSR apps, at what point does the cost of E2E justify building @epic-web/app-launcher-style isolated server instances?
  • When a Story Contract row says "integration" but the test file only asserts on mocked data, is it an integration test or a unit test wearing a costume?
  • If nx affected only tests what changed, what's the cost of a dependency graph that's wrong — tests that should run but don't?
  • At what point does the Thin Action / Fat Service pattern become over-extraction — where the service layer adds indirection without testability gain?