Testing Strategy

What is the cheapest test that proves your change works?

This page defines the layer model and selection rules. For the trophy strategy, economics, and Nx target structure, see the Testing Platform. For Vitest setup and examples, see Vitest.

L0 TYPES   ←   L1 UNIT      ←   L2 INTEGRATION   ←   L3 E2E
    │             │                    │                │
    ▼             ▼                    ▼                ▼
   <1s           <1s                 5-30s           30-120s
 Compiler     Pure logic         Real database     Real browser
   Free         Cheap              Moderate         Expensive

Many codebases are inverted: mostly E2E (on the order of 80%) with only a sliver of unit tests (around 5%). Every browser test that proves something a function call can prove is waste. The browser is a last resort, not a starting point.

The Layer Model

Four layers. Each proves something the layer below cannot. Stop at the first layer that covers the change.

| Layer | What It Proves | Tool | Cost |
| --- | --- | --- | --- |
| L0 Types | Compiler accepts it: contracts match, imports resolve, refactors propagate | tsc --noEmit | <1s |
| L1 Unit | Pure transform works: input A produces output B, no I/O | Vitest (migration target; Jest today) | <1s per test |
| L2 Integration | Data layer works end-to-end: server action + repository produce correct data | Vitest + real DB (migration target; Jest today) | 5-30s per test |
| L3 E2E | Human can complete the journey: multi-step browser interaction works | Playwright | 30-120s per test |

An optional Intent layer sits between Integration and E2E for API contract validation (A2A protocol, webhook shapes). Most teams don't need it until they have agent-to-agent communication.

Selection Rule

Stop at the first match, top to bottom:

| Code Changed | Layer |
| --- | --- |
| Pure function, Zod schema, DTO mapping, domain logic | L1 Unit |
| Server action, repository, adapter, composition root | L2 Integration |
| API contract, agent protocol, webhook payload | Intent |
| UI journey requiring browser interaction, layout, a11y | L3 E2E |

L0 runs on every change regardless. TypeScript is free verification — a broken refactor that renames a field lights up every consumer instantly. This is why type-first development matters: the compiler is your most cost-effective test suite.
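The rule reads as a first-match lookup with L0 always included. A minimal sketch of that reading; the `ChangeKind` names and `selectLayers` are hypothetical labels that mirror the table rows, not identifiers from the codebase:

```typescript
// Hypothetical change categories mirroring the rows of the selection table.
type ChangeKind =
  | "pure-function" | "zod-schema" | "dto-mapping" | "domain-logic"
  | "server-action" | "repository" | "adapter" | "composition-root"
  | "api-contract" | "agent-protocol" | "webhook-payload"
  | "ui-journey";

type Layer = "L0 Types" | "L1 Unit" | "L2 Integration" | "Intent" | "L3 E2E";

// Rules are checked top to bottom; the first matching row wins.
const rules: ReadonlyArray<{ kinds: ReadonlyArray<ChangeKind>; layer: Layer }> = [
  { kinds: ["pure-function", "zod-schema", "dto-mapping", "domain-logic"], layer: "L1 Unit" },
  { kinds: ["server-action", "repository", "adapter", "composition-root"], layer: "L2 Integration" },
  { kinds: ["api-contract", "agent-protocol", "webhook-payload"], layer: "Intent" },
  { kinds: ["ui-journey"], layer: "L3 E2E" },
];

function selectLayers(kind: ChangeKind): Layer[] {
  const match = rules.find((rule) => rule.kinds.includes(kind));
  // L0 runs on every change; the matched layer is added on top of it.
  return match ? ["L0 Types", match.layer] : ["L0 Types"];
}
```

Calling `selectLayers("server-action")` yields L0 plus L2, and nothing more expensive.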

The Browser Gate

Before writing any E2E test, ask: can this test's core claim be proven without a browser?

If the answer is yes, write the cheaper test. A server action that creates a user and returns a result is an L2 integration test. You only need L3 when the browser itself is part of the proof — form interaction, navigation flow, responsive layout, accessibility.

The decomposition test: If an E2E test is proving more than one claim, it should be split. Each claim gets tested at its cheapest viable layer.

| Combined E2E claim | Decomposed |
| --- | --- |
| "Form renders AND action creates entity AND redirect works" | L2: action returns valid ID. L3: form visible, button clicks, redirect lands. |
| "Validation rejects bad input AND shows error message" | L1: schema rejects invalid input. L3: error message visible after submit. |
| "Auth blocks access AND redirects to login" | L2: action throws without auth. L3: page redirects to login URL. |

The cost signal: If you spend more than 30 minutes debugging an E2E test, the test is probably at the wrong layer. E2E tests should be thin wiring checks that pass on first run if the underlying logic works. When they don't, the debugging belongs at L2 where the feedback loop is seconds, not minutes.
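To make the "Validation rejects bad input" row concrete, here is a sketch of its L1 half. `validateVentureInput` is a hypothetical hand-written validator standing in for a Zod schema so the example runs without dependencies:

```typescript
// Hypothetical validator; in the codebase a Zod schema would play this role.
type ValidationResult =
  | { ok: true; value: { name: string } }
  | { ok: false; errors: string[] };

function validateVentureInput(input: { name?: unknown }): ValidationResult {
  const errors: string[] = [];
  const name = typeof input.name === "string" ? input.name.trim() : "";
  if (name.length === 0) errors.push("Name is required");
  return errors.length === 0 ? { ok: true, value: { name } } : { ok: false, errors };
}

// L1 claim: the schema rejects invalid input. No browser, no database.
const rejected = validateVentureInput({ name: "   " });
const accepted = validateVentureInput({ name: "Acme" });
```

With the rejection proven here, the L3 test is left with a single wiring claim: the error message is visible after submit.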

Story to Layer

Story Contract rows drive test layer selection. The SPEC-MAP bridges intent (what to prove) to implementation (how to prove it).

| Story Contract Column | Layer Decision |
| --- | --- |
| Test Type = unit | L1: schema or pure function test |
| Test Type = integration | L2: server action with mocked composition root |
| Test Type = e2e | L3: browser wiring only, logic already proven at L2 |
| THEN names a data source + threshold | L2 minimum: must hit real data to prove the assertion |
| FORBIDDEN names a counterfeit success | Safety Test at the SAME layer: negative test case |

The naming convention routes Story Contract ARTIFACT paths to Nx targets:

| Story Contract ARTIFACT | Nx Target | Layer |
| --- | --- | --- |
| *.schema.spec.ts | test-schema | L1 |
| *.integration.spec.ts | test-integration | L2 |
| *.spec.ts (in e2e project) | e2e | L3 |
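The routing is a suffix match. A small sketch, assuming a hypothetical `targetForArtifact` helper and an `inE2eProject` flag (how a path is known to belong to the e2e project depends on the workspace layout):

```typescript
type NxTarget = "test-schema" | "test-integration" | "e2e";

// Maps a Story Contract ARTIFACT path to the Nx target that runs it.
// The more specific suffixes are checked before the generic *.spec.ts.
function targetForArtifact(path: string, inE2eProject = false): NxTarget | undefined {
  if (path.endsWith(".schema.spec.ts")) return "test-schema";
  if (path.endsWith(".integration.spec.ts")) return "test-integration";
  if (inE2eProject && path.endsWith(".spec.ts")) return "e2e";
  return undefined; // Not a recognised test artifact.
}
```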

FORBIDDEN as Tests

Every FORBIDDEN column in the Story Contract becomes a negative test at the same layer. Example: if the Story Contract says "FORBIDDEN: form submits with invalid data because client validation is bypassed," the Safety Test is:

// L3 E2E: tests the server-side rejection when client validation is bypassed
import { test, expect } from '@playwright/test';

test('server rejects invalid input even without client validation', async ({ page }) => {
  await page.goto('/rfp/ventures/new');

  // Disable native browser validation by adding the novalidate attribute
  await page.evaluate(() => {
    document.querySelector('form')?.setAttribute('novalidate', '');
  });

  await page.getByLabel('Name').fill(''); // Empty: should fail server-side
  await page.getByRole('button', { name: /create/i }).click();

  // Server must reject, not redirect to success
  await expect(page.getByRole('status')).toContainText(/required/i);
  await expect(page).not.toHaveURL(/\/rfp\/ventures\/[a-f0-9-]+$/);
});

This catches the case where TanStack Form validators are dead code because HTML noValidate wasn't set, and the server action's validateOrError() is the real gate. The FORBIDDEN column makes this test mandatory, not optional.

Thin Wrappers

The architectural prerequisite that makes testing cheap. In the existing codebase vocabulary: server actions are thin wrappers that delegate to the composition root.

Server Action (thin wrapper)        Service via Composition Root
┌─────────────────────┐             ┌──────────────────────────┐
│ validateOrError()   │ ──────────→ │ createAppServices()      │
│ assertPermission()  │             │ useCase.execute()        │
│ revalidateTag()     │ ←────────── │ return typed result      │
└─────────────────────┘             └──────────────────────────┘

The action validates input (Zod), checks auth (assertPermission), delegates to the composition root (createAppServices), and handles cache revalidation. Business logic lives in use cases behind the composition root — testable at L2 with mocked services, no Next.js internals needed.

Industry term: "Thin Action / Fat Service" pattern. Same concept, different vocabulary.

Without this pattern, every test needs to mock next/headers, next/cache, next/navigation. The mocking cost compounds. With it, use cases are plain TypeScript.
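A sketch of what such a wrapper could look like. Only the names `validateOrError`, `assertPermission`, `createAppServices`, and `createMockAppServices` come from this page; every signature, the `AppServices` shape, the `venture:create` permission string, and the choice to pass services as a parameter are assumptions made so the example is self-contained:

```typescript
// Assumed shapes, for illustration only.
interface AppServices {
  createVenture: { execute(input: { name: string }): Promise<{ id: string }> };
}
interface AuthContext {
  permissions: ReadonlyArray<string>;
}

function validateOrError(input: unknown): { name: string } {
  const name = (input as { name?: unknown } | null)?.name;
  if (typeof name !== "string" || name.trim() === "") throw new Error("Name is required");
  return { name: name.trim() };
}

function assertPermission(auth: AuthContext, permission: string): void {
  if (!auth.permissions.includes(permission)) throw new Error("Forbidden");
}

// The thin wrapper: validate, authorize, delegate. Cache revalidation
// (revalidateTag) is left out because it needs the Next.js runtime.
async function createVentureAction(
  input: unknown,
  auth: AuthContext,
  services: AppServices,
): Promise<{ id: string }> {
  const data = validateOrError(input);
  assertPermission(auth, "venture:create");
  return services.createVenture.execute(data);
}

// Hypothetical mocked composition root for L2 tests.
function createMockAppServices(): AppServices {
  return {
    createVenture: { execute: async (input) => ({ id: `venture-${input.name}` }) },
  };
}
```

An L2 test then calls `createVentureAction` with `createMockAppServices()` and asserts on the typed result, with no Next.js module in sight.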

Nx-Native CI

The L1 → L2 → L3 cascade is enforced by Nx's task graph, not CI YAML job chains.

// nx.json targetDefaults
{
  "test-integration": {
    "dependsOn": ["test-schema"]
  },
  "e2e": {
    "dependsOn": ["test-integration"]
  }
}

CI runs one command:

nx affected -t test-schema test-integration e2e --base=main

Nx handles ordering via dependsOn. Only tests affected by the changeset run. Change a venture action? Only venture schema tests, venture integration tests, and the venture-scoped E2E run — not the whole suite.

Not: separate CI jobs with needs: chains. Not: run-many across every project. Nx's affected graph is the orchestrator.
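To see why `dependsOn` alone is enough to produce the L1 → L2 → L3 order, here is a toy resolver. It is an illustration of dependency ordering, not Nx's scheduler, and it has no cycle detection:

```typescript
// Toy model of targetDefaults: each target lists the targets it depends on.
const targetDefaults: Record<string, { dependsOn: string[] }> = {
  "test-schema": { dependsOn: [] },
  "test-integration": { dependsOn: ["test-schema"] },
  e2e: { dependsOn: ["test-integration"] },
};

// Depth-first walk that emits each target only after its dependencies.
function executionOrder(requested: string[]): string[] {
  const order: string[] = [];
  const visit = (target: string): void => {
    if (order.includes(target)) return;
    for (const dep of targetDefaults[target]?.dependsOn ?? []) visit(dep);
    order.push(target);
  };
  requested.forEach((target) => visit(target));
  return order;
}

// The order in which targets are requested does not matter.
const order = executionOrder(["e2e", "test-schema", "test-integration"]);
// order is ["test-schema", "test-integration", "e2e"]
```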

The Hexagonal Advantage

When server actions are pure TypeScript functions behind ports, most logic is testable without React or a browser:

┌──────────────────────────────────┐
│  PRESENTATION (L3)               │  ← E2E only for browser-dependent flows
│   ┌──────────────────────────┐   │
│   │  APPLICATION (L2)        │   │  ← Integration: server actions, use cases
│   │   ┌──────────────────┐   │   │
│   │   │  INFRASTRUCTURE  │   │   │  ← Integration: repositories, adapters
│   │   │   ┌──────────┐   │   │   │
│   │   │   │  DOMAIN  │   │   │   │  ← Unit: pure transforms, validators
│   │   │   └──────────┘   │   │   │
│   │   └──────────────────┘   │   │
│   └──────────────────────────┘   │
└──────────────────────────────────┘

| Layer | Test Strategy | Why |
| --- | --- | --- |
| Domain | L1 Unit: no mocks needed | Pure functions, Zod schemas, DTOs. Zero dependencies. |
| Infrastructure | L2 Integration: real database | Repositories implement domain ports. Prove data round-trips. |
| Application | L2 Integration: mocked composition root | Server actions orchestrate use cases. Prove the composition. |
| Presentation | L3 E2E: only for browser interactions | Components consume application layer. Most rendering is server-side. |

This maps directly to the layer model. Domain types are the source of truth. Tests prove each layer honors the contracts.

Quick Reference

When you're staring at a change and wondering "what layer does this go in?"

| I changed... | Test layer | File pattern |
| --- | --- | --- |
| A Zod schema | L1 | *.schema.spec.ts |
| A pure function / DTO mapping | L1 | *.schema.spec.ts |
| A server action | L2 | *.integration.spec.ts |
| A repository / adapter | L2 | *.integration.spec.ts |
| An API route | L2 | *.integration.spec.ts (via next-test-api-route-handler, NTARH) |
| A client component with state | L2 | *.browser.test.tsx (Vitest Browser Mode) |
| A form submission flow | L3 | *.spec.ts (Playwright) |
| Auth / OAuth redirect | L3 | *.spec.ts (Playwright) |
| Layout / responsive behavior | L3 | *.spec.ts (Playwright) |
| A TypeScript type or interface | L0 | No test file; tsc --noEmit covers it |

Recovery Backlog

Starting from lost ground? Prioritized list:

  1. Wire Nx targets: Add test-schema and test-integration to targetDefaults in nx.json
  2. Shared test helpers: createMockAppServices(), createMockAuthContext() in a testing lib
  3. Top 3 actions: Write L1 + L2 for the three most critical server actions
  4. Security audit: Every "use server" has validateOrError() and assertPermission()
  5. CI cascade: Replace CI YAML job chains with nx affected -t test-schema test-integration e2e
  6. Generator scaffolding: Nx generator emits action + schema test + integration test
  7. E2E thinning: Audit existing E2E: can the core claim be proven at L2?
  8. Vitest migration: Replace Jest with Vitest (@nx/vite:test executor)
  9. E2E splitting: Split monolithic E2E into feature-scoped projects with implicitDependencies

Future direction: @epic-web/app-launcher, for per-test isolated server instances, which would eliminate the mock boundary entirely. Treat this as a direction, not a current capability.

Patterns

L0: Types as Tests

The cheapest verification. Strict TypeScript catches broken refactors, wrong argument types, missing fields, and import errors before any test runs.

// tsconfig.json strictness flags
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noUnusedLocals": true,
    "exactOptionalPropertyTypes": true
  }
}

When a domain contract changes, the compiler lights up every file that needs updating — outward through infrastructure, application, presentation. Red squiggles are breadcrumbs.
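One common way to make the compiler do this work is an exhaustive switch over a union. A minimal sketch with a hypothetical `VentureStatus` contract; adding a fourth status makes `statusLabel` fail to compile until the new case is handled:

```typescript
// Hypothetical domain contract.
type VentureStatus = "draft" | "submitted" | "approved";

// Only reachable if a case is missing; the `never` parameter is what
// turns a missing case into a compile error.
function assertNever(value: never): never {
  throw new Error(`Unhandled status: ${String(value)}`);
}

function statusLabel(status: VentureStatus): string {
  switch (status) {
    case "draft":
      return "Draft";
    case "submitted":
      return "Awaiting review";
    case "approved":
      return "Approved";
    default:
      return assertNever(status);
  }
}
```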

L1: Unit Tests

Pure functions. No database, no DOM, no HTTP. Under 100ms per test.

Good candidates: score calculations, DTO mappings, Zod schema validation, status logic, discount rules, formatting functions. Anything that takes data in and returns data out.
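A sketch of the kind of function that belongs at L1. The discount tiers are made up for illustration; the point is that the function takes data in and returns data out, so a test is a plain call and a comparison:

```typescript
// Hypothetical rule: 10% off from 10000 cents, 20% off from 50000 cents.
function applyDiscount(subtotalCents: number): number {
  if (!Number.isInteger(subtotalCents) || subtotalCents < 0) {
    throw new RangeError("subtotalCents must be a non-negative integer");
  }
  const rate = subtotalCents >= 50000 ? 0.2 : subtotalCents >= 10000 ? 0.1 : 0;
  // Round the discount so the result stays in whole cents.
  return subtotalCents - Math.round(subtotalCents * rate);
}
```

In a Vitest spec each tier boundary becomes one assertion, for example that 9999 cents is untouched and 10000 cents becomes 9000.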

L2: Integration Tests

The bulk of a server-action architecture. Real database (or mocked composition root — both valid at L2), real queries, real data. Prove that the composition of domain + infrastructure + application produces correct results.

Rules:

  • Real database or mocked composition root — not mocked repositories
  • Clean up test data after each test — no shared state
  • Test isolation — each test is independent

Server actions marked "use server" are async functions. In a test runner, the directive is a bundler instruction, not a runtime constraint. Mock next/headers, next/navigation, next/cache — test the function directly.

L3: E2E Tests

Last resort. Browser-dependent flows only. Expensive but irreplaceable for:

  • Multi-step form interactions (fill, submit, redirect, error states)
  • Authentication flows (OAuth redirects, session management)
  • Responsive layout behavior across breakpoints
  • Keyboard navigation and accessibility
  • Client-side state that only exists in the browser
  • Wiring confirmation — TanStack Form → server action → redirect

Selectors: Prefer role- and label-based locators (getByRole, getByLabel), with data-testid as the fallback. Avoid CSS selectors and XPath. Semantic selectors survive redesigns.

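Timeouts: Rely on Playwright's auto-waiting locators and web-first assertions such as expect(locator).toBeVisible(), never an arbitrary page.waitForTimeout() sleep. Flaky tests are worse than no tests.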

Playwright POM: Use Page Object Model for E2E. Feature specs, not function specs — each POM represents a user journey, not a page.
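A sketch of a journey-shaped page object. `PageLike` and `LocatorLike` are assumed structural stand-ins for the small part of Playwright's `Page` and `Locator` used here, so the example runs without Playwright installed; `CreateVentureJourney` and `createRecordingPage` are hypothetical names:

```typescript
// Assumed minimal subset of the Playwright API used by this journey.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LocatorLike;
  getByRole(role: "button", options: { name: RegExp }): LocatorLike;
}

// One object per user journey ("create a venture"), not per page.
class CreateVentureJourney {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async submit(name: string): Promise<void> {
    await this.page.goto("/rfp/ventures/new");
    await this.page.getByLabel("Name").fill(name);
    await this.page.getByRole("button", { name: /create/i }).click();
  }
}

// A recording fake, used here only to show the journey's call sequence.
function createRecordingPage(): { page: PageLike; calls: string[] } {
  const calls: string[] = [];
  const locator = (label: string): LocatorLike => ({
    fill: async (value) => { calls.push(`fill ${label}=${value}`); },
    click: async () => { calls.push(`click ${label}`); },
  });
  const page: PageLike = {
    goto: async (url) => { calls.push(`goto ${url}`); return null; },
    getByLabel: (text) => locator(text),
    getByRole: (role, options) => locator(`${role} ${String(options.name)}`),
  };
  return { page, calls };
}
```

In a real spec the Playwright `page` fixture would be passed to the constructor, and the spec body reduces to `await journey.submit('Acme')` followed by assertions.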

Zod as Contracts

Zod schemas serve as runtime contract verification at every data boundary. The schema IS the test:

L0: tsc catches type mismatches at compile time
L1: Zod catches shape/constraint violations at runtime
L2: Integration tests verify business logic with validated data
L3: E2E verifies the full flow

When a schema defines both the TypeScript type (z.infer<typeof Schema>) and the runtime validator, the type and the validator cannot drift apart. The type IS the validator. This is the type boundary made enforceable.
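The same idea can be shown without Zod. In this dependency-free sketch a hand-written `parse` function is the runtime validator and the static type is derived from it, in the spirit of `z.infer`; `ventureSchema`, its fields, and the 0-100 score range are hypothetical:

```typescript
// One parse function acts as the runtime validator; the static type
// below is derived from its return type rather than declared separately.
const ventureSchema = {
  parse(input: unknown): { name: string; score: number } {
    if (typeof input !== "object" || input === null) throw new Error("Expected an object");
    const { name, score } = input as Record<string, unknown>;
    if (typeof name !== "string" || name.length === 0) throw new Error("name is required");
    if (typeof score !== "number" || !(score >= 0 && score <= 100)) {
      throw new Error("score must be between 0 and 100");
    }
    return { name, score };
  },
};

// Analogous to z.infer<typeof Schema>: changing parse changes this type.
type Venture = ReturnType<typeof ventureSchema.parse>;

function describeVenture(venture: Venture): string {
  return `${venture.name} (${venture.score})`;
}

const parsed = ventureSchema.parse({ name: "Acme", score: 72 });
```

Zod does the same thing with far less code. The hand-rolled version is only there to show where the guarantee comes from.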

Generator Test Parity

Generated test files must pass the same gates as hand-written tests. A generator that produces specs which fail pre-commit hooks is broken — regardless of whether the tests pass when run directly.

| Generator State | Pre-commit Result | Maturity |
| --- | --- | --- |
| Spec passes lint-staged | Clean | Generator tier: enforcement is free |
| Spec fails lint-staged | Hook rejection | Expertise tier: every scaffold needs manual cleanup |

The Nx generator scaffolds action + schema test + integration test together. It uses @nx/devkit patterns: generateFiles for templates, updateProjectConfiguration to wire test-schema and test-integration targets into project.json.

Rebalancing

Don't delete E2E specs. Write cheaper tests for the same logic, then remove the redundant E2E coverage.

The audit question for each existing E2E test: Can the core claim be proven without a browser?

Likely outcome for a server-action-heavy app:

  • ~5% remain E2E — auth flows, form wiring, layout
  • ~80% become integration — server actions are the bulk of logic
  • ~10% become unit — pure transforms extracted from actions
  • ~5% become intent — API contract validation

Questions

If the cheapest test that proves correctness is a type check, why do most teams start with the most expensive one?

  • When a server action is a pure function behind a composition root, what justifies testing it through a browser instead of a function call?
  • If the FORBIDDEN column drives Safety Tests, what happens to test coverage when Story Contract rows skip the FORBIDDEN column?
  • At what point does generator test parity — generated specs passing the same hooks as hand-written tests — eliminate the need for code review of generated output?
  • If mocking the database hides the bugs that matter, what category of bugs does mocking the browser hide?