Testing Strategy

What is the cheapest test that proves your change works?

This page defines the layer model and selection rules. For the trophy strategy, economics, and Nx target structure, see the Testing Platform. For Vitest setup and examples, see Vitest.

L0 TYPES ← L1 UNIT ← L2 INTEGRATION ← L3 E2E
   │          │            │               │
   ▼          ▼            ▼               ▼
 <1s        <1s          5-30s          30-120s
Compiler   Pure logic   Real database   Real browser
Free       Cheap        Moderate        Expensive

Most codebases are inverted — 80% E2E, 5% unit. Every browser test that proves something a function call can prove is waste. The browser is a last resort, not a starting point.

The Layer Model

Four layers. Each proves something the layer below cannot. Stop at the first layer that covers the change.

Layer	What It Proves	Tool	Cost
L0 Types	Compiler accepts it: contracts match, imports resolve, refactors propagate	`tsc --noEmit`	<1s
L1 Unit	Pure transform works: input A produces output B, no I/O	Vitest (target; migrating from Jest)	<1s per test
L2 Integration	Data layer works end-to-end: server action + repository produce correct data	Vitest + real DB (target; Jest is the current runner)	5-30s per test
L3 E2E	Human can complete the journey: multi-step browser interaction works	Playwright	30-120s per test

An optional Intent layer sits between Integration and E2E for API contract validation (A2A protocol, webhook shapes). Most teams don't need it until they have agent-to-agent communication.

Selection Rule

Stop at the first match, top to bottom:

Code Changed	Layer
Pure function, Zod schema, DTO mapping, domain logic	L1 Unit
Server action, repository, adapter, composition root	L2 Integration
API contract, agent protocol, webhook payload	Intent
UI journey requiring browser interaction, layout, a11y	L3 E2E

L0 runs on every change regardless. TypeScript is free verification — a broken refactor that renames a field lights up every consumer instantly. This is why type-first development matters: the compiler is your most cost-effective test suite.
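
Read as code, the table above is a first-match lookup. The sketch below is only an illustration: the ChangeKind names are hypothetical labels for the table rows, not identifiers from the codebase.

```typescript
// Hypothetical change categories, one group per row of the Selection Rule table.
type ChangeKind =
  | "pure-function" | "zod-schema" | "dto-mapping" | "domain-logic"
  | "server-action" | "repository" | "adapter" | "composition-root"
  | "api-contract" | "agent-protocol" | "webhook-payload"
  | "ui-journey";

type Layer = "L1 Unit" | "L2 Integration" | "Intent" | "L3 E2E";

interface SelectionRule {
  kinds: readonly ChangeKind[];
  layer: Layer;
}

// Rows in table order: the first row that lists the change kind wins.
const SELECTION_RULES: readonly SelectionRule[] = [
  { kinds: ["pure-function", "zod-schema", "dto-mapping", "domain-logic"], layer: "L1 Unit" },
  { kinds: ["server-action", "repository", "adapter", "composition-root"], layer: "L2 Integration" },
  { kinds: ["api-contract", "agent-protocol", "webhook-payload"], layer: "Intent" },
  { kinds: ["ui-journey"], layer: "L3 E2E" },
];

function selectLayer(kind: ChangeKind): Layer {
  const rule = SELECTION_RULES.find((r) => r.kinds.includes(kind));
  if (!rule) throw new Error(`No selection rule for change kind: ${kind}`);
  return rule.layer;
}
```

L0 is deliberately absent from the lookup: it applies to every change, so there is nothing to select.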

The Browser Gate

Before writing any E2E test, ask: can this test's core claim be proven without a browser?

If the answer is yes, write the cheaper test. A server action that creates a user and returns a result is an L2 integration test. You only need L3 when the browser itself is part of the proof — form interaction, navigation flow, responsive layout, accessibility.

The decomposition test: If an E2E test is proving more than one claim, it should be split. Each claim gets tested at its cheapest viable layer.

Combined E2E claim	Decomposed
"Form renders AND action creates entity AND redirect works"	L2: action returns valid ID. L3: form visible, button clicks, redirect lands.
"Validation rejects bad input AND shows error message"	L1: schema rejects invalid input. L3: error message visible after submit.
"Auth blocks access AND redirects to login"	L2: action throws without auth. L3: page redirects to login URL.

The cost signal: If you spend more than 30 minutes debugging an E2E test, the test is probably at the wrong layer. E2E tests should be thin wiring checks that pass on first run if the underlying logic works. When they don't, the debugging belongs at L2 where the feedback loop is seconds, not minutes.

Story to Layer

Story Contract rows drive test layer selection. The SPEC-MAP bridges intent (what to prove) to implementation (how to prove it).

Story Contract Column	Layer Decision
Test Type = `unit`	L1 — schema or pure function test
Test Type = `integration`	L2 — server action with mocked composition root
Test Type = `e2e`	L3 — browser wiring only, logic already proven at L2
THEN names a data source + threshold	L2 minimum — must hit real data to prove the assertion
FORBIDDEN names a counterfeit success	Safety Test at the SAME layer — negative test case

The naming convention routes Story Contract ARTIFACT paths to Nx targets:

Story Contract ARTIFACT	Nx Target	Layer
`*.schema.spec.ts`	`test-schema`	L1
`*.integration.spec.ts`	`test-integration`	L2
`*.spec.ts` (in e2e project)	`e2e`	L3
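
The routing in this table can be sketched as a small function. The one subtlety is ordering: the schema and integration suffixes also end in .spec.ts, so they have to be checked first. The function name and the inE2eProject flag are assumptions for illustration; in practice Nx resolves this through each project's target configuration.

```typescript
type NxTarget = "test-schema" | "test-integration" | "e2e";

// Assumption: the caller knows whether the file lives in an e2e project.
// The table distinguishes plain `*.spec.ts` files only by that fact.
function targetForArtifact(path: string, inE2eProject: boolean): NxTarget | undefined {
  // Specific suffixes first: both of these also end in `.spec.ts`.
  if (path.endsWith(".schema.spec.ts")) return "test-schema";
  if (path.endsWith(".integration.spec.ts")) return "test-integration";
  if (path.endsWith(".spec.ts") && inE2eProject) return "e2e";
  return undefined; // not covered by the naming convention
}
```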

FORBIDDEN as Tests

Every FORBIDDEN entry in the Story Contract becomes a negative test at the same layer as its row. Example: if the Story Contract says "FORBIDDEN: form submits with invalid data because client validation is bypassed," the Safety Test is:

import { test, expect } from "@playwright/test";

// L3 E2E: proves the server rejects invalid input when client validation is bypassed
test("server rejects invalid input even without client validation", async ({ page }) => {
  await page.goto("/rfp/ventures/new");

  // Bypass native client-side validation by setting novalidate on the form
  await page.evaluate(() => {
    document.querySelector("form")?.setAttribute("novalidate", "");
  });

  await page.getByLabel("Name").fill(""); // Empty: must fail server-side
  await page.getByRole("button", { name: /create/i }).click();

  // Server must reject, not redirect to success
  await expect(page.getByRole("status")).toContainText(/required/i);
  await expect(page).not.toHaveURL(/\/rfp\/ventures\/[a-f0-9-]+$/);
});

This catches the case where TanStack Form validators are dead code because HTML noValidate wasn't set, and the server action's validateOrError() is the real gate. The FORBIDDEN column makes this test mandatory, not optional.

Thin Wrappers

The architectural prerequisite that makes testing cheap. In the existing codebase vocabulary: server actions are thin wrappers that delegate to the composition root.

Server Action (thin wrapper)          Service via Composition Root
┌─────────────────────┐              ┌──────────────────────────┐
│ validateOrError()    │──────────→  │ createAppServices()      │
│ assertPermission()   │              │ useCase.execute()        │
│ revalidateTag()      │←──────────  │ return typed result      │
└─────────────────────┘              └──────────────────────────┘

The action validates input (Zod), checks auth (assertPermission), delegates to the composition root (createAppServices), and handles cache revalidation. Business logic lives in use cases behind the composition root — testable at L2 with mocked services, no Next.js internals needed.

Industry term: "Thin Action / Fat Service" pattern. Same concept, different vocabulary.

Without this pattern, every test needs to mock next/headers, next/cache, next/navigation. The mocking cost compounds. With it, use cases are plain TypeScript.
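
As a rough illustration of the pattern, the sketch below builds a thin action around injected dependencies and exercises it with a mocked composition root. Every name and signature here (makeCreateVentureAction, ActionDeps, the venture:create permission, the ventures tag) is an assumption made to keep the example free of Next.js imports; the real validateOrError(), assertPermission(), and createAppServices() may look different.

```typescript
// All names below are illustrative stand-ins, not the codebase's real signatures.
type CreateVentureInput = { name: string };
type ActionResult = { ok: true; id: string } | { ok: false; error: string };

interface AppServices {
  createVenture: { execute(input: CreateVentureInput): Promise<{ id: string }> };
}

interface ActionDeps {
  createAppServices(): AppServices;                    // composition root
  assertPermission(permission: string): Promise<void>; // throws when not allowed
  revalidateTag(tag: string): void;                    // cache revalidation
}

// Thin wrapper: validate, authorize, delegate, revalidate. No business logic.
function makeCreateVentureAction(deps: ActionDeps) {
  return async function createVentureAction(raw: unknown): Promise<ActionResult> {
    const name = (raw as { name?: unknown } | null)?.name;
    if (typeof name !== "string" || name.trim() === "") {
      return { ok: false, error: "Name is required" };
    }
    await deps.assertPermission("venture:create");
    const { id } = await deps.createAppServices().createVenture.execute({ name: name.trim() });
    deps.revalidateTag("ventures");
    return { ok: true, id };
  };
}

// Mocked composition root: records calls and returns canned data.
function createMockDeps(): { deps: ActionDeps; calls: string[] } {
  const calls: string[] = [];
  const deps: ActionDeps = {
    createAppServices: () => ({
      createVenture: {
        execute: async (input) => {
          calls.push(`execute:${input.name}`);
          return { id: "venture-1" };
        },
      },
    }),
    assertPermission: async (permission) => {
      calls.push(`assert:${permission}`);
    },
    revalidateTag: (tag) => {
      calls.push(`revalidate:${tag}`);
    },
  };
  return { deps, calls };
}

// Usage: an invalid call never reaches the use case; a valid call delegates once.
async function demo(): Promise<string[]> {
  const { deps, calls } = createMockDeps();
  const action = makeCreateVentureAction(deps);
  await action({ name: "" });       // rejected by validation, nothing recorded
  await action({ name: " Acme " }); // records assert, execute, revalidate in order
  return calls;
}
```

Passing dependencies in through a factory is one way to keep the sketch self-contained. Mocking the module that exports createAppServices() in the test runner reaches the same boundary without changing the action's signature.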

Nx-Native CI

The L1 → L2 → L3 cascade is enforced by Nx's task graph, not CI YAML job chains.

// nx.json (excerpt)
{
  "targetDefaults": {
    "test-integration": {
      "dependsOn": ["test-schema"]
    },
    "e2e": {
      "dependsOn": ["test-integration"]
    }
  }
}

CI runs one command:

nx affected -t test-schema test-integration e2e --base=main

Nx handles ordering via dependsOn. Only tests affected by the changeset run. Change a venture action? Only venture schema tests, venture integration tests, and the venture-scoped E2E run — not the whole suite.

Not: separate CI jobs with needs: chains. Not: run-many across every project. Nx's affected graph is the orchestrator.

The Hexagonal Advantage

When server actions are pure TypeScript functions behind ports, most logic is testable without React or a browser:

┌──────────────────────────────────┐
│        PRESENTATION (L3)         │  ← E2E only for browser-dependent flows
│  ┌──────────────────────────┐    │
│  │     APPLICATION (L2)     │    │  ← Integration: server actions, use cases
│  │  ┌──────────────────┐    │    │
│  │  │ INFRASTRUCTURE   │    │    │  ← Integration: repositories, adapters
│  │  │  ┌──────────┐    │    │    │
│  │  │  │  DOMAIN  │    │    │    │  ← Unit: pure transforms, validators
│  │  │  └──────────┘    │    │    │
│  │  └──────────────────┘    │    │
│  └──────────────────────────┘    │
└──────────────────────────────────┘

Layer	Test Strategy	Why
Domain	L1 Unit — no mocks needed	Pure functions, Zod schemas, DTOs. Zero dependencies.
Infrastructure	L2 Integration — real database	Repositories implement domain ports. Prove data round-trips.
Application	L2 Integration — mocked composition root	Server actions orchestrate use cases. Prove the composition.
Presentation	L3 E2E — only for browser interactions	Components consume application layer. Most rendering is server-side.

This maps directly to the layer model. Domain types are the source of truth. Tests prove each layer honors the contracts.

Quick Reference

When you're staring at a change and wondering "what layer does this go in?"

I changed...	Test layer	File pattern
A Zod schema	L1	`*.schema.spec.ts`
A pure function / DTO mapping	L1	`*.schema.spec.ts`
A server action	L2	`*.integration.spec.ts`
A repository / adapter	L2	`*.integration.spec.ts`
An API route	L2	`*.integration.spec.ts` (via NTARH, next-test-api-route-handler)
A client component with state	L2	`*.browser.test.tsx` (Vitest Browser Mode)
A form submission flow	L3	`*.spec.ts` (Playwright)
Auth / OAuth redirect	L3	`*.spec.ts` (Playwright)
Layout / responsive behavior	L3	`*.spec.ts` (Playwright)
A TypeScript type or interface	L0	No test file — `tsc --noEmit` covers it

Recovery Backlog

Starting from lost ground? Prioritized list:

Wire Nx targets — Add test-schema and test-integration to targetDefaults in nx.json
Shared test helpers — createMockAppServices(), createMockAuthContext() in a testing lib
Top 3 actions — Write L1 + L2 for the three most critical server actions
Security audit — Every "use server" has validateOrError() and assertPermission()
CI cascade — Replace CI YAML job chains with nx affected -t test-schema test-integration e2e
Generator scaffolding — Nx generator emits action + schema test + integration test
E2E thinning — Audit existing E2E: can the core claim be proven at L2?
Vitest migration — Replace Jest with Vitest (@nx/vite:test executor)
E2E splitting — Split monolithic E2E into feature-scoped projects with implicitDependencies

Future direction: @epic-web/app-launcher — per-test isolated server instances. Eliminates the mock boundary entirely. Document as direction, not current capability.

Patterns

L0: Types as Tests

The cheapest verification. Strict TypeScript catches broken refactors, wrong argument types, missing fields, and import errors before any test runs.

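tsconfig.json strictness flags (strict is the umbrella switch; the other three are separate opt-ins on top of it):

{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noUnusedLocals": true,
    "exactOptionalPropertyTypes": true
  }
}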

When a domain contract changes, the compiler lights up every file that needs updating — outward through infrastructure, application, presentation. Red squiggles are breadcrumbs.
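
One way to make the compiler play this role is an exhaustiveness check. The sketch below uses a hypothetical VentureStatus union (not taken from the codebase): adding a member to the union without updating the consumers becomes a tsc error rather than a runtime surprise.

```typescript
// Hypothetical domain contract. Adding a member here is the "contract change".
type VentureStatus = "draft" | "active" | "archived";

// A Record keyed by the union must list every member, so a new status
// fails compilation here until a label is supplied.
const STATUS_LABELS: Record<VentureStatus, string> = {
  draft: "Draft",
  active: "Active",
  archived: "Archived",
};

function isTerminal(status: VentureStatus): boolean {
  switch (status) {
    case "draft":
    case "active":
      return false;
    case "archived":
      return true;
    default: {
      // If a new status is added and not handled above, `status` is no longer
      // `never` at this point and tsc reports an error on this assignment.
      const unhandled: never = status;
      throw new Error(`Unhandled status: ${String(unhandled)}`);
    }
  }
}
```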

L1: Unit Tests

Pure functions. No database, no DOM, no HTTP. Under 100ms per test.

Good candidates: score calculations, DTO mappings, Zod schema validation, status logic, discount rules, formatting functions. Anything that takes data in and returns data out.
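
A minimal sketch of an L1 candidate, assuming a hypothetical applyDiscount rule that works in integer cents. The checks use plain throws so the snippet stays dependency-free; in a spec file the same cases would sit inside the runner's it() blocks.

```typescript
// Hypothetical discount rule: pure, deterministic, no I/O.
function applyDiscount(subtotalCents: number, percent: number): number {
  if (!Number.isInteger(subtotalCents) || subtotalCents < 0) {
    throw new RangeError("subtotalCents must be a non-negative integer");
  }
  if (percent < 0 || percent > 100) {
    throw new RangeError("percent must be between 0 and 100");
  }
  // Round the discount once, in cents, so the result stays an integer.
  return subtotalCents - Math.round((subtotalCents * percent) / 100);
}

// Table-driven cases: data in, expected data out.
const cases: ReadonlyArray<[number, number, number]> = [
  [10000, 15, 8500],
  [999, 10, 899],
  [0, 50, 0],
];

for (const [subtotal, percent, expected] of cases) {
  const actual = applyDiscount(subtotal, percent);
  if (actual !== expected) {
    throw new Error(`applyDiscount(${subtotal}, ${percent}) = ${actual}, expected ${expected}`);
  }
}
```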

L2: Integration Tests

The bulk of a server-action architecture. Real database (or mocked composition root — both valid at L2), real queries, real data. Prove that the composition of domain + infrastructure + application produces correct results.

Rules:

Real database or mocked composition root — not mocked repositories
Clean up test data after each test — no shared state
Test isolation — each test is independent

Server actions marked "use server" are async functions. In a test runner, the directive is a bundler instruction, not a runtime constraint. Mock next/headers, next/navigation, next/cache — test the function directly.

L3: E2E Tests

Last resort. Browser-dependent flows only. Expensive but irreplaceable for:

Multi-step form interactions (fill, submit, redirect, error states)
Authentication flows (OAuth redirects, session management)
Responsive layout behavior across breakpoints
Keyboard navigation and accessibility
Client-side state that only exists in the browser
Wiring confirmation — TanStack Form → server action → redirect

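Selectors: Prefer role and label locators (getByRole, getByLabel), with data-testid as the fallback when no accessible name exists. Avoid CSS selectors and XPath. Semantic selectors survive redesigns.

Timeouts: Wait on conditions, never on the clock. Use web-first assertions such as expect(locator).toBeVisible(), or locator.waitFor(), rather than page.waitForTimeout(). page.waitForSelector() still works, but Playwright's documentation discourages it in favor of these. Flaky tests are worse than no tests.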

Playwright POM: Use Page Object Model for E2E. Feature specs, not function specs — each POM represents a user journey, not a page.

Zod as Contracts

Zod schemas serve as runtime contract verification at every data boundary. The schema IS the test:

L0: tsc catches type mismatches at compile time
L1: Zod catches shape/constraint violations at runtime
L2: Integration tests verify business logic with validated data
L3: E2E verifies the full flow

When one schema defines both the TypeScript type (z.infer<typeof Schema>) and the runtime validator, the two cannot drift apart: changing the schema changes the type in the same edit. The type IS the validator. This is the type boundary made enforceable.
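
To show the single-source-of-truth idea without pulling in a dependency, the sketch below hand-rolls a tiny validator whose static type is derived from its runtime definition. This is a stand-in for the concept behind z.infer, not Zod's API; Validator, TypeOf, object, and VentureSchema are all hypothetical names.

```typescript
// Dependency-free stand-in for the idea behind z.infer. NOT Zod's API.
type Validator<T> = (value: unknown) => value is T;

// Extract the static type from a validator, analogous in spirit to z.infer.
type TypeOf<V> = V extends Validator<infer T> ? T : never;

const str: Validator<string> = (v): v is string => typeof v === "string";
const num: Validator<number> = (v): v is number =>
  typeof v === "number" && Number.isFinite(v);

type Shape = Record<string, Validator<unknown>>;
type ObjectOf<S extends Shape> = { [K in keyof S]: TypeOf<S[K]> };

// Build an object validator from field validators. The object's static type
// comes from the same shape, so there is only one definition to change.
function object<S extends Shape>(shape: S): Validator<ObjectOf<S>> {
  return (v): v is ObjectOf<S> => {
    if (typeof v !== "object" || v === null) return false;
    const record = v as Record<string, unknown>;
    return Object.entries(shape).every(([key, check]) => check(record[key]));
  };
}

// One definition yields both the runtime check and the compile-time type.
const VentureSchema = object({ name: str, score: num });
type Venture = TypeOf<typeof VentureSchema>; // { name: string; score: number }

function parseVenture(input: unknown): Venture {
  if (!VentureSchema(input)) throw new Error("Invalid venture");
  return input; // narrowed to Venture by the type guard
}
```

Adding a field to the object passed to object() changes Venture in the same edit, and every consumer that constructs or reads a Venture is rechecked at L0.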

Generator Test Parity

Generated test files must pass the same gates as hand-written tests. A generator that produces specs which fail pre-commit hooks is broken — regardless of whether the tests pass when run directly.

Generator State	Pre-commit Result	Maturity
Spec passes lint-staged	Clean	Generator tier — enforcement is free
Spec fails lint-staged	Hook rejection	Expertise tier — every scaffold needs manual cleanup

The Nx generator scaffolds action + schema test + integration test together. It uses @nx/devkit patterns: generateFiles for templates, updateProjectConfiguration to wire test-schema and test-integration targets into project.json.

Rebalancing

Don't delete E2E specs. Write cheaper tests for the same logic, then remove the redundant E2E coverage.

The audit question for each existing E2E test: Can the core claim be proven without a browser?

Likely outcome for a server-action-heavy app:

~5% remain E2E — auth flows, form wiring, layout
~80% become integration — server actions are the bulk of logic
~10% become unit — pure transforms extracted from actions
~5% become intent — API contract validation

Context

Testing Platform — Trophy strategy, economics, Nx target structure, Story Contract connection
Vitest — Primary runner: setup, examples, MSW, browser mode
SPEC-MAP — Story Contract rows become test files
Type-First Development — Types drive test specs, compiler as methodology
Flow Engineering — Maps produce domain contracts that become test expectations
Architecture — Hexagonal patterns that make testing cheap
Testing Tools — Vitest, Playwright, RTL, MSW
Dev Workflow — Build stream and fix stream
Code That Lasts — Testing the contract: evidence only, pure core, the tier model in full

Questions

If the cheapest test that proves correctness is a type check, why do most teams start with the most expensive one?

When a server action is a pure function behind a composition root, what justifies testing it through a browser instead of a function call?
If the FORBIDDEN column drives Safety Tests, what happens to test coverage when Story Contract rows skip the FORBIDDEN column?
At what point does generator test parity — generated specs passing the same hooks as hand-written tests — eliminate the need for code review of generated output?
If mocking the database hides the bugs that matter, what category of bugs does mocking the browser hide?
