Testing Strategy
What is the cheapest test that proves your change works?
This page defines the layer model and selection rules. For the trophy strategy, economics, and Nx target structure, see the Testing Platform. For Vitest setup and examples, see Vitest.
L0 TYPES ← L1 UNIT ← L2 INTEGRATION ← L3 E2E
│ │ │ │
▼ ▼ ▼ ▼
<1s <1s 5-30s 30-120s
Compiler Pure logic Real database Real browser
Free Cheap Moderate Expensive
Most codebases are inverted — 80% E2E, 5% unit. Every browser test that proves something a function call can prove is waste. The browser is a last resort, not a starting point.
The Layer Model
Four layers. Each proves something the layer below cannot. Stop at the first layer that covers the change.
| Layer | What It Proves | Tool | Cost |
|---|---|---|---|
| L0 Types | Compiler accepts it: contracts match, imports resolve, refactors propagate | tsc --noEmit | <1s |
| L1 Unit | Pure transform works: input A produces output B, no I/O | Vitest (target; migrating from Jest) | <1s per test |
| L2 Integration | Data layer works end-to-end: server action + repository produce correct data | Vitest + real DB (target; Jest is current) | 5-30s per test |
| L3 E2E | Human can complete the journey: multi-step browser interaction works | Playwright | 30-120s per test |
An optional Intent layer sits between Integration and E2E for API contract validation (A2A protocol, webhook shapes). Most teams don't need it until they have agent-to-agent communication.
Selection Rule
Stop at the first match, top to bottom:
| Code Changed | Layer |
|---|---|
| Pure function, Zod schema, DTO mapping, domain logic | L1 Unit |
| Server action, repository, adapter, composition root | L2 Integration |
| API contract, agent protocol, webhook payload | Intent |
| UI journey requiring browser interaction, layout, a11y | L3 E2E |
L0 runs on every change regardless. TypeScript is free verification — a broken refactor that renames a field lights up every consumer instantly. This is why type-first development matters: the compiler is your most cost-effective test suite.
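The selection rule can be read as a first-match lookup. The sketch below is one possible encoding, not an existing utility: `ChangeKind`, `TestLayer`, `selectionRules`, and `selectLayer` are hypothetical names, and the change kinds simply restate the table rows.

```typescript
// Hypothetical names throughout; the rows restate the Selection Rule table.
type TestLayer = "L1 Unit" | "L2 Integration" | "Intent" | "L3 E2E";

type ChangeKind =
  | "pure-function" | "zod-schema" | "dto-mapping" | "domain-logic"
  | "server-action" | "repository" | "adapter" | "composition-root"
  | "api-contract" | "agent-protocol" | "webhook-payload"
  | "ui-journey";

interface SelectionRule {
  kinds: readonly ChangeKind[];
  layer: TestLayer;
}

// Ordered top to bottom like the table; the first matching row wins.
const selectionRules: readonly SelectionRule[] = [
  { kinds: ["pure-function", "zod-schema", "dto-mapping", "domain-logic"], layer: "L1 Unit" },
  { kinds: ["server-action", "repository", "adapter", "composition-root"], layer: "L2 Integration" },
  { kinds: ["api-contract", "agent-protocol", "webhook-payload"], layer: "Intent" },
  { kinds: ["ui-journey"], layer: "L3 E2E" },
];

function selectLayer(kind: ChangeKind): TestLayer {
  for (const rule of selectionRules) {
    if (rule.kinds.indexOf(kind) !== -1) return rule.layer;
  }
  // Unreachable while every ChangeKind appears in a rule; fall back to the last row.
  return "L3 E2E";
}
```

L0 is deliberately absent from the lookup because it runs on every change regardless of what the lookup returns.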
The Browser Gate
Before writing any E2E test, ask: can this test's core claim be proven without a browser?
If the answer is yes, write the cheaper test. A server action that creates a user and returns a result is an L2 integration test. You only need L3 when the browser itself is part of the proof — form interaction, navigation flow, responsive layout, accessibility.
The decomposition test: If an E2E test is proving more than one claim, it should be split. Each claim gets tested at its cheapest viable layer.
| Combined E2E claim | Decomposed |
|---|---|
| "Form renders AND action creates entity AND redirect works" | L2: action returns valid ID. L3: form visible, button clicks, redirect lands. |
| "Validation rejects bad input AND shows error message" | L1: schema rejects invalid input. L3: error message visible after submit. |
| "Auth blocks access AND redirects to login" | L2: action throws without auth. L3: page redirects to login URL. |
The cost signal: If you spend more than 30 minutes debugging an E2E test, the test is probably at the wrong layer. E2E tests should be thin wiring checks that pass on first run if the underlying logic works. When they don't, the debugging belongs at L2 where the feedback loop is seconds, not minutes.
Story to Layer
Story Contract rows drive test layer selection. The SPEC-MAP bridges intent (what to prove) to implementation (how to prove it).
| Story Contract Column | Layer Decision |
|---|---|
| Test Type = unit | L1: schema or pure function test |
| Test Type = integration | L2: server action with mocked composition root |
| Test Type = e2e | L3: browser wiring only, logic already proven at L2 |
| THEN names a data source + threshold | L2 minimum: must hit real data to prove the assertion |
| FORBIDDEN names a counterfeit success | Safety Test at the SAME layer: negative test case |
The naming convention routes Story Contract ARTIFACT paths to Nx targets:
| Story Contract ARTIFACT | Nx Target | Layer |
|---|---|---|
| *.schema.spec.ts | test-schema | L1 |
| *.integration.spec.ts | test-integration | L2 |
| *.spec.ts (in e2e project) | e2e | L3 |
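As an illustration of how the naming convention could be applied mechanically, the sketch below routes an ARTIFACT path to its target. `routeArtifact` and the `inE2eProject` flag are hypothetical; in practice the routing is configured in Nx rather than computed by a function like this.

```typescript
type NxTarget = "test-schema" | "test-integration" | "e2e";

interface ArtifactRoute {
  target: NxTarget;
  layer: "L1" | "L2" | "L3";
}

// Hypothetical helper: the most specific suffix is checked first, because
// "*.schema.spec.ts" and "*.integration.spec.ts" also end in ".spec.ts".
function routeArtifact(path: string, inE2eProject: boolean): ArtifactRoute | undefined {
  if (/\.schema\.spec\.ts$/.test(path)) return { target: "test-schema", layer: "L1" };
  if (/\.integration\.spec\.ts$/.test(path)) return { target: "test-integration", layer: "L2" };
  if (inE2eProject && /\.spec\.ts$/.test(path)) return { target: "e2e", layer: "L3" };
  // A plain *.spec.ts outside an e2e project has no row in the table.
  return undefined;
}
```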
FORBIDDEN as Tests
Every FORBIDDEN column in the Story Contract becomes a negative test at the same layer. Example: if the Story Contract says "FORBIDDEN: form submits with invalid data because client validation is bypassed," the Safety Test is:
// L3 E2E: tests the server-side rejection when client validation is bypassed
import { test, expect } from '@playwright/test';

test('server rejects invalid input even without client validation', async ({ page }) => {
  await page.goto('/rfp/ventures/new');
  // Bypass native browser validation by setting novalidate on the form
  await page.evaluate(() => {
    document.querySelector('form')?.setAttribute('novalidate', '');
  });
  await page.getByLabel('Name').fill(''); // Empty: should fail server-side
  await page.getByRole('button', { name: /create/i }).click();
  // Server must reject, not redirect to success
  await expect(page.getByRole('status')).toContainText(/required/i);
  await expect(page).not.toHaveURL(/\/rfp\/ventures\/[a-f0-9-]+$/);
});
This catches the case where TanStack Form validators are dead code because HTML noValidate wasn't set, and the server action's validateOrError() is the real gate. The FORBIDDEN column makes this test mandatory, not optional.
Thin Wrappers
The architectural prerequisite that makes testing cheap. In the existing codebase vocabulary: server actions are thin wrappers that delegate to the composition root.
Server Action (thin wrapper) Service via Composition Root
┌─────────────────────┐ ┌──────────────────────────┐
│ validateOrError() │──────────→ │ createAppServices() │
│ assertPermission() │ │ useCase.execute() │
│ revalidateTag() │←────────── │ return typed result │
└─────────────────────┘ └──────────────────────────┘
The action validates input (Zod), checks auth (assertPermission), delegates to the composition root (createAppServices), and handles cache revalidation. Business logic lives in use cases behind the composition root — testable at L2 with mocked services, no Next.js internals needed.
Industry term: "Thin Action / Fat Service" pattern. Same concept, different vocabulary.
Without this pattern, every test needs to mock next/headers, next/cache, next/navigation. The mocking cost compounds. With it, use cases are plain TypeScript.
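A minimal sketch of the pattern, assuming the services and auth context are passed in as arguments so the example runs outside Next.js. Only the names `validateOrError`, `assertPermission`, and `createMockAppServices` come from this page; their signatures, the `createVentureAction` action, and the `venture:create` permission string are assumptions made for illustration.

```typescript
// Every shape here is an assumption; the real signatures may differ.
interface CreateVentureInput { name: string }
type ActionResult = { ok: true; id: string } | { ok: false; error: string };

interface AppServices {
  createVenture: { execute(input: CreateVentureInput): Promise<{ id: string }> };
}
interface AuthContext { permissions: readonly string[] }

function validateOrError(raw: unknown): CreateVentureInput | { error: string } {
  if (typeof raw === "object" && raw !== null) {
    const name = (raw as { name?: unknown }).name;
    if (typeof name === "string" && name.trim().length > 0) return { name: name.trim() };
  }
  return { error: "Name is required" };
}

function assertPermission(auth: AuthContext, permission: string): void {
  if (auth.permissions.indexOf(permission) === -1) throw new Error("Forbidden: " + permission);
}

// The thin wrapper: validate, authorize, delegate. No business logic lives here.
async function createVentureAction(
  services: AppServices,
  auth: AuthContext,
  raw: unknown,
): Promise<ActionResult> {
  const input = validateOrError(raw);
  if ("error" in input) return { ok: false, error: input.error };
  assertPermission(auth, "venture:create");
  const { id } = await services.createVenture.execute(input);
  // A real action would call revalidateTag() here; omitted so the sketch has no Next.js imports.
  return { ok: true, id };
}

// A mocked composition root, as an L2 test might build it.
function createMockAppServices(): AppServices {
  return { createVenture: { execute: async () => ({ id: "venture-1" }) } };
}
```

With this shape, the "action returns valid ID" and "action throws without auth" claims from the decomposition table become plain function calls against `createMockAppServices()`.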
Nx-Native CI
The L1 → L2 → L3 cascade is enforced by Nx's task graph, not CI YAML job chains.
// nx.json (excerpt)
{
  "targetDefaults": {
    "test-integration": {
      "dependsOn": ["test-schema"]
    },
    "e2e": {
      "dependsOn": ["test-integration"]
    }
  }
}
CI runs one command:
nx affected -t test-schema test-integration e2e --base=main
Nx handles ordering via dependsOn. Only tests affected by the changeset run. Change a venture action? Only venture schema tests, venture integration tests, and the venture-scoped E2E run — not the whole suite.
Not: separate CI jobs with needs: chains. Not: run-many across every project. Nx's affected graph is the orchestrator.
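To make the ordering concrete, here is a simplified model of how a dependsOn chain resolves into a run order. It is not Nx's implementation, which also handles per-project graphs, caching, and parallelism; `runOrder` and `TargetDefaults` are hypothetical names used only for this sketch.

```typescript
type TargetDefaults = Record<string, { dependsOn?: string[] }>;

const targetDefaults: TargetDefaults = {
  "test-schema": {},
  "test-integration": { dependsOn: ["test-schema"] },
  e2e: { dependsOn: ["test-integration"] },
};

// Depth-first walk: a target is emitted only after everything it depends on.
function runOrder(defaults: TargetDefaults, requested: string[]): string[] {
  const ordered: string[] = [];
  const visit = (target: string, path: string[]): void => {
    if (ordered.indexOf(target) !== -1) return; // already scheduled
    if (path.indexOf(target) !== -1) {
      throw new Error("Cycle: " + path.concat(target).join(" -> "));
    }
    const deps = defaults[target]?.dependsOn ?? [];
    for (const dep of deps) visit(dep, path.concat(target));
    ordered.push(target);
  };
  for (const target of requested) visit(target, []);
  return ordered;
}
```

The order in which targets are listed on the command line does not matter in this model; the dependency chain alone decides that schema tests precede integration tests, which precede E2E.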
The Hexagonal Advantage
When server actions are pure TypeScript functions behind ports, most logic is testable without React or a browser:
┌──────────────────────────────────┐
│ PRESENTATION (L3) │ ← E2E only for browser-dependent flows
│ ┌──────────────────────────┐ │
│ │ APPLICATION (L2) │ │ ← Integration: server actions, use cases
│ │ ┌──────────────────┐ │ │
│ │ │ INFRASTRUCTURE │ │ │ ← Integration: repositories, adapters
│ │ │ ┌──────────┐ │ │ │
│ │ │ │ DOMAIN │ │ │ │ ← Unit: pure transforms, validators
│ │ │ └──────────┘ │ │ │
│ │ └──────────────────┘ │ │
│ └──────────────────────────┘ │
└──────────────────────────────────┘
| Layer | Test Strategy | Why |
|---|---|---|
| Domain | L1 Unit — no mocks needed | Pure functions, Zod schemas, DTOs. Zero dependencies. |
| Infrastructure | L2 Integration — real database | Repositories implement domain ports. Prove data round-trips. |
| Application | L2 Integration — mocked composition root | Server actions orchestrate use cases. Prove the composition. |
| Presentation | L3 E2E — only for browser interactions | Components consume application layer. Most rendering is server-side. |
This maps directly to the layer model. Domain types are the source of truth. Tests prove each layer honors the contracts.
Quick Reference
When you're staring at a change and wondering "what layer does this go in?"
| I changed... | Test layer | File pattern |
|---|---|---|
| A Zod schema | L1 | *.schema.spec.ts |
| A pure function / DTO mapping | L1 | *.schema.spec.ts |
| A server action | L2 | *.integration.spec.ts |
| A repository / adapter | L2 | *.integration.spec.ts |
| An API route | L2 | *.integration.spec.ts (via next-test-api-route-handler, NTARH) |
| A client component with state | L2 | *.browser.test.tsx (Vitest Browser Mode) |
| A form submission flow | L3 | *.spec.ts (Playwright) |
| Auth / OAuth redirect | L3 | *.spec.ts (Playwright) |
| Layout / responsive behavior | L3 | *.spec.ts (Playwright) |
| A TypeScript type or interface | L0 | No test file — tsc --noEmit covers it |
Recovery Backlog
Starting from lost ground? Prioritized list:
- Wire Nx targets: add test-schema and test-integration to targetDefaults in nx.json
- Shared test helpers: createMockAppServices(), createMockAuthContext() in a testing lib
- Top 3 actions: write L1 + L2 for the three most critical server actions
- Security audit: every "use server" has validateOrError() and assertPermission()
- CI cascade: replace CI YAML job chains with nx affected -t test-schema test-integration e2e
- Generator scaffolding: Nx generator emits action + schema test + integration test
- E2E thinning: audit existing E2E; can the core claim be proven at L2?
- Vitest migration: replace Jest with Vitest (@nx/vite:test executor)
- E2E splitting: split monolithic E2E into feature-scoped projects with implicitDependencies
Future direction: @epic-web/app-launcher for per-test isolated server instances, which would eliminate the mock boundary entirely. This is a direction, not a current capability.
Patterns
L0: Types as Tests
The cheapest verification. Strict TypeScript catches broken refactors, wrong argument types, missing fields, and import errors before any test runs.
tsconfig.json strictness flags (the last three are not implied by strict and must be enabled separately):
strict: true
noUncheckedIndexedAccess: true
noUnusedLocals: true
exactOptionalPropertyTypes: true
When a domain contract changes, the compiler lights up every file that needs updating — outward through infrastructure, application, presentation. Red squiggles are breadcrumbs.
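One common way to get this behavior is an exhaustiveness check. In the sketch below, `VentureStatus` and `statusLabel` are hypothetical examples of a domain contract and one of its consumers; `assertNever` is a conventional helper, not something this codebase is known to define.

```typescript
// Hypothetical domain contract and consumer, for illustration only.
type VentureStatus = "draft" | "submitted" | "approved";

function assertNever(value: never): never {
  throw new Error("Unhandled case: " + String(value));
}

function statusLabel(status: VentureStatus): string {
  switch (status) {
    case "draft":
      return "Draft";
    case "submitted":
      return "Awaiting review";
    case "approved":
      return "Approved";
    default:
      // Adding a member to VentureStatus turns this line into a compile error
      // until the new case is handled, so tsc --noEmit flags every such consumer.
      return assertNever(status);
  }
}
```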
L1: Unit Tests
Pure functions. No database, no DOM, no HTTP. Under 100ms per test.
Good candidates: score calculations, DTO mappings, Zod schema validation, status logic, discount rules, formatting functions. Anything that takes data in and returns data out.
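A sketch of the kind of function that belongs at L1. The discount rule itself (10% off at 100.00 or more) is invented for the example, and `orderTotalCents` is a hypothetical name.

```typescript
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}

// Hypothetical rule: 10% off when the subtotal reaches 100.00, computed in integer cents.
function orderTotalCents(items: readonly LineItem[]): number {
  const subtotal = items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
  const discount = subtotal >= 10000 ? Math.round(subtotal * 0.1) : 0;
  return subtotal - discount;
}
```

Because the function takes data in and returns data out, a Vitest case reduces to a single assertion such as `expect(orderTotalCents([{ unitPriceCents: 5000, quantity: 2 }])).toBe(9000)`, with no setup or teardown.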
L2: Integration Tests
The bulk of a server-action architecture. Real database (or mocked composition root — both valid at L2), real queries, real data. Prove that the composition of domain + infrastructure + application produces correct results.
Rules:
- Real database or mocked composition root — not mocked repositories
- Clean up test data after each test — no shared state
- Test isolation — each test is independent
Server actions marked "use server" are async functions. In a test runner, the directive is a bundler instruction, not a runtime constraint. To test an action itself rather than the use case behind it, mock next/headers, next/navigation, and next/cache, then call the function directly.
L3: E2E Tests
Last resort. Browser-dependent flows only. Expensive but irreplaceable for:
- Multi-step form interactions (fill, submit, redirect, error states)
- Authentication flows (OAuth redirects, session management)
- Responsive layout behavior across breakpoints
- Keyboard navigation and accessibility
- Client-side state that only exists in the browser
- Wiring confirmation — TanStack Form → server action → redirect
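Selectors: Prefer role- and label-based locators (getByRole, getByLabel), with data-testid as the fallback; avoid CSS selectors and XPath. Semantic selectors survive redesigns.
Timeouts: Rely on Playwright's auto-waiting locators and web-first assertions such as expect(locator).toBeVisible(), never arbitrary sleeps like waitForTimeout(). Flaky tests are worse than no tests.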
Playwright POM: Use Page Object Model for E2E. Feature specs, not function specs — each POM represents a user journey, not a page.
Zod as Contracts
Zod schemas serve as runtime contract verification at every data boundary. The schema IS the test:
L0: tsc catches type mismatches at compile time
L1: Zod catches shape/constraint violations at runtime
L2: Integration tests verify business logic with validated data
L3: E2E verifies the full flow
When a schema defines both the TypeScript type (z.infer<typeof Schema>) and the runtime validator, the two cannot drift apart. The type IS the validator. This is the type boundary made enforceable.
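The idea can be shown without any dependency. The sketch below is a tiny stand-in for a schema library, not Zod itself: `Schema`, `Infer`, `objectOf`, `text`, `positiveInt`, and `VentureSchema` are all hypothetical. With Zod, the equivalent would be a z.object(...) definition plus z.infer<typeof Schema>.

```typescript
// Stand-in for a schema library so the sketch runs on its own.
interface Schema<T> {
  parse(input: unknown): T;
}
type Infer<S> = S extends Schema<infer T> ? T : never;
type Shape = Record<string, Schema<unknown>>;
type InferShape<S extends Shape> = { [K in keyof S]: Infer<S[K]> };

const text: Schema<string> = {
  parse(input) {
    if (typeof input !== "string") throw new Error("Expected a string");
    return input;
  },
};

const positiveInt: Schema<number> = {
  parse(input) {
    if (typeof input !== "number" || Math.floor(input) !== input || input <= 0) {
      throw new Error("Expected a positive integer");
    }
    return input;
  },
};

function objectOf<S extends Shape>(shape: S): Schema<InferShape<S>> {
  return {
    parse(input) {
      if (typeof input !== "object" || input === null) throw new Error("Expected an object");
      const source = input as Record<string, unknown>;
      const result: Record<string, unknown> = {};
      for (const key of Object.keys(shape)) {
        const field = shape[key];
        if (field) result[key] = field.parse(source[key]);
      }
      // Every field was validated above, but the compiler cannot see that.
      return result as unknown as InferShape<S>;
    },
  };
}

// One definition yields both the runtime validator and the static type.
const VentureSchema = objectOf({ name: text, fundingRound: positiveInt });
type Venture = Infer<typeof VentureSchema>; // { name: string; fundingRound: number }

const parsedVenture: Venture = VentureSchema.parse({ name: "Acme", fundingRound: 2 });
```

Changing a field in `VentureSchema` changes `Venture` in the same edit, so L0 flags every consumer while L1 exercises the new constraint.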
Generator Test Parity
Generated test files must pass the same gates as hand-written tests. A generator that produces specs which fail pre-commit hooks is broken — regardless of whether the tests pass when run directly.
| Generator State | Pre-commit Result | Maturity |
|---|---|---|
| Spec passes lint-staged | Clean | Generator tier — enforcement is free |
| Spec fails lint-staged | Hook rejection | Expertise tier — every scaffold needs manual cleanup |
The Nx generator scaffolds action + schema test + integration test together. It uses @nx/devkit patterns: generateFiles for templates, updateProjectConfiguration to wire test-schema and test-integration targets into project.json.
Rebalancing
Don't delete E2E specs. Write cheaper tests for the same logic, then remove the redundant E2E coverage.
The audit question for each existing E2E test: Can the core claim be proven without a browser?
Likely outcome for a server-action-heavy app:
- ~5% remain E2E — auth flows, form wiring, layout
- ~80% become integration — server actions are the bulk of logic
- ~10% become unit — pure transforms extracted from actions
- ~5% become intent — API contract validation
Context
- Testing Platform — Trophy strategy, economics, Nx target structure, Story Contract connection
- Vitest — Primary runner: setup, examples, MSW, browser mode
- SPEC-MAP — Story Contract rows become test files
- Type-First Development — Types drive test specs, compiler as methodology
- Flow Engineering — Maps produce domain contracts that become test expectations
- Architecture — Hexagonal patterns that make testing cheap
- Testing Tools — Vitest, Playwright, RTL, MSW
- Dev Workflow — Build stream and fix stream
Questions
- If the cheapest test that proves correctness is a type check, why do most teams start with the most expensive one?
- When a server action is a pure function behind a composition root, what justifies testing it through a browser instead of a function call?
- If the FORBIDDEN column drives Safety Tests, what happens to test coverage when Story Contract rows skip the FORBIDDEN column?
- At what point does generator test parity — generated specs passing the same hooks as hand-written tests — eliminate the need for code review of generated output?
- If mocking the database hides the bugs that matter, what category of bugs does mocking the browser hide?