Testing Tech
How do you prove the product delivers what you promised?
The Testing Trophy defines the strategy. These are the tools. Each layer proves something different — function at the bottom, value at the top.
L4 Commissioning → Agent-Browser (on-demand, production, proves VALUE)
L3 E2E → Playwright (CI-gated, test DB, proves FUNCTION)
L2 Integration → Vitest / Jest + real DB (proves WIRING)
L1 Unit → Vitest / Jest (proves LOGIC)
L0 Static → tsc --noEmit (proves CONTRACTS)
Two E2E Layers
L3 and L4 are both E2E, but they prove different things.
| Dimension | L3 (Playwright) | L4 (Agent-Browser) |
|---|---|---|
| Proves | Code works — functions execute correctly | Value delivered — customer job gets done |
| Runs against | Test DB (port 5433), localhost | Production (dreamineering.com), real data |
| Triggered by | CI on every PR | Dream team on demand (agt-audit-commission-capability skill) |
| Owned by | Engineering | Dream team (commissioner is never the builder) |
| Speed | 30-120s per test | Minutes per route (interactive) |
| Evidence | GREEN/RED in CI | Screenshots, Story Contract verification |
| Maps to | SPEC-MAP Test Status column | SPEC-MAP L-Level + Last Verified columns |
L3 proves the plumbing works. L4 proves the customer gets what was promised. Both are necessary. Neither substitutes for the other.
Selection Guide
| Scenario | Tool | Why |
|---|---|---|
| PR merge gate | Playwright (L3) | Fast, repeatable, catches regressions |
| Story Contract verification | Agent-Browser (L4) | Follows user journey against real data |
| Route existence check | Agent-Browser (L4) | Navigate + screenshot in seconds |
| Form wiring proof | Playwright (L3) | Needs controlled test data |
| Happy path golden journey | Both | L3 proves it works, L4 proves it matters |
Agent-Browser Commands
Navigate and screenshot:
agent-browser open https://dreamineering.com/crm/contacts
agent-browser wait --load networkidle
agent-browser screenshot contacts.png
Interactive flow:
agent-browser snapshot -i # Get element refs
agent-browser fill @e1 "Acme" # Search
agent-browser click @e3 # Click result
agent-browser screenshot detail.png
Auth flow:
agent-browser open https://dreamineering.com/sign-in
agent-browser snapshot -i
agent-browser fill @e3 "$EMAIL"
agent-browser fill @e5 "$PASSWORD"
agent-browser click @e6
agent-browser wait --url "**/dashboard"
Tool Selection
| Layer | Primary Tool | Proves | Nx Target |
|---|---|---|---|
| L0 Static | TypeScript compiler | Contracts match | typecheck |
| L1 Unit | Vitest / Jest | Logic correct | test-schema |
| L2 Integration | Vitest / Jest + real DB | Wiring correct | test-integration |
| L2 Browser | Vitest Browser Mode | Client rendering correct | test-integration |
| L2 Mocking | MSW | Network mocks shared across layers | (used within L1/L2) |
| L3 E2E | Playwright | Function proven (CI) | e2e |
| L4 Commission | Agent-Browser | Value delivered (production) | manual |
Story Contract Flow
Every Story Contract row generates tests at multiple layers. The SPEC-MAP traces the chain.
Story Contract S1: "User searches contacts, finds match in <5s"
│
├── L1: contactSearchSchema.test.ts → Schema validates search input
├── L2: searchContacts.integration.ts → Server action returns correct results
├── L3: crm-contacts.spec.ts → Browser renders list, search filters
└── L4: agent-browser /crm/contacts → 29 contacts render, search works on production
Each branch in this tree becomes a row in the SPEC-MAP. The SPEC-MAP format requires Story#, WHEN/THEN, Test Layer, and Test Status columns — see the handoff protocol for the full schema and conversion rules.
The FORBIDDEN column drives safety tests at the same layer as the story's positive test. For example, a Story Contract row that forbids cross-org contact leakage is proven at L2: the test must demonstrate multi-tenant isolation, not just a successful search.
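A minimal sketch of what a FORBIDDEN safety assertion looks like next to the positive one. Everything here is an assumption for illustration: the `Contact` shape, the `searchContacts` function, and the in-memory array that stands in for the real test DB. A real L2 spec would run the same two assertions through Vitest or Jest against the actual server action.

```typescript
// Hypothetical contact shape; the real model is not shown in this document.
interface Contact {
  id: string;
  orgId: string;
  name: string;
}

// Stand-in for the test DB: two orgs that both own a contact matching "acme".
const seededContacts: Contact[] = [
  { id: "c1", orgId: "org-a", name: "Acme Ltd" },
  { id: "c2", orgId: "org-b", name: "Acme Holdings" },
  { id: "c3", orgId: "org-a", name: "Birch & Co" },
];

// Hypothetical search action: scope to the caller's org first, then match on name.
function searchContacts(db: Contact[], callerOrgId: string, query: string): Contact[] {
  const needle = query.toLowerCase();
  return db.filter(
    (c) => c.orgId === callerOrgId && c.name.toLowerCase().indexOf(needle) !== -1,
  );
}

const results = searchContacts(seededContacts, "org-a", "acme");

// Positive assertion (THEN): the caller finds their own matching contact.
const foundOwnContact = results.some((c) => c.id === "c1");

// Safety assertion (FORBIDDEN): no row from another org appears,
// even though org-b owns a contact that matches the query text.
const leaked = results.filter((c) => c.orgId !== "org-a");

console.log({ foundOwnContact, leakedCount: leaked.length });
```

The seed data matters more than the assertion: without a matching contact in a second org, the isolation check would pass trivially.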
Story Contract Mapping
Every Story Contract row in a PRD maps to a trophy layer through the SPEC-MAP. Engineering fills the Test Layer column at the spec-to-tests bookend.
| Story Row Pattern | Test Layer | File Convention | Why This Layer |
|---|---|---|---|
| Schema validates input shape | L1 | *.schema.spec.ts | Pure logic, no DB needed |
| Server action returns correct data | L2 | *.integration.spec.ts | Proves wiring through real DB |
| Server action enforces multi-tenant isolation | L2 | *.integration.spec.ts | Safety test — must hit real data |
| Browser renders list from server action | L3 | *.spec.ts (Playwright) | Only if L2 passes and browser wiring is the unknown |
| Customer completes job on production | L4 | manual (agent-browser) | Value verification, not code verification |
The selection rule: Start at L1. Move up only when the layer below cannot prove the assertion. If a Story Contract THEN clause references data correctness, that's L2. If it references what the user sees, check whether an L2 test on the server action covers it first — most "user sees X" stories are L2 tests wearing L3 clothes.
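The rule can be sketched as a walk up the layers that stops at the first one able to prove the assertion. The `AssertionNeeds` flags and the `selectLayer` helper are illustrative assumptions, not part of any existing tooling.

```typescript
type Layer = "L1" | "L2" | "L3" | "L4";

// What a THEN clause needs in order to be proven (assumed flags, for illustration).
interface AssertionNeeds {
  needsRealDb: boolean; // data correctness through the real database
  needsRealDom: boolean; // rendering or interaction in a real browser
  needsProduction: boolean; // customer job verified against production
}

// Layers in ascending order, each with the assertions it is able to prove.
const layers: ReadonlyArray<{ layer: Layer; canProve: (n: AssertionNeeds) => boolean }> = [
  { layer: "L1", canProve: (n) => !n.needsRealDb && !n.needsRealDom && !n.needsProduction },
  { layer: "L2", canProve: (n) => !n.needsRealDom && !n.needsProduction },
  { layer: "L3", canProve: (n) => !n.needsProduction },
  { layer: "L4", canProve: () => true },
];

// Start at L1 and move up only when the layer below cannot prove the assertion.
function selectLayer(needs: AssertionNeeds): Layer {
  for (const candidate of layers) {
    if (candidate.canProve(needs)) return candidate.layer;
  }
  return "L4"; // not reached: the L4 entry accepts everything
}

// "User sees matching contacts" reads like a browser story, but the claim is
// about the data returned, so no real DOM is needed and the walk stops at L2.
const userSeesSearchResults = selectLayer({
  needsRealDb: true,
  needsRealDom: false,
  needsProduction: false,
});

console.log(userSeesSearchResults);
```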
SPEC-MAP enforcement: No empty Test Layer cells at L3+. If a story row has no test layer assigned, it's a spec-bounce — engineering flags it before building.
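As a rough sketch, the spec-bounce check is a filter over SPEC-MAP rows. The `SpecMapRow` shape and `findSpecBounces` name are assumptions, and this version flags every row with an empty Test Layer rather than modelling the L3+ threshold.

```typescript
// Assumed minimal row shape; only the columns needed for this check.
interface SpecMapRow {
  story: string;
  whenThen: string;
  testLayer: "L1" | "L2" | "L3" | ""; // empty string means the cell was left blank
}

// Returns the rows engineering should bounce back before building.
function findSpecBounces(rows: SpecMapRow[]): SpecMapRow[] {
  return rows.filter((row) => row.testLayer === "");
}

// Hypothetical rows for illustration.
const specMap: SpecMapRow[] = [
  { story: "S1", whenThen: "WHEN user searches contacts THEN match found", testLayer: "L2" },
  { story: "S2", whenThen: "WHEN user exports contacts THEN file downloads", testLayer: "" },
];

const bounces = findSpecBounces(specMap);
console.log(bounces.map((row) => row.story));
```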
Failure Modes
- Treating L3 browser coverage as proof that the customer job works in production.
- Assigning L4 commissioning to the same builder who shipped the feature.
- Leaving FORBIDDEN outcomes without a same-layer safety test.
- Mapping data-correctness stories to browser tests when an L2 integration spec can prove the assertion faster.
- Updating SPEC-MAP rows without L-Level and Last Verified evidence from commissioning.
SPEC-MAP Conversion Protocol
Existing SPEC-MAPs may use the old | Tier | Feature | Spec | Status | format. Convert to the new trophy-aware format using this 4-step protocol.
Inventory L2 Specs
: "Find all integration specs for a domain"
ls libs/app-server/app-drmg-sales-server/src/actions/*<domain>*.integration.spec.ts
These specs already prove data correctness (CRUD, search, validation) at L2 — they're invisible to old-format SPEC-MAPs that only list L3 Playwright specs.
Apply Selection Rule
For each old SPEC-MAP row, ask: "Does an L2 integration spec already prove this feature?"
| Old Row Pattern | New Test Layer | Why |
|---|---|---|
| Feature is data CRUD (create, read, update, delete) | L2 | Integration spec hits real DB, proves wiring |
| Feature is search/filter | L2 | Server action returns correct results |
| Feature is "page renders with data" | L3 (keep) | Browser rendering is the unknown |
| Feature is "form interaction" | L3 (keep) | Multi-step UI flow needs browser |
| Feature is schema validation | L1 | Pure transform, no DB or browser needed |
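The table above can be read as a lookup from feature kind to test layer. The sketch below assumes someone has already tagged each old row with a `kind` by hand; the `FeatureKind` labels and the `reclassify` helper are hypothetical names. In practice one old row may split into several new rows, which this sketch does not attempt.

```typescript
type FeatureKind = "crud" | "search" | "page-render" | "form-interaction" | "schema-validation";
type TestLayer = "L1" | "L2" | "L3";

// One entry per row of the selection table.
const layerForKind: Record<FeatureKind, { layer: TestLayer; why: string }> = {
  "crud": { layer: "L2", why: "Integration spec hits real DB, proves wiring" },
  "search": { layer: "L2", why: "Server action returns correct results" },
  "page-render": { layer: "L3", why: "Browser rendering is the unknown" },
  "form-interaction": { layer: "L3", why: "Multi-step UI flow needs browser" },
  "schema-validation": { layer: "L1", why: "Pure transform, no DB or browser needed" },
};

// Old-format row plus a hand-assigned kind (the kind is an assumption of this sketch).
interface OldRow {
  feature: string;
  spec: string;
  kind: FeatureKind;
}

function reclassify(row: OldRow): { feature: string; layer: TestLayer; why: string } {
  const { layer, why } = layerForKind[row.kind];
  return { feature: row.feature, layer, why };
}

const oldRows: OldRow[] = [
  { feature: "Contact CRUD", spec: "crm-contacts-crud.authenticated.spec.ts", kind: "crud" },
  { feature: "Contact edit flow", spec: "crm-contact-edit.authenticated.spec.ts", kind: "form-interaction" },
];

const converted = oldRows.map(reclassify);
console.log(converted);
```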
Reclassify Duplicates
Many stories have BOTH an L2 and L3 spec. Keep both rows — they prove different things:
- L2 row proves the data is correct (server action → DB → response)
- L3 row proves the browser renders it (DataTable loads, form submits)
A story that has only L3 coverage, and whose L3 test checks data values rather than browser rendering, should be pushed down to L2.
Browser-Only Proof
L3 stays when the assertion requires a real DOM:
- DataTable renders with search, sort, pagination
- Form multi-step interaction (fill → validate → submit → redirect)
- Navigation flow (breadcrumbs, back links, mobile nav)
- Responsive layout at breakpoints
- Keyboard navigation and accessibility
New Format Schema
See PRD Handoff Protocol — SPEC-MAP for the full column schema:
| Story# | WHEN/THEN | Test Layer | Test File | Test Status | L-Level | Last Verified |
Each tree branch from the Story Contract Flow above becomes one SPEC-MAP row. Engineering fills Test Layer (L1/L2/L3) and Test File. Dream fills Story#, L-Level, and Last Verified during commissioning.
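To make the seven columns concrete, here is a minimal sketch of a row type and a renderer that emits one Markdown table line. The `SpecMapRow` field names and the `renderRow` helper are assumptions; cells that Dream fills later during commissioning are left empty.

```typescript
// Field names are illustrative; the column order follows the schema line above.
interface SpecMapRow {
  story: string; // Story#, filled by Dream
  whenThen: string; // WHEN/THEN
  testLayer: "L1" | "L2" | "L3"; // filled by Engineering
  testFile: string; // filled by Engineering
  testStatus: string; // Test Status
  lLevel?: string; // filled by Dream during commissioning
  lastVerified?: string; // filled by Dream during commissioning
}

// Render one row as a Markdown table line; missing cells become empty strings.
function renderRow(row: SpecMapRow): string {
  const cells = [
    row.story,
    row.whenThen,
    row.testLayer,
    row.testFile,
    row.testStatus,
    row.lLevel ?? "",
    row.lastVerified ?? "",
  ];
  return `| ${cells.join(" | ")} |`;
}

// The L2 branch of Story Contract S1, before commissioning has happened.
const s1Integration: SpecMapRow = {
  story: "S1",
  whenThen: "WHEN user searches contacts THEN match found in <5s",
  testLayer: "L2",
  testFile: "searchContacts.integration.ts",
  testStatus: "UNMAPPED",
};

const rendered = renderRow(s1Integration);
console.log(rendered);
```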
CRM Example
Old format (3 rows, all L3):
| Tier | Feature | Spec | Status |
|---|---|---|---|
| — | Contact list + search | crm-contacts.authenticated.spec.ts | UNMAPPED |
| — | Contact CRUD | crm-contacts-crud.authenticated.spec.ts | UNMAPPED |
| — | Contact edit flow | crm-contact-edit.authenticated.spec.ts | UNMAPPED |
New format (8 rows, 5 L2 + 3 L3):
| Story# | WHEN/THEN | Test Layer | Test File | Test Status |
|---|---|---|---|---|
| — | WHEN user creates contact ... THEN persists | L2 | crm-contacts.actions.integration.spec.ts | UNMAPPED |
| — | WHEN user retrieves contact ... THEN returned | L2 | crm-contacts.actions.integration.spec.ts | UNMAPPED |
| — | WHEN user updates contact ... THEN persists | L2 | crm-contacts.actions.integration.spec.ts | UNMAPPED |
| — | WHEN user deletes contact ... THEN soft-delete | L2 | crm-contacts.actions.integration.spec.ts | UNMAPPED |
| — | WHEN user searches contacts ... THEN filtered | L2 | crm-contacts.actions.integration.spec.ts | UNMAPPED |
| — | WHEN user opens contacts page THEN DataTable | L3 | crm-contacts.authenticated.spec.ts | UNMAPPED |
| — | WHEN user fills create form THEN submits | L3 | crm-contacts-crud.authenticated.spec.ts | UNMAPPED |
| — | WHEN user edits via form THEN values shown | L3 | crm-contact-edit.authenticated.spec.ts | UNMAPPED |
5 stories pushed from "invisible" to L2 visibility. 3 L3 stories retained for browser-specific proof.
Dig Deeper
- Vitest — Primary test runner: L1 unit, L2 integration, L2 browser mode. Nx setup, file naming conventions, MSW integration
- Jest — Current runner in the monorepo. Migration path to Vitest documented
- React Testing Library — Component testing that mirrors how users interact with UI
- MSW — Mock Service Worker: shared mocking language across L1, L2, and browser tests
Smart Contracts
Smart contract testing is a separate domain with different tools and economics.
Security Audit is vital: fast, best-practice contract testing is extremely valuable.
Context
- Testing Platform — Trophy strategy, economics, Story Contract connection
- Testing Strategy — Layer model, selection rules, recovery backlog
- Dev Workflow — Build stream and fix stream
- SPEC-MAP — Traceability from Story Contract to test file to commissioning
- Validate Outcomes — L0-L4 maturity model
Questions
What does your test suite prove — that the code works, or that it delivers value?
- If L3 passes but L4 shows a 404, which layer is lying?
- When a Story Contract FORBIDDEN outcome has no test at any layer, how do you know it can't happen?
- If you had to choose between 100% L1 coverage or 5 L4 commissioning runs per week, which proves more?
- What's the cost of a test suite where every spec passes but the customer can't complete the job?