Testing Infrastructure
Where do your tests run — and what does that cost?
MERGE LOOP (every PR — fast, blocks merge)

```
COMMIT → INTEGRATION → PREVIEW DEPLOY → E2E → MERGE
  │          │               │           │
  ▼          ▼               ▼           ▼
Local     Real DB        Real URL    Real browser
Free      5-30s          Free tier   30-120s
```

HEALTH LOOP (scheduled/post-merge — thorough, never blocks PRs)

```
FULL TYPECHECK → ARCHITECTURE AUDIT → CROSS-PROJECT TESTS → REPORT
      │                  │                     │               │
      ▼                  ▼                     ▼               ▼
All projects      Layer violations     Dependency drift    Dashboard
Minutes           Static analysis      Integration         Alerting
```
Testing infrastructure is the machinery that runs tests — CI pipelines, databases, runners, preview environments. Get this wrong and you pay in flaky tests, blocked terminals, or expensive CI bills. Get it right and every push gets verified automatically against a real deployment.
The Core Principle
Test against deployed artifacts, not dev servers.
Running E2E tests against localhost means port conflicts, RAM contention, and the false choice between "dev server running" or "run tests." Preview deploys eliminate this entire class of problems. A real URL. Real infrastructure. No local resource cost.
Pipeline Design
Two pipelines. Different triggers, different budgets.
Local Pipeline (every commit)
Fast, free, catches 90% of issues.
```
tsc --noEmit → vitest run → done
     │             │
 Types OK?    Unit + Integration pass?
   <10s            <60s
```
Budget: Under 90 seconds total. If it exceeds this, tests will be skipped. Set a hard ceiling and enforce it.
Requirements:
- Test database running (Docker container on a dedicated port)
- No dev server needed — unit and integration tests don't use the browser
- Memory-safe typecheck — use affected-file checking, not full project
| Command | Purpose | Trap to Avoid |
|---|---|---|
| tsc --noEmit or pnpm tc | Type verification | Full project typecheck can OOM on large monorepos. Use affected-only. |
| vitest run or jest --bail | Unit + integration | --bail stops on first failure. Fast feedback. |
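The 90-second ceiling can be enforced mechanically rather than remembered. A minimal sketch, assuming each pipeline step is wrapped as a function; the no-op wiring here is illustrative, and a real version would shell out (e.g. `execSync("tsc --noEmit")`):

```typescript
// Sketch: run local pipeline steps in order and fail the run if the total
// wall-clock time exceeds a hard ceiling.
type Step = { name: string; run: () => void };

function runWithCeiling(steps: Step[], ceilingMs: number): number {
  const start = Date.now();
  for (const step of steps) {
    step.run();
    const elapsed = Date.now() - start;
    if (elapsed > ceilingMs) {
      throw new Error(`Budget blown at "${step.name}": ${elapsed}ms > ${ceilingMs}ms`);
    }
  }
  return Date.now() - start;
}
```

Failing the run when the budget is blown makes pipeline bloat visible immediately instead of letting it creep past the point where developers start skipping tests.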
CI Pipeline (every PR)
Thorough. Runs against the preview deployment. Catches integration issues.
```
Install → Typecheck → Unit/Integration → Wait for Preview → E2E → Report
                                                │
                                      Vercel builds preview URL
                                         from the PR branch
```
The key insight: Vercel gives you a preview URL on every push for free. The E2E tests just need to point at that URL instead of localhost.
Two Loops
Merge decisions and repo health are different questions. Answering both in one workflow makes PRs slow and health checks shallow.
| Loop | Question | Trigger | Speed |
|---|---|---|---|
| Merge loop | Is this change safe to merge? | PR push | Fast (minutes) |
| Health loop | Is the monorepo internally consistent? | Post-merge, scheduled, manual | Thorough (can be slow) |
Merge loop runs on every PR. It gates the merge button. Everything in it must finish fast enough that developers don't context-switch while waiting. Branch protection should only require checks that run here.
Health loop runs on a schedule or after merge. It catches cross-project drift, full-monorepo type errors, and dependency staleness. These checks matter — but they don't belong in the PR path because they're slow and their failures rarely trace to the current change.
If a workflow tries to answer both questions, it will be too slow for PRs and too narrow for health.
Cron schedules in CI run in UTC. Document the conversion for your timezone: a "nightly" job at 0 0 * * * runs at noon in NZST (UTC+12).
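The conversion is one line of arithmetic. A sketch; the offsets used below are illustrative (NZST is UTC+12, NZDT during daylight saving is UTC+13):

```typescript
// Convert a cron schedule's UTC hour into a local wall-clock hour for docs.
function utcToLocalHour(utcHour: number, offsetHours: number): number {
  return ((utcHour + offsetHours) % 24 + 24) % 24; // normalize into 0-23
}

// "Nightly" at 0 0 * * * (midnight UTC) in NZST (UTC+12):
const nzst = utcToLocalHour(0, 12); // 12, i.e. noon
```

Putting the computed local time next to the cron expression in the workflow file saves the next reader from doing the conversion in their head.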
Signal Hierarchy
Not all test signals are equal. The hierarchy from highest to lowest value:
| Rank | Signal | What it proves | Loop |
|---|---|---|---|
| 1 | E2E against deployed preview | The app works for real users | Merge |
| 2 | Integration tests against real DB | Data contracts hold | Merge |
| 3 | Structural/architecture checks | Repo rules aren't violated | Merge |
| 4 | Unit tests (affected) | Logic transforms are correct | Merge |
| 5 | Full monorepo typecheck | Cross-project type safety | Health |
Test what the user experiences first. Test what the compiler sees last.
A full monorepo typecheck is valuable — but it catches type errors across projects that weren't touched by the PR. That's a health concern, not a merge concern. Running it on every PR burns minutes and blocks merges on failures the author can't fix.
Preview Deploy Testing
The single highest-leverage infrastructure change. Decouples test environment from development environment.
How It Works
```
git push → Vercel builds preview → CI runs Playwright against preview URL
                  │                                │
        Free on all plans              Tests hit real infrastructure
        Unique URL per commit          No local server needed
```
Playwright Configuration
One config change makes Playwright work against any URL:
```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000",
  },
  // Only start local server when NOT in CI
  webServer: process.env.CI
    ? undefined
    : {
        command: "pnpm dev",
        url: "http://localhost:3000",
        reuseExistingServer: true,
      },
});
```
Locally: Playwright starts the dev server. In CI: BASE_URL points to the Vercel preview. Same tests, different target.
Deployment Protection
If Vercel has deployment protection enabled, Playwright requests hit an auth page instead of the app. Solutions:
| Approach | Cost | Trade-off |
|---|---|---|
| Disable protection for previews | Free | Previews are publicly accessible |
| Automation bypass header | Pro plan ($20/mo) | Previews stay protected, CI gets a bypass token |
| Password protection with env var | Pro plan | Playwright sends the password in a setup step |
For most teams, disabling protection on preview deployments is fine — they're ephemeral and the URLs are unguessable.
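If you take the bypass-header route instead, the token can be attached to every Playwright request from the config. A sketch: the header name is the one Vercel documents for Protection Bypass for Automation, and AUTOMATION_BYPASS_SECRET is an env-var name chosen here for illustration (store the value as a CI secret):

```typescript
// playwright.config.ts — send Vercel's automation bypass token on every request.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000",
    // Only attach the header when the secret is present (i.e. in CI).
    extraHTTPHeaders: process.env.AUTOMATION_BYPASS_SECRET
      ? { "x-vercel-protection-bypass": process.env.AUTOMATION_BYPASS_SECRET }
      : undefined,
  },
});
```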
Test Database Infrastructure
Integration tests (rank 2 in the signal hierarchy) need a real database. Not a mock. Mocking the database hides the bugs that matter — constraint violations, transaction behavior, query performance.
Docker Test Database
Dedicated container, dedicated port, isolated from development:
| Concern | Approach |
|---|---|
| Isolation | Separate port (e.g., 5433) from dev database |
| Startup | docker compose up with health check, wait for ready signal |
| Cleanup | Each test cleans its own data. No shared state between tests. |
| Reset | Drop and recreate between test suites if needed |
| CI | Service container in GitHub Actions / self-hosted runner |
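The "wait for ready signal" row can be sketched as a bounded retry loop. Assumptions: `probe` stands in for a real connection check (e.g. shelling out to pg_isready against port 5433), and a real loop would also sleep roughly 500ms between attempts:

```typescript
// Sketch: poll a readiness probe a bounded number of times before giving up.
// `probe` stands in for a real check such as execSync("pg_isready -p 5433").
function waitForReady(probe: () => boolean, maxAttempts = 60): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (probe()) return attempt; // ready — report how many attempts it took
    // In practice: sleep ~500ms here between attempts.
  }
  throw new Error(`Database not ready after ${maxAttempts} attempts`);
}
```

Failing loudly after the attempt budget is what prevents the "tests fail silently with connection errors" trap listed in the Never Do table.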
Port Allocation
Dedicated ports prevent collisions between dev and test:
| Service | Dev Port | Test Port |
|---|---|---|
| Database | 5432 | 5433 |
| App server | 3000 | 4300 |
| API | 3001 | — (use preview URL) |
CI Design Principles
Four rules govern every CI decision. Break one and costs compound silently.
Unique Signal
Every CI step must catch something that nothing else catches. If two steps catch the same error, one is redundant. Remove it.
| Step | Unique signal | Why only here |
|---|---|---|
| tsconfig validation | rootDir (TS6059), missing spec excludes | Can't build locally. Vercel fails late. |
| Architecture audit | Hexagonal layer violations | Static analysis, not part of TS or build |
| Typecheck (affected) | Cross-project TS errors | Vercel skips via ignoreBuildErrors: true |
| Test (affected, unit) | Logic regressions, value transforms | Not caught by typecheck or build |
Before adding any step, answer: "What does this catch that nothing else catches?" If you can't name it, don't add it.
One Workflow, One Install
Every pnpm install in CI costs 2-3 minutes. Multiple workflows that each install dependencies multiply this waste. Combine steps into one workflow with sequential jobs. The structural check (no install needed) runs first and fast-fails before the expensive install job.
```
pr-quality-gate.yml

  Job 1: structural (~30s, no install)
    - validate-tsconfig-lib.sh

  Job 2: quality-gate (~10 min, one install)
    - pnpm install
    - architecture:audit
    - nx affected --target=typecheck
    - nx affected --target=test
```
Skip What Doesn't Matter
Not every push needs CI. Three filters that eliminate waste:
| Filter | What it skips | Mechanism |
|---|---|---|
| paths-ignore | Docs-only changes (.md, .claude/, docs/) | Workflow trigger config |
| Draft PR check | Work-in-progress PRs | if: github.event.pull_request.draft == false |
| cancel-in-progress | Superseded runs from rapid pushes | Concurrency group with cancellation |
Separate Loops
Each workflow answers one question. The merge loop answers "is this PR safe?" The health loop answers "is the monorepo consistent?" Combining them into one workflow makes PRs slow and health checks incomplete. See Two Loops for the full principle.
Cost Controls
CI costs are invisible until they aren't. The free tier is generous — until you exceed it and every push costs real money. Treat CI minutes like a budget, not an unlimited resource.
Budget Math
The formula:
monthly_minutes = minutes_per_run × pushes_per_day × working_days
Worked example (GitHub Actions free tier = 2,000 min/month):
| Variable | Value | Source |
|---|---|---|
| Minutes per run | 10 min | Measure from CI logs |
| Pushes per day | 5 | Average over 2 weeks |
| Working days | 22 | Exclude weekends |
| Monthly total | 1,100 min | 55% of free tier |
Run this calculation after every CI change. A step that adds 3 minutes per run adds 3 × 5 × 22 = 330 min/month — 16.5% of the free tier from one step.
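The worked example, as code. The constants match the table above; the free-tier figure is GitHub Actions' 2,000 min/month:

```typescript
// Projected monthly CI minutes and share of the GitHub Actions free tier.
const FREE_TIER_MINUTES = 2000;

function monthlyMinutes(
  minutesPerRun: number,
  pushesPerDay: number,
  workingDays: number,
): number {
  return minutesPerRun * pushesPerDay * workingDays;
}

const total = monthlyMinutes(10, 5, 22); // 1100
const share = total / FREE_TIER_MINUTES; // 0.55 → 55% of the free tier
```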
Alert Thresholds
Set three thresholds. Check monthly on the GitHub Actions billing page.
| Usage | Action |
|---|---|
| > 75% of budget | Review push frequency — are draft PRs or docs-only commits triggering CI? |
| > 85% of budget | Increase paths-ignore scope or reduce parallelism |
| > 95% of budget | Escalate — defer new CI steps until headroom returns |
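The thresholds translate directly into a check you could run in the health loop. A sketch; the action strings are illustrative shorthand for the table above:

```typescript
// Map free-tier usage (as a fraction, 0-1) to the alert actions above.
function budgetAction(usage: number): string {
  if (usage > 0.95) return "escalate: defer new CI steps until headroom returns";
  if (usage > 0.85) return "reduce: widen paths-ignore, cut parallelism";
  if (usage > 0.75) return "review: check draft-PR and docs-only triggers";
  return "ok: within budget";
}
```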
Reduce What Runs
The testing strategy page covers the layer model. The infrastructure implication: every E2E test you convert to an integration test saves 30-120 seconds of CI browser time. At scale, this is the biggest cost lever.
| Action | Savings |
|---|---|
| Convert server-action E2E → integration test | ~60s per test |
| Run only affected tests (nx affected) | Skip unchanged projects entirely |
| Bail on first failure in PRs | Stop wasting time after a break |
| Parallelize with sharding | Same wall-clock time, more runner-minutes, but faster feedback |
Cheaper Runners
GitHub Actions minutes are premium-priced. Alternatives for browser-heavy workloads:
| Provider | Model | Relative Cost |
|---|---|---|
| GitHub Actions | Per-minute, managed | Baseline (1x) |
| Self-hosted runner | Your machine, free compute | Free (your electricity + RAM) |
| Ubicloud | Drop-in replacement, bare metal | ~0.3x |
| RunsOn | Your AWS account, spot instances | ~0.1-0.15x |
| Blacksmith | Managed, faster builds | ~0.5x |
Cheapest viable path: Self-hosted runner on a spare machine for private repos. Ubicloud for open source.
Simplest path: Run E2E locally against preview URLs. Zero CI minutes for browser tests. The preview URL is the infrastructure — BASE_URL=https://your-preview.vercel.app npx playwright test.
NX and Caching
Monorepo build tools provide two cost-saving mechanisms:
| Mechanism | What It Does |
|---|---|
| Affected commands | Only test projects changed by the PR. Skip everything else. |
| Computation caching | If inputs haven't changed, reuse previous test results. |
In CI: nx affected -t test instead of nx run-many -t test. On a 10-project monorepo, this typically skips 60-80% of test runs.
Cross-run caching stores .nx/cache as a GitHub Actions artifact, keyed on nx.json hash + commit SHA. Restores from prefix match on cache miss — partial hits still save time.
Phase-Based Rollout
Add CI steps in phases. Start with what proves the most — real infrastructure, real users. Each phase must prove stability before the next ships.
| Phase | What | Loop | Prerequisite |
|---|---|---|---|
| 1. Integration | Real database service container | Merge | Workflow exists, budget under 60% |
| 2. E2E | Playwright against preview URLs | Merge | Phase 1 green rate > 95% over 30 days, budget under 70% |
| 3. Structural | Architecture audit + affected typecheck | Merge | Phase 2 stable, budget under 80% |
| 4. Full typecheck | Cross-project type safety | Health | Phase 3 stable, scheduled workflow configured |
Integration and E2E come first because they catch the bugs that reach users. Structural checks and full typecheck are guardrails — valuable, but secondary to proving the app works.
Never add the next phase while the current one is flaky. Flaky CI erodes trust faster than no CI at all.
Preflight Checklist
Before any test work:
- Test database running and healthy
- Correct environment identified (local vs CI vs preview)
- Known issues checked (issue log, failing tests)
- Test type identified → correct layer selected
Never Do
| Action | Why | Alternative |
|---|---|---|
| Full-project typecheck on large monorepos | OOM crash | Affected-only typecheck |
| npm run dev in CI | Blocks the runner | Build + serve, or use preview URL |
| Skip database readiness check | Tests fail silently with connection errors | Always wait for ready signal |
| Claim tests pass without running them | Breaks trust | Evidence or nothing |
| Arbitrary sleep() in E2E tests | Flaky, slow | waitForSelector(), waitForResponse() |
Feedback Loop
Tests are instruments. They measure the gap between intent and reality.
```
TEST FAILURE
     │
     ├─► Implementation gap → fix the code (engineering concern)
     │
     └─► Vision gap → update the spec (product concern)
```
Every failure is classified: is the code wrong, or is the spec wrong? This distinction routes feedback to the right team and prevents the cycle of fixing code to match a broken spec.
Log failures with evidence: test file, line number, expected vs actual. Structured feedback compounds — pattern recognition across failures reveals systemic issues that individual fixes miss.
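A structured failure record makes the routing mechanical. A sketch; the field names are illustrative, not a prescribed schema:

```typescript
// Sketch: classify each failure and route it to the owning team.
type FailureClass = "implementation-gap" | "vision-gap";

interface FailureRecord {
  testFile: string;
  line: number;
  expected: string;
  actual: string;
  classification: FailureClass;
}

function routeFailure(f: FailureRecord): "engineering" | "product" {
  // Implementation gap → fix the code; vision gap → update the spec.
  return f.classification === "implementation-gap" ? "engineering" : "product";
}
```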
Beyond Code: Intent Verification
The same CI principles apply when AI agents execute business operations. Tests verify code correctness. Intent verification verifies agent correctness — does the agent's action match the human's intent?
| Code CI | Agent CI |
|---|---|
| Types match spec | Action matches intent record |
| Integration tests pass | Policy constraints satisfied |
| E2E proves user journey | Full chain: auth → intent → action → settlement → audit |
| Preview deploy = real infra | Verifiable Intent = real cryptographic proof |
As work charts show AI handling 65-85% of business operations, intent verification becomes the quality gate that prevents misaligned automated actions at scale. See Trust Architecture for the full pattern.
Context
- Testing Strategy — Layer model, selection rules, hexagonal testing
- Testing Economics — Cost-benefit model per test layer
- Testing Tools — Vitest, Jest, Playwright, RTL
- Monorepo Build Tools — NX affected commands, caching
- Dev Environment — Docker, containers, isolation
- Deployment Checklist — What happens after tests pass
- DevOps — CI/CD, security, git practices
- CI Strategy Audit — Gap analysis and engineering task spec