
Testing Infrastructure

Where do your tests run — and what does that cost?

MERGE LOOP (every PR — fast, blocks merge)

COMMIT → INTEGRATION → PREVIEW DEPLOY → E2E → MERGE
  COMMIT:          local machine, free
  INTEGRATION:     real DB, 5-30s
  PREVIEW DEPLOY:  real URL, free tier
  E2E:             real browser, 30-120s

HEALTH LOOP (scheduled/post-merge — thorough, never blocks PRs)

FULL TYPECHECK → ARCHITECTURE AUDIT → CROSS-PROJECT TESTS → REPORT
  FULL TYPECHECK:       all projects, minutes
  ARCHITECTURE AUDIT:   layer violations, static analysis
  CROSS-PROJECT TESTS:  dependency drift, integration
  REPORT:               dashboard, alerting

Testing infrastructure is the machinery that runs tests — CI pipelines, databases, runners, preview environments. Get this wrong and you pay in flaky tests, blocked terminals, or expensive CI bills. Get it right and every push gets verified automatically against a real deployment.

The Core Principle

Test against deployed artifacts, not dev servers.

Running E2E tests against localhost means port conflicts, RAM contention, and the false choice between "dev server running" or "run tests." Preview deploys eliminate this entire class of problems. A real URL. Real infrastructure. No local resource cost.

Pipeline Design

Two pipelines. Different triggers, different budgets.

Local Pipeline (every commit)

Fast, free, catches 90% of issues.

tsc --noEmit → vitest run → done
  tsc --noEmit:  types OK? (<10s)
  vitest run:    unit + integration pass? (<60s)

Budget: Under 90 seconds total. If it exceeds this, developers will stop running it. Set a hard ceiling and enforce it.

Requirements:

  • Test database running (Docker container on a dedicated port)
  • No dev server needed — unit and integration tests don't use the browser
  • Memory-safe typecheck — use affected-file checking, not full project
| Command | Purpose | Trap to Avoid |
|---|---|---|
| tsc --noEmit or pnpm tc | Type verification | Full-project typecheck can OOM on large monorepos. Use affected-only. |
| vitest run or jest --bail | Unit + integration | --bail stops on first failure. Fast feedback. |
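The two commands can be wired into a single local gate, for example as package.json scripts. This is a sketch; the script names (tc, verify) and the --bail value are illustrative, not a required convention:

```json
{
  "scripts": {
    "tc": "tsc --noEmit",
    "verify": "pnpm tc && vitest run --bail=1"
  }
}
```

The && short-circuits: a type error stops the run before tests start, keeping the fast path under budget.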

CI Pipeline (every PR)

Thorough. Runs against the preview deployment. Catches integration issues.

Install → Typecheck → Unit/Integration → Wait for Preview → E2E → Report
  Wait for Preview: Vercel builds the preview URL from the PR branch

The key insight: Vercel gives you a preview URL on every push for free. The E2E tests just need to point at that URL instead of localhost.
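A sketch of the E2E job in GitHub Actions terms. The wait-for-preview action shown here (patrickedqvist/wait-for-vercel-preview) is a community action and its version and outputs should be verified before use; it is an assumption, not part of this document's setup:

```yaml
e2e:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    # Block until Vercel finishes building the preview for this commit.
    - name: Wait for Vercel preview
      id: preview
      uses: patrickedqvist/wait-for-vercel-preview@v1.3.2  # community action; verify before use
      with:
        token: ${{ secrets.GITHUB_TOKEN }}
        max_timeout: 300
    # Point Playwright at the preview instead of localhost.
    - name: Run E2E against preview
      run: npx playwright test
      env:
        BASE_URL: ${{ steps.preview.outputs.url }}
```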

Two Loops

Merge decisions and repo health are different questions. Answering both in one workflow makes PRs slow and health checks shallow.

| Loop | Question | Trigger | Speed |
|---|---|---|---|
| Merge loop | Is this change safe to merge? | PR push | Fast (minutes) |
| Health loop | Is the monorepo internally consistent? | Post-merge, scheduled, manual | Thorough (can be slow) |

Merge loop runs on every PR. It gates the merge button. Everything in it must finish fast enough that developers don't context-switch while waiting. Branch protection should only require checks that run here.

Health loop runs on a schedule or after merge. It catches cross-project drift, full-monorepo type errors, and dependency staleness. These checks matter — but they don't belong in the PR path because they're slow and their failures rarely trace to the current change.

If a workflow tries to answer both questions, it will be too slow for PRs and too narrow for health.

Cron schedules in CI run UTC. Document the conversion for your timezone. A "nightly" job at 0 0 * * * runs at noon in NZST.
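Concretely, a health-loop schedule with the conversion documented inline. NZST is UTC+12, so a job meant to run at local midnight must be scheduled at 12:00 UTC:

```yaml
on:
  schedule:
    # GitHub Actions cron is UTC. NZST = UTC+12,
    # so 00:00 local = 12:00 UTC.
    - cron: "0 12 * * *"   # nightly at 00:00 NZST
```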

Signal Hierarchy

Not all test signals are equal. The hierarchy from highest to lowest value:

| Rank | Signal | What it proves | Loop |
|---|---|---|---|
| 1 | E2E against deployed preview | The app works for real users | Merge |
| 2 | Integration tests against real DB | Data contracts hold | Merge |
| 3 | Structural/architecture checks | Repo rules aren't violated | Merge |
| 4 | Unit tests (affected) | Logic transforms are correct | Merge |
| 5 | Full monorepo typecheck | Cross-project type safety | Health |

Test what the user experiences first. Test what the compiler sees last.

A full monorepo typecheck is valuable — but it catches type errors across projects that weren't touched by the PR. That's a health concern, not a merge concern. Running it on every PR burns minutes and blocks merges on failures the author can't fix.

Preview Deploy Testing

The single highest-leverage infrastructure change. Decouples test environment from development environment.

How It Works

git push → Vercel builds preview → CI runs Playwright against preview URL
  Vercel builds preview:  free on all plans, unique URL per commit
  CI runs Playwright:     tests hit real infrastructure, no local server needed

Playwright Configuration

One config change makes Playwright work against any URL:

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000",
  },
  // Only start a local server when NOT in CI
  webServer: process.env.CI
    ? undefined
    : {
        command: "pnpm dev",
        url: "http://localhost:3000",
        reuseExistingServer: true,
      },
});

Locally: Playwright starts the dev server. In CI: BASE_URL points to the Vercel preview. Same tests, different target.

Deployment Protection

If Vercel has deployment protection enabled, Playwright requests hit an auth page instead of the app. Solutions:

| Approach | Cost | Trade-off |
|---|---|---|
| Disable protection for previews | Free | Previews are publicly accessible |
| Automation bypass header | Pro plan ($20/mo) | Previews stay protected, CI gets a bypass token |
| Password protection with env var | Pro plan | Playwright sends the password in a setup step |

For most teams, disabling protection on preview deployments is fine — they're ephemeral and the URLs are unguessable.
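If you do use the bypass-header route, Vercel's protection-bypass feature expects the secret in a request header. A hedged config sketch; the env var name here is illustrative, and the header name should be checked against Vercel's current docs:

```typescript
// playwright.config.ts (bypass-header variant)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000",
    // Send Vercel's automation bypass secret with every request,
    // only when the secret is present in the environment.
    extraHTTPHeaders: process.env.VERCEL_AUTOMATION_BYPASS_SECRET
      ? { "x-vercel-protection-bypass": process.env.VERCEL_AUTOMATION_BYPASS_SECRET }
      : {},
  },
});
```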

Test Database Infrastructure

Integration tests (rank 2 in the signal hierarchy) need a real database. Not a mock. Mocking the database hides the bugs that matter — constraint violations, transaction behavior, query performance.

Docker Test Database

Dedicated container, dedicated port, isolated from development:

| Concern | Approach |
|---|---|
| Isolation | Separate port (e.g., 5433) from dev database |
| Startup | docker compose up with health check, wait for ready signal |
| Cleanup | Each test cleans its own data. No shared state between tests. |
| Reset | Drop and recreate between test suites if needed |
| CI | Service container in GitHub Actions / self-hosted runner |
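A minimal compose sketch satisfying the isolation and startup rows. Image tag, credentials, and database name are illustrative assumptions:

```yaml
# docker-compose.test.yml
services:
  test-db:
    image: postgres:16
    ports:
      - "5433:5432"          # dedicated test port, isolated from dev's 5432
    environment:
      POSTGRES_PASSWORD: test
      POSTGRES_DB: app_test
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 3s
      retries: 15
```

Start it with `docker compose -f docker-compose.test.yml up --wait`; the --wait flag blocks until the healthcheck passes, which is the "wait for ready signal" step.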

Port Allocation

Dedicated ports prevent collisions between dev and test:

| Service | Dev Port | Test Port |
|---|---|---|
| Database | 5432 | 5433 |
| App server | 3000 | 4300 |
| API | 3001 | — (use preview URL) |

CI Design Principles

Four rules govern every CI decision. Break one and costs compound silently.

Unique Signal

Every CI step must catch something that nothing else catches. If two steps catch the same error, one is redundant. Remove it.

| Step | Unique signal | Why only here |
|---|---|---|
| tsconfig validation | rootDir (TS6059), missing spec excludes | Can't build locally. Vercel fails late. |
| Architecture audit | Hexagonal layer violations | Static analysis, not part of TS or build |
| Typecheck (affected) | Cross-project TS errors | Vercel skips via ignoreBuildErrors: true |
| Test (affected, unit) | Logic regressions, value transforms | Not caught by typecheck or build |

Before adding any step, answer: "What does this catch that nothing else catches?" If you can't name it, don't add it.

One Workflow, One Install

Every pnpm install in CI costs 2-3 minutes. Multiple workflows that each install dependencies multiply this waste. Combine steps into one workflow with sequential jobs. The structural check (no install needed) runs first and fast-fails before the expensive install job.

pr-quality-gate.yml
  Job 1: structural (~30s, no install)
    - validate-tsconfig-lib.sh
  Job 2: quality-gate (~10 min, one install)
    - pnpm install
    - architecture:audit
    - nx affected --target=typecheck
    - nx affected --target=test

Skip What Doesn't Matter

Not every push needs CI. Three filters that eliminate waste:

| Filter | What it skips | Mechanism |
|---|---|---|
| paths-ignore | Docs-only changes (.md, .claude/, docs/) | Workflow trigger config |
| Draft PR check | Work-in-progress PRs | if: github.event.pull_request.draft == false |
| cancel-in-progress | Superseded runs from rapid pushes | Concurrency group with cancellation |
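All three filters live in the workflow file. A sketch; the ignored paths and job layout are illustrative:

```yaml
on:
  pull_request:
    paths-ignore:
      - "**/*.md"
      - ".claude/**"
      - "docs/**"

# One concurrency group per PR; a new push cancels the superseded run.
concurrency:
  group: pr-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  quality-gate:
    if: github.event.pull_request.draft == false   # skip drafts
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```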

Separate Loops

Each workflow answers one question. The merge loop answers "is this PR safe?" The health loop answers "is the monorepo consistent?" Combining them into one workflow makes PRs slow and health checks incomplete. See Two Loops for the full principle.

Cost Controls

CI costs are invisible until they aren't. The free tier is generous — until you exceed it and every push costs real money. Treat CI minutes like a budget, not an unlimited resource.

Budget Math

The formula:

monthly_minutes = minutes_per_run × pushes_per_day × working_days

Worked example (GitHub Actions free tier = 2,000 min/month):

| Variable | Value | Source |
|---|---|---|
| Minutes per run | 10 min | Measure from CI logs |
| Pushes per day | 5 | Average over 2 weeks |
| Working days | 22 | Exclude weekends |
| Monthly total | 1,100 min | 55% of free tier |

Run this calculation after every CI change. A step that adds 3 minutes per run adds 3 × 5 × 22 = 330 min/month — that's 16% of the free tier from one step.
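The formula is trivial to script so it can be rerun after every CI change. The function and variable names below are illustrative, not from any tool:

```typescript
// Monthly CI minutes from the formula:
// monthly_minutes = minutes_per_run × pushes_per_day × working_days
function monthlyMinutes(
  minutesPerRun: number,
  pushesPerDay: number,
  workingDays: number
): number {
  return minutesPerRun * pushesPerDay * workingDays;
}

const freeTier = 2000; // GitHub Actions free tier, minutes/month

// Worked example from the table above.
const base = monthlyMinutes(10, 5, 22);
console.log(`base: ${base} min = ${(base / freeTier) * 100}% of free tier`);

// Marginal cost of one new 3-minute step.
const extraStep = monthlyMinutes(3, 5, 22);
console.log(`one 3-min step adds: ${extraStep} min/month`);
```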

Alert Thresholds

Set three thresholds. Check monthly on the GitHub Actions billing page.

| Usage | Action |
|---|---|
| > 75% of budget | Review push frequency — are draft PRs or docs-only commits triggering CI? |
| > 85% of budget | Increase paths-ignore scope or reduce parallelism |
| > 95% of budget | Escalate — defer new CI steps until headroom returns |

Reduce What Runs

The testing strategy page covers the layer model. The infrastructure implication: every E2E test you convert to an integration test saves 30-120 seconds of CI browser time. At scale, this is the biggest cost lever.

| Action | Savings |
|---|---|
| Convert server-action E2E → integration test | ~60s per test |
| Run only affected tests (nx affected) | Skip unchanged projects entirely |
| Bail on first failure in PRs | Stop wasting time after a break |
| Parallelize with sharding | Same wall-clock time, more runner-minutes, but faster feedback |

Cheaper Runners

GitHub Actions minutes are premium-priced. Alternatives for browser-heavy workloads:

| Provider | Model | Relative Cost |
|---|---|---|
| GitHub Actions | Per-minute, managed | Baseline (1x) |
| Self-hosted runner | Your machine, free compute | Free (your electricity + RAM) |
| Ubicloud | Drop-in replacement, bare metal | ~0.3x |
| RunsOn | Your AWS account, spot instances | ~0.1-0.15x |
| Blacksmith | Managed, faster builds | ~0.5x |

Cheapest viable path: Self-hosted runner on a spare machine for private repos. Ubicloud for open source.

Simplest path: Run E2E locally against preview URLs. Zero CI minutes for browser tests. The preview URL is the infrastructure — BASE_URL=https://your-preview.vercel.app npx playwright test.

NX and Caching

Monorepo build tools provide two cost-saving mechanisms:

| Mechanism | What It Does |
|---|---|
| Affected commands | Only test projects changed by the PR. Skip everything else. |
| Computation caching | If inputs haven't changed, reuse previous test results. |

In CI: nx affected -t test instead of nx run-many -t test. On a 10-project monorepo, this typically skips 60-80% of test runs.

Cross-run caching stores .nx/cache as a GitHub Actions artifact, keyed on nx.json hash + commit SHA. Restores from prefix match on cache miss — partial hits still save time.
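One common way to implement this cache step is with actions/cache (shown here as an assumption; the text's "artifact" wording may map to a different mechanism in your setup). The key layout mirrors the description above:

```yaml
# Restore .nx/cache across runs. Exact key on hit; prefix match
# via restore-keys gives a partial hit on miss.
- uses: actions/cache@v4
  with:
    path: .nx/cache
    key: nx-${{ hashFiles('nx.json') }}-${{ github.sha }}
    restore-keys: |
      nx-${{ hashFiles('nx.json') }}-
```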

Phase-Based Rollout

Add CI steps in phases. Start with what proves the most — real infrastructure, real users. Each phase must prove stability before the next ships.

| Phase | What | Loop | Prerequisite |
|---|---|---|---|
| 1. Integration | Real database service container | Merge | Workflow exists, budget under 60% |
| 2. E2E | Playwright against preview URLs | Merge | Phase 1 green rate > 95% over 30 days, budget under 70% |
| 3. Structural | Architecture audit + affected typecheck | Merge | Phase 2 stable, budget under 80% |
| 4. Full typecheck | Cross-project type safety | Health | Phase 3 stable, scheduled workflow configured |

Integration and E2E come first because they catch the bugs that reach users. Structural checks and full typecheck are guardrails — valuable, but secondary to proving the app works.

Never add the next phase while the current one is flaky. Flaky CI erodes trust faster than no CI at all.

Preflight Checklist

Before any test work:

  • Test database running and healthy
  • Correct environment identified (local vs CI vs preview)
  • Known issues checked (issue log, failing tests)
  • Test type identified → correct layer selected

Never Do

| Action | Why | Alternative |
|---|---|---|
| Full-project typecheck on large monorepos | OOM crash | Affected-only typecheck |
| npm run dev in CI | Blocks the runner | Build + serve, or use preview URL |
| Skip database readiness check | Tests fail silently with connection errors | Always wait for ready signal |
| Claim tests pass without running them | Breaks trust | Evidence or nothing |
| Arbitrary sleep() in E2E tests | Flaky, slow | waitForSelector(), waitForResponse() |

Feedback Loop

Tests are instruments. They measure the gap between intent and reality.

TEST FAILURE
  ├─► Implementation gap → fix the code (engineering concern)
  └─► Vision gap → update the spec (product concern)

Every failure is classified: is the code wrong, or is the spec wrong? This distinction routes feedback to the right team and prevents the cycle of fixing code to match a broken spec.

Log failures with evidence: test file, line number, expected vs actual. Structured feedback compounds — pattern recognition across failures reveals systemic issues that individual fixes miss.

Beyond Code: Intent Verification

The same CI principles apply when AI agents execute business operations. Tests verify code correctness. Intent verification verifies agent correctness — does the agent's action match the human's intent?

| Code CI | Agent CI |
|---|---|
| Types match spec | Action matches intent record |
| Integration tests pass | Policy constraints satisfied |
| E2E proves user journey | Full chain: auth → intent → action → settlement → audit |
| Preview deploy = real infra | Verifiable Intent = real cryptographic proof |
Preview deploy = real infraVerifiable Intent = real cryptographic proof

As work charts show AI handling 65-85% of business operations, intent verification becomes the quality gate that prevents misaligned automated actions at scale. See Trust Architecture for the full pattern.


Questions

What would break if you deleted your entire CI pipeline and relied only on local testing?

  • Which CI step has never caught a real bug — and what does running it on every push cost per month?
  • If preview deploys are free and eliminate localhost problems, what's stopping full adoption?
  • What's your current monthly CI spend as a percentage of the free tier — and do you know what pushes it over?
  • Which tests in your suite are E2E that could be integration tests — and what would that save in CI minutes?