Testing Infrastructure
Where do your tests run — and what does that cost?
MERGE LOOP (every PR — fast, blocks merge)

```
COMMIT → INTEGRATION → PREVIEW DEPLOY → E2E → MERGE
  │          │               │           │
  ▼          ▼               ▼           ▼
Local     Real DB        Real URL    Real browser
Free      5-30s          Free tier   30-120s
```

HEALTH LOOP (scheduled/post-merge — thorough, never blocks PRs)

```
FULL TYPECHECK → ARCHITECTURE AUDIT → CROSS-PROJECT TESTS → REPORT
      │                  │                     │               │
      ▼                  ▼                     ▼               ▼
All projects      Layer violations     Dependency drift    Dashboard
Minutes           Static analysis      Integration         Alerting
```
Testing infrastructure is the machinery that runs tests — CI pipelines, databases, runners, preview environments. Get this wrong and you pay in flaky tests, blocked terminals, or expensive CI bills. Get it right and every push gets verified automatically against a real deployment.
The Core Principle
Test against deployed artifacts, not dev servers.
Running E2E tests against localhost means port conflicts, RAM contention, and the false choice between "dev server running" or "run tests." Preview deploys eliminate this entire class of problems. A real URL. Real infrastructure. No local resource cost.
Pipeline Design
Two pipelines. Different triggers, different budgets.
Local Pipeline (every commit)
Fast, free, catches 90% of issues.
```
tsc --noEmit → vitest run → done
     │             │
 Types OK?    Unit + Integration pass?
   <10s            <60s
```
Budget: Under 90 seconds total. If it exceeds this, tests will be skipped. Set a hard ceiling and enforce it.
Requirements:
- Test database running (Docker container on a dedicated port)
- No dev server needed — unit and integration tests don't use the browser
- Memory-safe typecheck — use affected-file checking, not full project
| Command | Purpose | Trap to Avoid |
|---|---|---|
| tsc --noEmit or pnpm tc | Type verification | Full project typecheck can OOM on large monorepos. Use affected-only. |
| vitest run or jest --bail | Unit + integration | --bail stops on first failure. Fast feedback. |
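The 90-second ceiling can be enforced mechanically rather than remembered. A minimal sketch, assuming each pipeline step is wrapped as a function; the no-op wiring here is illustrative, and a real version would shell out (e.g. `execSync("tsc --noEmit")`):

```typescript
// Sketch: run local pipeline steps in order and fail the run if the total
// wall-clock time exceeds a hard ceiling.
type Step = { name: string; run: () => void };

function runWithCeiling(steps: Step[], ceilingMs: number): number {
  const start = Date.now();
  for (const step of steps) {
    step.run();
    const elapsed = Date.now() - start;
    if (elapsed > ceilingMs) {
      throw new Error(`Budget blown at "${step.name}": ${elapsed}ms > ${ceilingMs}ms`);
    }
  }
  return Date.now() - start;
}
```

Failing the run when the budget is blown makes pipeline bloat visible immediately instead of letting it creep past the point where developers start skipping tests.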
CI Pipeline (every PR)
Thorough. Runs against the preview deployment. Catches integration issues.
```
Install → Typecheck → Unit/Integration → Wait for Preview → E2E → Report
                                                │
                                      Vercel builds preview URL
                                         from the PR branch
```
The key insight: Vercel gives you a preview URL on every push for free. The E2E tests just need to point at that URL instead of localhost.
Two Loops
Merge decisions and repo health are different questions. Answering both in one workflow makes PRs slow and health checks shallow.
| Loop | Question | Trigger | Speed |
|---|---|---|---|
| Merge loop | Is this change safe to merge? | PR push | Fast (minutes) |
| Health loop | Is the monorepo internally consistent? | Post-merge, scheduled, manual | Thorough (can be slow) |
Merge loop runs on every PR. It gates the merge button. Everything in it must finish fast enough that developers don't context-switch while waiting. Branch protection should only require checks that run here.
Health loop runs on a schedule or after merge. It catches cross-project drift, full-monorepo type errors, and dependency staleness. These checks matter — but they don't belong in the PR path because they're slow and their failures rarely trace to the current change.
If a workflow tries to answer both questions, it will be too slow for PRs and too narrow for health.
Cron schedules in CI run in UTC. Document the conversion for your timezone: a "nightly" job at 0 0 * * * runs at noon in NZST (UTC+12).
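The conversion is one line of arithmetic. A sketch; the offsets used below are illustrative (NZST is UTC+12, NZDT during daylight saving is UTC+13):

```typescript
// Convert a cron schedule's UTC hour into a local wall-clock hour for docs.
function utcToLocalHour(utcHour: number, offsetHours: number): number {
  return ((utcHour + offsetHours) % 24 + 24) % 24; // normalize into 0-23
}

// "Nightly" at 0 0 * * * (midnight UTC) in NZST (UTC+12):
const nzst = utcToLocalHour(0, 12); // 12, i.e. noon
```

Putting the computed local time next to the cron expression in the workflow file saves the next reader from doing the conversion in their head.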
Signal Hierarchy
Not all test signals are equal. The hierarchy from highest to lowest value:
| Rank | Signal | What it proves | Loop |
|---|---|---|---|
| 1 | E2E against deployed preview | The app works for real users | Merge |
| 2 | Integration tests against real DB | Data contracts hold | Merge |
| 3 | Structural/architecture checks | Repo rules aren't violated | Merge |
| 4 | Unit tests (affected) | Logic transforms are correct | Merge |
| 5 | Full monorepo typecheck | Cross-project type safety | Health |
Test what the user experiences first. Test what the compiler sees last.
A full monorepo typecheck is valuable — but it catches type errors across projects that weren't touched by the PR. That's a health concern, not a merge concern. Running it on every PR burns minutes and blocks merges on failures the author can't fix.
Preview Deploy Testing
The single highest-leverage infrastructure change. Decouples test environment from development environment.
How It Works
```
git push → Vercel builds preview → CI runs Playwright against preview URL
                  │                                │
        Free on all plans              Tests hit real infrastructure
        Unique URL per commit          No local server needed
```
Playwright Configuration
One config change makes Playwright work against any URL:
```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000",
  },
  // Only start local server when NOT in CI
  webServer: process.env.CI
    ? undefined
    : {
        command: "pnpm dev",
        url: "http://localhost:3000",
        reuseExistingServer: true,
      },
});
```
Locally: Playwright starts the dev server. In CI: BASE_URL points to the Vercel preview. Same tests, different target.
Deployment Protection
If Vercel has deployment protection enabled, Playwright requests hit an auth page instead of the app. Solutions:
| Approach | Cost | Trade-off |
|---|---|---|
| Disable protection for previews | Free | Previews are publicly accessible |
| Automation bypass header | Pro plan ($20/mo) | Previews stay protected, CI gets a bypass token |
| Password protection with env var | Pro plan | Playwright sends the password in a setup step |
For most teams, disabling protection on preview deployments is fine — they're ephemeral and the URLs are unguessable.
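If you take the bypass-header route instead, the token can be attached to every Playwright request from the config. A sketch: the header name is the one Vercel documents for Protection Bypass for Automation, and AUTOMATION_BYPASS_SECRET is an env-var name chosen here for illustration (store the value as a CI secret):

```typescript
// playwright.config.ts — send Vercel's automation bypass token on every request.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000",
    // Only attach the header when the secret is present (i.e. in CI).
    extraHTTPHeaders: process.env.AUTOMATION_BYPASS_SECRET
      ? { "x-vercel-protection-bypass": process.env.AUTOMATION_BYPASS_SECRET }
      : undefined,
  },
});
```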
Test Database Infrastructure
Integration tests (rank 2 in the signal hierarchy) need a real database. Not a mock. Mocking the database hides the bugs that matter — constraint violations, transaction behavior, query performance.
Docker Test Database
Dedicated container, dedicated port, isolated from development:
| Concern | Approach |
|---|---|
| Isolation | Separate port (e.g., 5433) from dev database |
| Startup | docker compose up with health check, wait for ready signal |
| Cleanup | Each test cleans its own data. No shared state between tests. |
| Reset | Drop and recreate between test suites if needed |
| CI | Service container in GitHub Actions / self-hosted runner |
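The "wait for ready signal" row can be sketched as a bounded retry loop. Assumptions: `probe` stands in for a real connection check (e.g. shelling out to pg_isready against port 5433), and a real loop would also sleep roughly 500ms between attempts:

```typescript
// Sketch: poll a readiness probe a bounded number of times before giving up.
// `probe` stands in for a real check such as execSync("pg_isready -p 5433").
function waitForReady(probe: () => boolean, maxAttempts = 60): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (probe()) return attempt; // ready — report how many attempts it took
    // In practice: sleep ~500ms here between attempts.
  }
  throw new Error(`Database not ready after ${maxAttempts} attempts`);
}
```

Failing loudly after the attempt budget is what prevents the "tests fail silently with connection errors" trap listed in the Never Do table.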
Port Allocation
Dedicated ports prevent collisions between dev and test:
| Service | Dev Port | Test Port |
|---|---|---|
| Database | 5432 | 5433 |
| App server | 3000 | 4300 |
| API | 3001 | — (use preview URL) |
CI Design Principles
Four rules govern every CI decision. Break one and costs compound silently.
Unique Signal
Every CI step must catch something that nothing else catches. If two steps catch the same error, one is redundant. Remove it.
| Step | Unique signal | Why only here |
|---|---|---|
| tsconfig validation | rootDir (TS6059), missing spec excludes | Can't build locally. Vercel fails late. |
| Architecture audit | Hexagonal layer violations | Static analysis, not part of TS or build |
| Typecheck (affected) | Cross-project TS errors | Vercel skips via ignoreBuildErrors: true |
| Test (affected, unit) | Logic regressions, value transforms | Not caught by typecheck or build |
Before adding any step, answer: "What does this catch that nothing else catches?" If you can't name it, don't add it.
One Workflow, One Install
Every pnpm install in CI costs 2-3 minutes. Multiple workflows that each install dependencies multiply this waste. Combine steps into one workflow with sequential jobs. The structural check (no install needed) runs first and fast-fails before the expensive install job.
```
pr-quality-gate.yml

  Job 1: structural (~30s, no install)
    - validate-tsconfig-lib.sh

  Job 2: quality-gate (~10 min, one install)
    - pnpm install
    - architecture:audit
    - nx affected --target=typecheck
    - nx affected --target=test
```
Skip What Doesn't Matter
Not every push needs CI. Three filters that eliminate waste:
| Filter | What it skips | Mechanism |
|---|---|---|
| paths-ignore | Docs-only changes (.md, .claude/, docs/) | Workflow trigger config |
| Draft PR check | Work-in-progress PRs | if: github.event.pull_request.draft == false |
| cancel-in-progress | Superseded runs from rapid pushes | Concurrency group with cancellation |
Separate Loops
Each workflow answers one question. The merge loop answers "is this PR safe?" The health loop answers "is the monorepo consistent?" Combining them into one workflow makes PRs slow and health checks incomplete. See Two Loops for the full principle.
Cost Controls
CI costs are invisible until they aren't. The free tier is generous — until you exceed it and every push costs real money. Treat CI minutes like a budget, not an unlimited resource.
Budget Math
The formula:
monthly_minutes = minutes_per_run × pushes_per_day × working_days
Worked example (GitHub Actions free tier = 2,000 min/month):
| Variable | Value | Source |
|---|---|---|
| Minutes per run | 10 min | Measure from CI logs |
| Pushes per day | 5 | Average over 2 weeks |
| Working days | 22 | Exclude weekends |
| Monthly total | 1,100 min | 55% of free tier |
Run this calculation after every CI change. A step that adds 3 minutes per run adds 3 × 5 × 22 = 330 min/month — 16.5% of the free tier from one step.
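The worked example, as code. The constants match the table above; the free-tier figure is GitHub Actions' 2,000 min/month:

```typescript
// Projected monthly CI minutes and share of the GitHub Actions free tier.
const FREE_TIER_MINUTES = 2000;

function monthlyMinutes(
  minutesPerRun: number,
  pushesPerDay: number,
  workingDays: number,
): number {
  return minutesPerRun * pushesPerDay * workingDays;
}

const total = monthlyMinutes(10, 5, 22); // 1100
const share = total / FREE_TIER_MINUTES; // 0.55 → 55% of the free tier
```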
Alert Thresholds
Set three thresholds. Check monthly on the GitHub Actions billing page.
| Usage | Action |
|---|---|
| > 75% of budget | Review push frequency — are draft PRs or docs-only commits triggering CI? |
| > 85% of budget | Increase paths-ignore scope or reduce parallelism |
| > 95% of budget | Escalate — defer new CI steps until headroom returns |
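The thresholds translate directly into a check you could run in the health loop. A sketch; the action strings are illustrative shorthand for the table above:

```typescript
// Map free-tier usage (as a fraction, 0-1) to the alert actions above.
function budgetAction(usage: number): string {
  if (usage > 0.95) return "escalate: defer new CI steps until headroom returns";
  if (usage > 0.85) return "reduce: widen paths-ignore, cut parallelism";
  if (usage > 0.75) return "review: check draft-PR and docs-only triggers";
  return "ok: within budget";
}
```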
Reduce What Runs
The testing strategy page covers the layer model. The infrastructure implication: every E2E test you convert to an integration test saves 30-120 seconds of CI browser time. At scale, this is the biggest cost lever.
| Action | Savings |
|---|---|
| Convert server-action E2E → integration test | ~60s per test |
| Run only affected tests (nx affected) | Skip unchanged projects entirely |
| Bail on first failure in PRs | Stop wasting time after a break |
| Parallelize with sharding | Same wall-clock time, more runner-minutes, but faster feedback |
Cheaper Runners
GitHub Actions minutes are premium-priced. Alternatives for browser-heavy workloads:
| Provider | Model | Relative Cost |
|---|---|---|
| GitHub Actions | Per-minute, managed | Baseline (1x) |
| Self-hosted runner | Your machine, free compute | Free (your electricity + RAM) |
| Ubicloud | Drop-in replacement, bare metal | ~0.3x |
| RunsOn | Your AWS account, spot instances | ~0.1-0.15x |
| Blacksmith | Managed, faster builds | ~0.5x |
Cheapest viable path: Self-hosted runner on a spare machine for private repos. Ubicloud for open source.
Simplest path: Run E2E locally against preview URLs. Zero CI minutes for browser tests. The preview URL is the infrastructure — BASE_URL=https://your-preview.vercel.app npx playwright test.
NX and Caching
Monorepo build tools provide two cost-saving mechanisms:
| Mechanism | What It Does |
|---|---|
| Affected commands | Only test projects changed by the PR. Skip everything else. |
| Computation caching | If inputs haven't changed, reuse previous test results. |
In CI: nx affected -t test instead of nx run-many -t test. On a 10-project monorepo, this typically skips 60-80% of test runs.
Cross-run caching stores .nx/cache as a GitHub Actions artifact, keyed on nx.json hash + commit SHA. Restores from prefix match on cache miss — partial hits still save time.
Phase-Based Rollout
Add CI steps in phases. Start with what proves the most — real infrastructure, real users. Each phase must prove stability before the next ships.
| Phase | What | Loop | Prerequisite |
|---|---|---|---|
| 1. Integration | Real database service container | Merge | Workflow exists, budget under 60% |
| 2. E2E | Playwright against preview URLs | Merge | Phase 1 green rate > 95% over 30 days, budget under 70% |
| 3. Structural | Architecture audit + affected typecheck | Merge | Phase 2 stable, budget under 80% |
| 4. Full typecheck | Cross-project type safety | Health | Phase 3 stable, scheduled workflow configured |
Integration and E2E come first because they catch the bugs that reach users. Structural checks and full typecheck are guardrails — valuable, but secondary to proving the app works.
Never add the next phase while the current one is flaky. Flaky CI erodes trust faster than no CI at all.
Preflight Checklist
Before any test work:
- Test database running and healthy
- Correct environment identified (local vs CI vs preview)
- Known issues checked (issue log, failing tests)
- Test type identified → correct layer selected
Never Do
| Action | Why | Alternative |
|---|---|---|
| Full-project typecheck on large monorepos | OOM crash | Affected-only typecheck |
| npm run dev in CI | Blocks the runner | Build + serve, or use preview URL |
| Skip database readiness check | Tests fail silently with connection errors | Always wait for ready signal |
| Claim tests pass without running them | Breaks trust | Evidence or nothing |
| Arbitrary sleep() in E2E tests | Flaky, slow | waitForSelector(), waitForResponse() |
Feedback Loop
Tests are instruments. They measure the gap between intent and reality.
```
TEST FAILURE
     │
     ├─► Implementation gap → fix the code (engineering concern)
     │
     └─► Vision gap → update the spec (product concern)
```
Every failure is classified: is the code wrong, or is the spec wrong? This distinction routes feedback to the right team and prevents the cycle of fixing code to match a broken spec.
Log failures with evidence: test file, line number, expected vs actual. Structured feedback compounds — pattern recognition across failures reveals systemic issues that individual fixes miss.
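A structured failure record makes the routing mechanical. A sketch; the field names are illustrative, not a prescribed schema:

```typescript
// Sketch: classify each failure and route it to the owning team.
type FailureClass = "implementation-gap" | "vision-gap";

interface FailureRecord {
  testFile: string;
  line: number;
  expected: string;
  actual: string;
  classification: FailureClass;
}

function routeFailure(f: FailureRecord): "engineering" | "product" {
  // Implementation gap → fix the code; vision gap → update the spec.
  return f.classification === "implementation-gap" ? "engineering" : "product";
}
```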
Beyond Code: Intent Verification
The same CI principles apply when AI agents execute business operations. Tests verify code correctness. Intent verification verifies agent correctness — does the agent's action match the human's intent?
| Code CI | Agent CI |
|---|---|
| Types match spec | Action matches intent record |
| Integration tests pass | Policy constraints satisfied |
| E2E proves user journey | Full chain: auth → intent → action → settlement → audit |
| Preview deploy = real infra | Verifiable Intent = real cryptographic proof |
As work charts show AI handling 65-85% of business operations, intent verification becomes the quality gate that prevents misaligned automated actions at scale. See Trust Architecture for the full pattern.
Context
- Testing Strategy — Layer model, selection rules, hexagonal testing
- Testing Economics — Cost-benefit model per test layer
- Testing Tools — Vitest, Jest, Playwright, RTL
- Monorepo Build Tools — NX affected commands, caching
- Dev Environment — Docker, containers, isolation
- Deployment Checklist — What happens after tests pass
- DevOps — CI/CD, security, git practices
- CI Strategy Audit — Gap analysis and engineering task spec