Playwright CLI

What makes browser tests trustworthy? Repeatability.

Playwright gives agents a deterministic, scriptable browser automation layer. Specs are durable executable knowledge — rerunnable, diffable, CI-friendly. Where agent browsers explore, Playwright proves.

Why Playwright

| Property | Value |
| --- | --- |
| Execution model | Spec-first: same flow, same result, every time |
| Artifacts | Traces, screenshots, video, HTML reports |
| CI integration | Headless by default, parallel sharding, Nx plugin support |
| Debugging | `codegen` for selector generation, `--debug` for step-through, `--ui` for interactive mode |
| Agent orchestration | Claude drives Playwright via CLI: agentic planning, deterministic execution |

Commissioning Role

Playwright is the e2e execution substrate for Automated Commissioning. Two runners feed one L-level computation:

| Layer | Runner | Verifies |
| --- | --- | --- |
| Logic | Vitest | Unit tests, data contracts |
| Browser | Playwright | UI features, user flows, screen contracts |

A feature with passing unit tests but a failing Playwright spec is capped at L2. The L-level computer treats both runners as evidence sources, so neither one alone can lift a feature past that cap.
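
The capping rule can be sketched as a small pure function. This is an illustration only, not the actual L-level computer: the `Evidence` shape, the `applyBrowserCap` name, and the use of plain numbers for levels are assumptions. Only the cap at L2 comes from the text above.

```typescript
// Hypothetical evidence record: one status per runner.
type RunnerStatus = "passed" | "failed";

interface Evidence {
  vitest: RunnerStatus;
  playwright: RunnerStatus;
}

// From the text: unit tests pass + Playwright spec fails => capped at L2.
const BROWSER_FAILURE_CAP = 2;

function applyBrowserCap(claimedLevel: number, evidence: Evidence): number {
  if (evidence.vitest === "passed" && evidence.playwright === "failed") {
    return Math.min(claimedLevel, BROWSER_FAILURE_CAP);
  }
  // Other combinations are not specified in the text, so this sketch
  // leaves the claimed level unchanged for them.
  return claimedLevel;
}

// A feature claiming L4 with a failing browser spec comes back as L2.
console.log(applyBrowserCap(4, { vitest: "passed", playwright: "failed" }));
```

Keeping the rule in a pure function means it can be unit-tested in Vitest without starting a browser.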

Nx Integration

The engineering repo uses Nx's Playwright plugin. The presence of a playwright.config.ts file in a project causes the plugin to infer an e2e target for it. The commission script invokes these existing Nx targets rather than maintaining a separate browser workflow.
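
For orientation, a config file that would trigger target inference might look like the sketch below. The file location, `testDir`, `baseURL`, and retry count are assumptions chosen to line up with the commands on this page, not the repo's real settings.

```typescript
// apps/web-e2e/playwright.config.ts (location assumed from the spec path used on this page)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./src", // assumption: specs live under src/
  retries: process.env.CI ? 2 : 0, // assumption: retry only in CI
  reporter: "html", // HTML report artifact
  use: {
    baseURL: "http://localhost:3000", // assumption: same host as the codegen example
    trace: "on-first-retry", // record a trace when a test is retried
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  projects: [
    // Named "chromium" so that --project=chromium selects it
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
  ],
});
```

The `trace`, `screenshot`, and `video` settings are what produce the artifacts listed earlier. Tightening them is one way to limit how much storage each run consumes.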

```bash
# Nx-native e2e execution
pnpm nx e2e web-e2e --project=chromium
pnpm nx e2e web-e2e --ui

# Direct Playwright CLI
pnpm exec playwright codegen http://localhost:3000
pnpm exec playwright test apps/web-e2e/src/auth.spec.ts --debug

# Commissioning (scoped to feature)
pnpm nx e2e web-e2e --grep="AUTH-001"
```
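
The `--grep` filter matches against test titles, so scoping a run to a feature depends on the feature ID appearing in the title. A minimal sketch of such a spec follows. The route, labels, credentials, and heading are hypothetical placeholders, and `page.goto("/login")` assumes a `baseURL` is set in the Playwright config.

```typescript
// apps/web-e2e/src/auth.spec.ts
import { test, expect } from "@playwright/test";

// The "AUTH-001" prefix is what --grep="AUTH-001" matches on.
test("AUTH-001: user can sign in", async ({ page }) => {
  await page.goto("/login"); // hypothetical route, resolved against baseURL

  // Role- and label-based locators tend to survive markup changes
  // better than CSS selectors.
  await page.getByLabel("Email").fill("user@example.com");
  await page.getByLabel("Password").fill("example-password");
  await page.getByRole("button", { name: "Sign in" }).click();

  // Hypothetical post-login landmark.
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```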

Versus Agent Browsers

| Mode | Best for | Weakness |
| --- | --- | --- |
| Agent browser | Exploration, ad hoc interaction, "go try this" | Less deterministic, weaker as long-lived executable knowledge |
| Playwright CLI | Repeatable flows, debugging, CI, durable browser procedures | More setup, selector maintenance |

Commissioning demands repeatability. Agent browsers are the exploration tool. Playwright is the verification tool.

Questions

When does a Playwright spec become more expensive to maintain than the manual check it replaced?

  • At what feature count does CI time force test sharding across parallel workers?
  • Should selectors come from Screen Contracts (dream repo) or be discovered via codegen (engineering repo)?
  • What's the right rotation policy for trace archives before storage costs compound?
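
The sharding question can be framed with back-of-envelope arithmetic. The sketch below is a rough model, not a measurement: `SuiteProfile` and both function names are hypothetical, and it ignores browser startup overhead and uneven spec durations.

```typescript
// Hypothetical description of a suite's cost.
interface SuiteProfile {
  specCount: number;
  avgSpecSeconds: number;
  workersPerShard: number;
}

// Idealized wall-clock time: total work divided evenly across all workers.
function estimateWallClockSeconds(p: SuiteProfile, shards: number): number {
  const totalSeconds = p.specCount * p.avgSpecSeconds;
  return totalSeconds / (p.workersPerShard * shards);
}

// Smallest shard count that keeps the idealized run within the time budget.
function shardsNeeded(p: SuiteProfile, budgetSeconds: number): number {
  const singleShardSeconds = estimateWallClockSeconds(p, 1);
  return Math.max(1, Math.ceil(singleShardSeconds / budgetSeconds));
}

// 120 specs x 30 s = 3600 s of work; 4 workers => 900 s on one shard.
// A 300 s budget therefore needs 3 shards.
const profile: SuiteProfile = { specCount: 120, avgSpecSeconds: 30, workersPerShard: 4 };
console.log(shardsNeeded(profile, 300)); // 3
```

With Playwright, each CI job would then run one slice via the `--shard` flag, for example `--shard=1/3`.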