Playwright CLI

What makes browser tests trustworthy? Repeatability.

Playwright gives agents a deterministic, scriptable browser automation layer. Specs are durable executable knowledge — rerunnable, diffable, CI-friendly. Where agent browsers explore, Playwright proves.

Why Playwright

Property	Value
Execution model	Spec-first — same flow, same result, every time
Artifacts	Traces, screenshots, video, HTML reports
CI integration	Headless by default, parallel sharding, Nx plugin support
Debugging	`codegen` for selector generation, `--debug` for step-through, `--ui` for interactive mode
Agent orchestration	Claude drives Playwright via CLI — agentic planning, deterministic execution

Commissioning Role

Playwright is the e2e execution substrate for Automated Commissioning. Two runners feed one L-level computation:

Layer	Runner	Verifies
Logic	Vitest	Unit tests, data contracts
Browser	Playwright	UI features, user flows, screen contracts

Feature with passing unit tests but failing Playwright spec is capped at L2. The L-level computer treats both runners as evidence sources.

Nx Integration

The engineering repo uses Nx's Playwright plugin. playwright.config.ts triggers inferred e2e targets. The commission script invokes existing Nx targets — no parallel browser workflow.

# Nx-native e2e execution
pnpm nx e2e web-e2e --project=chromium
pnpm nx e2e web-e2e --ui

# Direct Playwright CLI
pnpm exec playwright codegen http://localhost:3000
pnpm exec playwright test apps/web-e2e/src/auth.spec.ts --debug

# Commissioning (scoped to feature)
pnpm nx e2e web-e2e --grep="AUTH-001"

Versus Agent Browsers

Mode	Best for	Weakness
Agent browser	Exploration, ad hoc interaction, "go try this"	Less deterministic, weaker as long-lived executable knowledge
Playwright CLI	Repeatable flows, debugging, CI, durable browser procedures	More setup, selector maintenance

Commissioning demands repeatability. Agent browsers are the exploration tool. Playwright is the verification tool.

Context

AI Browser Tools — Decision checklist and adoption radar
Automated Commissioning — The PRD this tool serves
Commissioning Protocol — L0-L4 definitions

Questions

When does a Playwright spec become more expensive to maintain than the manual check it replaced?

At what feature count does CI time force test sharding across parallel workers?
Should selectors come from Screen Contracts (dream repo) or be discovered via codegen (engineering repo)?
What's the right rotation policy for trace archives before storage costs compound?

Why Playwright​

Commissioning Role​

Nx Integration​

Versus Agent Browsers​

Context​

Links​

Questions​