Playwright CLI
What makes browser tests trustworthy? Repeatability.
Playwright gives agents a deterministic, scriptable browser automation layer. Specs are durable executable knowledge — rerunnable, diffable, CI-friendly. Where agent browsers explore, Playwright proves.
Why Playwright
| Property | Value |
|---|---|
| Execution model | Spec-first — same flow, same result, every time |
| Artifacts | Traces, screenshots, video, HTML reports |
| CI integration | Headless by default, parallel sharding, Nx plugin support |
| Debugging | codegen for selector generation, --debug for step-through, --ui for interactive mode |
| Agent orchestration | Claude drives Playwright via CLI — agentic planning, deterministic execution |
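The artifact and CI properties in the table above map onto a handful of options in playwright.config.ts. The following is a minimal sketch, not the engineering repo's actual configuration: the `testDir`, `baseURL`, retry count, and the choice of when to capture traces, screenshots, and video are all assumptions made for illustration.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  // Assumed location of spec files inside the e2e project.
  testDir: './src',
  // Run test files in parallel workers.
  fullyParallel: true,
  // Assumption: retry only on CI, so local failures surface immediately.
  retries: process.env.CI ? 2 : 0,
  // HTML report; never auto-open a browser tab, which would hang on CI.
  reporter: [['html', { open: 'never' }]],
  use: {
    // Assumed dev-server address.
    baseURL: 'http://localhost:3000',
    // Record a trace when a failed test is retried.
    trace: 'on-first-retry',
    // Keep screenshots and video only for failing tests.
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});
```

Capturing artifacts only on failure or retry keeps storage small while still leaving evidence for every failed run.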
Commissioning Role
Playwright is the e2e execution substrate for Automated Commissioning. Two runners feed one L-level computation:
| Layer | Runner | Verifies |
|---|---|---|
| Logic | Vitest | Unit tests, data contracts |
| Browser | Playwright | UI features, user flows, screen contracts |
A feature with passing unit tests but a failing Playwright spec is capped at L2. The L-level computer treats both runners as evidence sources.
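The capping rule can be sketched as a small pure function. This is an illustration only: the `Level` type, the `RunnerEvidence` shape, and the function name `capLevel` are hypothetical, and how the candidate level is derived in the first place is not specified here, so it is taken as an input.

```typescript
// Hypothetical types; the Commissioning Protocol defines the real L0-L4 semantics.
type Level = 0 | 1 | 2 | 3 | 4;

interface RunnerEvidence {
  unitPassed: boolean; // Vitest result for the feature
  e2ePassed: boolean;  // Playwright result for the feature
}

// Ceiling applied when unit tests pass but the browser spec fails.
const E2E_FAILURE_CAP: Level = 2;

// Applies only the rule stated above: passing unit tests plus a failing
// Playwright spec caps the feature at L2. Every other case returns the
// candidate level unchanged (an assumption; other rules may apply).
function capLevel(candidate: Level, evidence: RunnerEvidence): Level {
  if (evidence.unitPassed && !evidence.e2ePassed) {
    return Math.min(candidate, E2E_FAILURE_CAP) as Level;
  }
  return candidate;
}

console.log(capLevel(4, { unitPassed: true, e2ePassed: false }));
```

Because the cap is a minimum, a feature already below L2 is never raised by this rule.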
Nx Integration
The engineering repo uses Nx's Playwright plugin. The presence of a playwright.config.ts in a project lets the plugin infer an e2e target for it. The commission script invokes those existing Nx targets rather than maintaining a separate browser workflow.
# Nx-native e2e execution
pnpm nx e2e web-e2e --project=chromium
pnpm nx e2e web-e2e --ui
# Direct Playwright CLI
pnpm exec playwright codegen http://localhost:3000
pnpm exec playwright test apps/web-e2e/src/auth.spec.ts --debug
# Commissioning (scoped to feature)
pnpm nx e2e web-e2e --grep="AUTH-001"
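A commission script could assemble the feature-scoped command along these lines. This is a sketch under assumptions: the function name `buildCommissionArgs` and its default project name are hypothetical, and it relies on spec titles containing the feature ID, since `--grep` filters tests by matching a regular expression against their titles.

```typescript
// Escape regex metacharacters so the feature ID is matched literally by --grep.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the argument list for a feature-scoped e2e run.
// Hypothetical helper; the real commission script may differ.
function buildCommissionArgs(featureId: string, project = 'web-e2e'): string[] {
  if (featureId.trim() === '') {
    throw new Error('featureId must not be empty');
  }
  return ['nx', 'e2e', project, `--grep=${escapeRegExp(featureId)}`];
}

// Arguments are kept as an array so they can be passed to a process
// spawner without shell quoting.
console.log(['pnpm', ...buildCommissionArgs('AUTH-001')].join(' '));
```

Escaping matters once feature IDs contain characters such as `.` or `+`, which would otherwise be interpreted as regex operators and match unrelated specs.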
Versus Agent Browsers
| Mode | Best for | Weakness |
|---|---|---|
| Agent browser | Exploration, ad hoc interaction, "go try this" | Less deterministic, weaker as long-lived executable knowledge |
| Playwright CLI | Repeatable flows, debugging, CI, durable browser procedures | More setup, selector maintenance |
Commissioning demands repeatability. Agent browsers are the exploration tool. Playwright is the verification tool.
Context
- AI Browser Tools — Decision checklist and adoption radar
- Automated Commissioning — The PRD this tool serves
- Commissioning Protocol — L0-L4 definitions
Links
- Playwright Test CLI — Official CLI reference
Questions
When does a Playwright spec become more expensive to maintain than the manual check it replaced?
- At what feature count does CI time force test sharding across parallel workers?
- Should selectors come from Screen Contracts (dream repo) or be discovered via codegen (engineering repo)?
- What's the right rotation policy for trace archives before storage costs compound?