Agent Browser

What if your agent could verify its own work without burning your context window?

Vercel's Agent Browser is a Rust-based CLI that gives AI agents browser automation through a Snapshot+Refs model: instead of full DOM trees or screenshots, it returns lightweight element references. The claimed result is a 93% token reduction per page read compared with accessibility-tree approaches (roughly 1K tokens instead of 15K+).

How It Works

Traditional browser tools send the agent a screenshot (10-20K tokens) or full accessibility tree (15K+ tokens). Agent Browser returns a snapshot with numbered references:

@e1 [button] "Submit Order"
@e2 [input] "Search products..."
@e3 [link] "View Cart (3 items)"
@e4 [select] "Sort by: Price"

The agent reasons over compact refs, not pixel coordinates or DOM paths. Actions target refs directly: click @e1, type @e2 "wireless headphones".
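
To make the ref model concrete, here is a minimal Python sketch of how an agent harness might parse a snapshot and pick a target. It assumes the snapshot arrives as plain text in exactly the line format shown above; the names `Ref`, `parse_snapshot`, and `find_ref` are hypothetical helpers for illustration, not part of Agent Browser.

```python
import re
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    ref: str    # e.g. "@e1"
    role: str   # e.g. "button"
    label: str  # visible text or placeholder


# Assumed line format: @e1 [button] "Submit Order"
LINE_RE = re.compile(r'^(@e\d+)\s+\[(\w+)\]\s+"(.*)"$')


def parse_snapshot(text: str) -> dict[str, Ref]:
    """Map each ref id to its role and label; lines that do not match are skipped."""
    refs: dict[str, Ref] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ref, role, label = match.groups()
            refs[ref] = Ref(ref, role, label)
    return refs


def find_ref(refs: dict[str, Ref], role: str, label_contains: str) -> Optional[str]:
    """Return the first ref whose role matches and whose label contains the text."""
    for item in refs.values():
        if item.role == role and label_contains.lower() in item.label.lower():
            return item.ref
    return None


SNAPSHOT = """\
@e1 [button] "Submit Order"
@e2 [input] "Search products..."
@e3 [link] "View Cart (3 items)"
@e4 [select] "Sort by: Price"
"""

refs = parse_snapshot(SNAPSHOT)
target = find_ref(refs, "button", "submit")
print(f"click {target}")  # prints: click @e1
```

The point of the sketch is that the agent only ever handles a short ref string, so the action it emits stays a few tokens long no matter how large the underlying page is.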

Strengths

| Area | Detail |
|---|---|
| Token efficiency | ~1K tokens per page read vs 15K+ for full a11y tree. 93% savings. |
| Speed | Rust CLI boots in under 50ms. Daemon mode makes subsequent commands near-instant. |
| Agent agnostic | Works with Claude Code, Gemini, Codex, Cursor, any tool that can call CLI commands. |
| Local-first | Runs against localhost. Build a component, verify it renders, all in one agent loop. |
| Deterministic refs | Element references are stable across reads, so there are no flaky CSS selectors or XPath. |
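
The 93% figure follows directly from the two token counts quoted in the table. The sketch below redoes that arithmetic in Python and shows what it could mean for a session; the 200K-token context budget is an assumption chosen for illustration, not a number from this page.

```python
def token_savings(baseline_tokens: int, snapshot_tokens: int) -> float:
    """Fraction of tokens saved by reading a snapshot instead of the baseline."""
    return 1 - snapshot_tokens / baseline_tokens


A11Y_TREE_TOKENS = 15_000  # "15K+" from the table, taken at its lower bound
SNAPSHOT_TOKENS = 1_000    # "~1K" from the table

savings = token_savings(A11Y_TREE_TOKENS, SNAPSHOT_TOKENS)
print(f"{savings:.1%}")  # prints: 93.3%

# Hypothetical context budget, used only to illustrate the effect.
CONTEXT_BUDGET = 200_000
pages_with_tree = CONTEXT_BUDGET // A11Y_TREE_TOKENS
pages_with_snapshot = CONTEXT_BUDGET // SNAPSHOT_TOKENS
print(pages_with_tree, pages_with_snapshot)  # prints: 13 200
```

Under those assumptions, the same budget covers about 13 page reads with a full accessibility tree and about 200 with snapshots, before counting any other context the agent needs.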

Limitations

| Area | Detail |
|---|---|
| Auth | Headless browser: doesn't reuse your logged-in Chrome session. Custom auth flows needed for authenticated pages. |
| No standard body | Open source under Vercel Labs, but not W3C or equivalent. Single-vendor origin. |
| Write actions | Less granular permission model than Claude in Chrome's human-in-the-loop confirmations. |
| SPA depth | Complex SPAs with heavy client-side routing may need explicit wait/navigation steps. |
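
For the SPA limitation, one common workaround is to poll snapshots until the expected content appears. The following is a rough Python sketch of that idea, not a built-in feature: `wait_for_text` is a hypothetical helper, and `take_snapshot` stands in for whatever function your harness uses to fetch the current snapshot text.

```python
import time
from typing import Callable, Optional


def wait_for_text(
    take_snapshot: Callable[[], str],
    needle: str,
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
) -> Optional[str]:
    """Poll snapshots until `needle` appears; return that snapshot, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        snapshot = take_snapshot()
        if needle in snapshot:
            return snapshot
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


# Simulated client-side route change: two "loading" reads, then the real view.
frames = iter([
    '@e1 [button] "Loading"',
    '@e1 [button] "Loading"',
    '@e1 [link] "Order history"',
])
result = wait_for_text(lambda: next(frames), "Order history", timeout_s=2.0, interval_s=0.01)
print(result)  # prints: @e1 [link] "Order history"
```

Because each poll is a compact snapshot rather than a full tree, a handful of retries stays cheap in tokens, though a fixed timeout is still a judgment call per application.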

Setup

npm install -g agent-browser

Works as a CLI tool or MCP server. Any agent that can execute bash commands can use it.
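
Since the tool is driven through shell commands, a harness can wrap it with a thin subprocess helper. The Python sketch below assumes the executable is named `agent-browser` and is on PATH, and that it exposes subcommands such as `open`, `snapshot`, and `click`; treat those names as assumptions and check the project's README before relying on them. `build_command` and `run` are hypothetical helper names.

```python
import shutil
import subprocess

BINARY = "agent-browser"  # assumed executable name on PATH


def build_command(action: str, *args: str) -> list[str]:
    """Build the argv list for one CLI call, e.g. click @e1."""
    return [BINARY, action, *args]


def run(action: str, *args: str, timeout_s: float = 30.0) -> str:
    """Run one CLI call and return its stdout; raises if the binary is missing or fails."""
    if shutil.which(BINARY) is None:
        raise FileNotFoundError(f"{BINARY} not found on PATH")
    completed = subprocess.run(
        build_command(action, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return completed.stdout


print(build_command("click", "@e1"))  # prints: ['agent-browser', 'click', '@e1']

# Example session (subcommand names are assumptions, not verified here):
# run("open", "http://localhost:3000")
# print(run("snapshot"))
# run("click", "@e1")
```

Passing arguments as a list rather than a shell string avoids quoting problems when a typed value contains spaces or quotes.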

Best For

  • Long autonomous agent sessions where token budget matters
  • CI/CD browser verification — agent builds, then tests its own output
  • Multi-agent workflows where the browser tool must be LLM-agnostic
  • Rapid iteration loops: code, snapshot, verify, fix

Checklist Score

Against the decision checklist:

| Gate | Score | Notes |
|---|---|---|
| 1. Perception | Strong | Snapshot+Refs is the most token-efficient model available |
| 2. Action | Good | Ref-based actions stable, but less semantic than named tool calls |
| 3. Auth | Weak | Headless: no session reuse, custom auth needed |
| 4. Workflow | Strong | CLI-native, any agent, local-first, CI-friendly |
| 5. Architecture | Strong | External tool, zero app changes |
| 6. Standards | Moderate | Open source, active development, but single vendor |
| 7. Speed/Cost | Strong | Best in class: Rust, 50ms boot, ~1K tokens/page |
| 8. Setup | Strong | Single npm install, 5 minutes to first snapshot |
| 9. Blast Radius | Good | Read-only default, CLI controls scope |

Context