Agent Browser

What if your agent could verify its own work without burning your context window?

Vercel's Agent Browser is a Rust-based CLI that gives AI agents browser automation through a Snapshot+Refs model: it returns lightweight element references instead of full DOM trees or screenshots. The result is roughly a 93% token reduction per page read compared to accessibility-tree approaches.

How It Works

Traditional browser tools send the agent a screenshot (10-20K tokens) or full accessibility tree (15K+ tokens). Agent Browser returns a snapshot with numbered references:

@e1  [button] "Submit Order"
@e2  [input]  "Search products..."
@e3  [link]   "View Cart (3 items)"
@e4  [select] "Sort by: Price"

The agent reasons over compact refs, not pixel coordinates or DOM paths. Actions target refs directly: click @e1, type @e2 "wireless headphones".
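
As a rough illustration of how an agent-side wrapper might consume such a snapshot, the sketch below parses the example lines above into a lookup table and picks a ref by role and label. It assumes the line layout shown in the example; the real CLI output may differ. `Ref`, `parse_snapshot`, and `find_by_label` are hypothetical names for illustration, not part of Agent Browser.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Matches lines such as: @e1  [button] "Submit Order"
# The layout is copied from the example above and is an assumption
# about the real output format.
REF_LINE = re.compile(r'^(@e\d+)\s+\[(\w+)\]\s+"(.*)"$')


@dataclass(frozen=True)
class Ref:
    ref: str    # e.g. "@e1"
    role: str   # e.g. "button"
    label: str  # e.g. "Submit Order"


def parse_snapshot(text: str) -> Dict[str, Ref]:
    """Index snapshot lines by ref, skipping lines that do not match."""
    refs: Dict[str, Ref] = {}
    for line in text.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            ref, role, label = match.groups()
            refs[ref] = Ref(ref, role, label)
    return refs


def find_by_label(refs: Dict[str, Ref], role: str, needle: str) -> Optional[str]:
    """Return the first ref whose role matches and whose label contains needle."""
    for item in refs.values():
        if item.role == role and needle.lower() in item.label.lower():
            return item.ref
    return None


SNAPSHOT = '''
@e1  [button] "Submit Order"
@e2  [input]  "Search products..."
@e3  [link]   "View Cart (3 items)"
@e4  [select] "Sort by: Price"
'''

refs = parse_snapshot(SNAPSHOT)
target = find_by_label(refs, "button", "submit")
print(target)  # @e1
```

The point of the design is that the agent only ever handles short ref strings; the mapping from ref to a concrete page element stays inside the tool.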

Strengths

Area	Detail
Token efficiency	~1K tokens per page read vs 15K+ for a full a11y tree, about 93% savings.
Speed	Rust CLI boots in under 50ms. Daemon mode makes subsequent commands near-instant.
Agent agnostic	Works with Claude Code, Gemini, Codex, Cursor, any tool that can call CLI commands.
Local-first	Runs against localhost. Build a component, verify it renders, all in one agent loop.
Deterministic refs	Element references are stable across reads — no flaky CSS selectors or XPath.
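
The 93% figure follows directly from the per-read numbers in the table. A small worked calculation, using the table's approximate token counts and a hypothetical session of 40 page reads (the session length is an assumption for illustration only):

```python
def token_savings(compact_tokens: int, baseline_tokens: int) -> float:
    """Fraction of tokens saved by the compact representation."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1 - compact_tokens / baseline_tokens


# Figures from the table: ~1K tokens per snapshot vs 15K for a full a11y tree.
per_read = token_savings(1_000, 15_000)
print(f"{per_read:.1%}")  # 93.3%

# Hypothetical session of 40 page reads: absolute tokens saved.
reads = 40
saved = reads * (15_000 - 1_000)
print(saved)  # 560000
```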

Limitations

Area	Detail
Auth	Headless browser — doesn't reuse your logged-in Chrome session. Custom auth flows needed for authenticated pages.
No standard body	Open source under Vercel Labs, but not backed by the W3C or an equivalent standards body. Single-vendor origin.
Write actions	Less granular permission model than Claude in Chrome's human-in-the-loop confirmations.
SPA depth	Complex SPAs with heavy client-side routing may need explicit wait/navigation steps.
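
In practice, the SPA caveat usually means polling: re-read the snapshot until an expected element appears or a timeout expires. The sketch below shows one way to do that. `wait_for_text` and the caller-supplied `take_snapshot` function are hypothetical; a real `take_snapshot` would wrap whatever snapshot command the CLI provides. A canned iterator stands in for the browser here so the example runs offline.

```python
import time
from typing import Callable


def wait_for_text(
    take_snapshot: Callable[[], str],
    needle: str,
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll take_snapshot() until needle appears, then return that snapshot.

    Raises TimeoutError if needle is not seen within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        snapshot = take_snapshot()
        if needle in snapshot:
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{needle!r} not found within {timeout_s}s")
        sleep(interval_s)


# Stand-in for a real snapshot call: the route finishes loading on the third read.
pages = iter([
    '@e1  [link] "Loading..."',
    '@e1  [link] "Loading..."',
    '@e1  [button] "Submit Order"',
])
result = wait_for_text(lambda: next(pages), "Submit Order", sleep=lambda _: None)
print(result)
```

Each extra poll costs another snapshot read, so a short interval trades tokens for latency.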

Setup

npm install -g agent-browser

Works as a CLI tool or MCP server. Any agent that can execute bash commands can use it.

Best For

Long autonomous agent sessions where token budget matters
CI/CD browser verification — agent builds, then tests its own output
Multi-agent workflows where the browser tool must be LLM-agnostic
Rapid iteration loops: code, snapshot, verify, fix
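
The "code, snapshot, verify, fix" loop can be sketched as a small check that opens a page, takes one snapshot, and reports which expected labels are missing. This is a sketch under assumptions, not the tool's documented API: it assumes an `agent-browser` binary on PATH, and the `open` and `snapshot` subcommand names should be checked against the CLI's own help. `verify_page`, `run_cli`, `fake_run`, and the localhost URL are hypothetical. The runner is injectable so the example works without a browser.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(args: Sequence[str]) -> str:
    """Run the agent-browser CLI and return its stdout.

    Assumes an `agent-browser` binary on PATH; subcommand names passed in
    by callers are assumptions to verify against the tool's help output.
    """
    done = subprocess.run(
        ["agent-browser", *args], capture_output=True, text=True, check=True
    )
    return done.stdout


def verify_page(url: str, expected: Sequence[str], run: Runner = run_cli) -> List[str]:
    """Open url, take one snapshot, and return the expected labels that are missing."""
    run(["open", url])
    snapshot = run(["snapshot"])
    return [label for label in expected if label not in snapshot]


# Offline stand-in for the CLI so the sketch runs without a browser.
def fake_run(args: Sequence[str]) -> str:
    if args[0] == "snapshot":
        return '@e1  [button] "Submit Order"\n@e3  [link] "View Cart (3 items)"'
    return ""


missing = verify_page(
    "http://localhost:3000", ["Submit Order", "Checkout"], run=fake_run
)
print(missing)  # ['Checkout']
```

An empty list means the page rendered everything the agent expected; a non-empty list gives it a concrete target for the next fix.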

Checklist Score

Against the decision checklist:

Gate	Score	Notes
1. Perception	Strong	Snapshot+Refs is the most token-efficient model available
2. Action	Good	Ref-based actions are stable, but less semantic than named tool calls
3. Auth	Weak	Headless — no session reuse, custom auth needed
4. Workflow	Strong	CLI-native, any agent, local-first, CI-friendly
5. Architecture	Strong	External tool — zero app changes
6. Standards	Moderate	Open source, active development, but single vendor
7. Speed/Cost	Strong	Best in class — Rust, 50ms boot, ~1K tokens/page
8. Setup	Strong	Single npm install, 5 minutes to first snapshot
9. Blast Radius	Good	Read-only default, CLI controls scope

Context

AI Browser Tools — Decision checklist and radar
AI Coding Config — Multi-agent setup
Tech Decisions — General evaluation framework
