Agent Browser
What if your agent could verify its own work without burning your context window?
Vercel's Agent Browser is a Rust-based CLI that gives AI agents browser automation through a Snapshot+Refs model: it returns lightweight element references instead of full DOM trees or screenshots. The result is roughly 93% fewer tokens per page read than accessibility-tree approaches (about 1K tokens versus 15K+).
How It Works
Traditional browser tools send the agent a screenshot (10-20K tokens) or a full accessibility tree (15K+ tokens). Agent Browser instead returns a compact snapshot in which each element gets a numbered reference:
@e1 [button] "Submit Order"
@e2 [input] "Search products..."
@e3 [link] "View Cart (3 items)"
@e4 [select] "Sort by: Price"
The agent reasons over compact refs, not pixel coordinates or DOM paths. Actions target refs directly: click @e1, type @e2 "wireless headphones".
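To make the ref model concrete, here is a minimal Python sketch of how an agent-side helper might parse a snapshot and pick a target. It assumes the line shape shown in the example above (`@eN [role] "label"`); the real CLI output may differ, and the names `Ref`, `parse_snapshot`, and `find_ref` are hypothetical helpers, not part of Agent Browser.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Line shape copied from the example above: @e1 [button] "Submit Order"
# This is an assumption about the output format; adjust the pattern if it differs.
REF_LINE = re.compile(r'^(@e\d+)\s+\[(\w+)\]\s+"(.*)"$')


@dataclass(frozen=True)
class Ref:
    ref: str    # e.g. "@e1"
    role: str   # e.g. "button"
    label: str  # e.g. "Submit Order"


def parse_snapshot(text: str) -> list[Ref]:
    """Parse snapshot lines into Ref records, skipping lines that do not match."""
    refs = []
    for line in text.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            refs.append(Ref(*match.groups()))
    return refs


def find_ref(refs: list[Ref], role: str, label_contains: str) -> str | None:
    """Return the first ref id with the given role whose label contains the text."""
    for r in refs:
        if r.role == role and label_contains.lower() in r.label.lower():
            return r.ref
    return None


snapshot = '''@e1 [button] "Submit Order"
@e2 [input] "Search products..."
@e3 [link] "View Cart (3 items)"
@e4 [select] "Sort by: Price"'''

refs = parse_snapshot(snapshot)
target = find_ref(refs, "input", "search")
print(f'type {target} "wireless headphones"')  # prints: type @e2 "wireless headphones"
```

The point of the sketch is that selection happens over a few short strings rather than over pixels or DOM paths, which is where the token savings come from.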
Strengths
| Area | Detail |
|---|---|
| Token efficiency | ~1K tokens per page read vs 15K+ for full a11y tree. 93% savings. |
| Speed | Rust CLI boots in under 50ms. Daemon mode makes subsequent commands near-instant. |
| Agent agnostic | Works with Claude Code, Gemini, Codex, Cursor, and any other tool that can call CLI commands. |
| Local-first | Runs against localhost. Build a component, verify it renders, all in one agent loop. |
| Deterministic refs | Element references are stable across reads — no flaky CSS selectors or XPath. |
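The 93% figure follows directly from the per-read numbers in the table. The short Python sketch below reproduces the arithmetic and scales it to a whole session; the session length of 40 page reads is a hypothetical value chosen only for illustration, and the 15K baseline is the lower bound quoted above.

```python
def savings_fraction(compact_tokens: int, baseline_tokens: int) -> float:
    """Fraction of tokens saved by the compact read relative to the baseline."""
    return 1 - compact_tokens / baseline_tokens


def session_tokens(page_reads: int, tokens_per_read: int) -> int:
    """Total tokens spent on page reads over a session."""
    return page_reads * tokens_per_read


COMPACT = 1_000    # ~1K tokens per snapshot, from the table above
BASELINE = 15_000  # 15K+ tokens per full accessibility tree, from the table above
PAGE_READS = 40    # hypothetical session length, for illustration only

print(f"savings per read: {savings_fraction(COMPACT, BASELINE):.1%}")          # 93.3%
print(f"snapshot session: {session_tokens(PAGE_READS, COMPACT):,} tokens")     # 40,000
print(f"a11y-tree session: {session_tokens(PAGE_READS, BASELINE):,} tokens")   # 600,000
```

Under these assumptions the gap over a long session is the difference between a small slice of a context window and several windows' worth of page reads.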
Limitations
| Area | Detail |
|---|---|
| Auth | Headless browser — doesn't reuse your logged-in Chrome session. Custom auth flows needed for authenticated pages. |
| No standard body | Open source under Vercel Labs, but not governed by the W3C or an equivalent standards body. Single-vendor origin. |
| Write actions | Less granular permission model than Claude in Chrome's human-in-the-loop confirmations. |
| SPA depth | Complex SPAs with heavy client-side routing may need explicit wait/navigation steps. |
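The explicit wait step mentioned for SPAs can be sketched as a polling loop: re-take the snapshot until the expected element shows up or a timeout expires. This is a generic pattern, not the tool's built-in wait mechanism; `wait_for_text` and the `take_snapshot` callable are hypothetical names, and the fake snapshot sequence stands in for a route that finishes rendering on the third read.

```python
import time
from typing import Callable


def wait_for_text(
    take_snapshot: Callable[[], str],
    expected: str,
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
) -> str:
    """Re-take the snapshot until it contains `expected` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        snapshot = take_snapshot()
        if expected in snapshot:
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{expected!r} not found within {timeout_s}s")
        time.sleep(interval_s)


# Stand-in for a real snapshot call: the new route renders on the third read.
reads = iter([
    '@e1 [link] "Home"',
    '@e1 [link] "Home"',
    '@e1 [link] "Home"\n@e2 [button] "Submit Order"',
])
result = wait_for_text(lambda: next(reads), "Submit Order", interval_s=0.01)
print(result.splitlines()[-1])  # prints: @e2 [button] "Submit Order"
```

Because each snapshot is small, polling a few times stays cheap compared with a single full accessibility-tree read.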
Setup
npm install -g agent-browser
Works as a CLI tool or MCP server. Any agent that can execute bash commands can use it.
Best For
- Long autonomous agent sessions where token budget matters
- CI/CD browser verification — agent builds, then tests its own output
- Multi-agent workflows where the browser tool must be LLM-agnostic
- Rapid iteration loops: code, snapshot, verify, fix
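The "snapshot, verify" half of that loop might look like the Python sketch below, which shells out to the CLI and reports which expected labels are missing. Treat the details as assumptions: the `open <url>` and `snapshot` subcommands should be checked against the CLI's own help, `http://localhost:3000` is a placeholder dev-server address, and `verify_page`, `run_cli`, and `fake_run` are hypothetical helpers. A fake runner is injected so the example runs without the CLI installed.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    done = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return done.stdout


def verify_page(url: str, expected_labels: Sequence[str], run: Runner = run_cli) -> List[str]:
    """Open the page, take a snapshot, and return the expected labels that are missing."""
    # Assumed subcommands: `open <url>` and `snapshot`. Verify against the CLI help.
    run(["agent-browser", "open", url])
    snapshot = run(["agent-browser", "snapshot"])
    return [label for label in expected_labels if label not in snapshot]


def fake_run(argv: Sequence[str]) -> str:
    """Fake runner for illustration: returns a canned snapshot, nothing for other commands."""
    if argv[1] == "snapshot":
        return '@e1 [button] "Submit Order"\n@e3 [link] "View Cart (3 items)"'
    return ""


missing = verify_page("http://localhost:3000", ["Submit Order", "Sort by"], run=fake_run)
print(missing)  # prints: ['Sort by']
```

An empty list means the page rendered everything the agent expected; a non-empty list gives it a concrete item to fix before the next iteration.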
Checklist Score
Against the decision checklist:
| Gate | Score | Notes |
|---|---|---|
| 1. Perception | Strong | Snapshot+Refs is the most token-efficient model available |
| 2. Action | Good | Ref-based actions stable, but less semantic than named tool calls |
| 3. Auth | Weak | Headless — no session reuse, custom auth needed |
| 4. Workflow | Strong | CLI-native, any agent, local-first, CI-friendly |
| 5. Architecture | Strong | External tool — zero app changes |
| 6. Standards | Moderate | Open source, active development, but single vendor |
| 7. Speed/Cost | Strong | Best in class — Rust, 50ms boot, ~1K tokens/page |
| 8. Setup | Strong | Single npm install, 5 minutes to first snapshot |
| 9. Blast Radius | Good | Read-only default, CLI controls scope |
Context
- AI Browser Tools — Decision checklist and radar
- AI Coding Config — Multi-agent setup
- Tech Decisions — General evaluation framework