AI Search
How do agents retrieve live knowledge from the web?
Search is to AI agents what Stripe is to payments: the retrieval substrate. The right search layer lets agents ground decisions in current reality rather than stale training data.
Position
| Dimension | Answer |
|---|---|
| Job | Retrieve structured, current web knowledge for agent pipelines |
| Our Choice | Exa (Assess) |
| Candidates | Exa, Tavily, Perplexity API, SerpAPI, Brave Search API |
| Decision criteria | Semantic accuracy, structured output, token efficiency, agent-native API |
Exa
Neural search engine built for machine consumption. End-to-end neural network that understands meaning rather than matching keywords.
Capabilities
| Capability | API | Latency | Best for |
|---|---|---|---|
| Neural + keyword search | /search type: auto | ~1s | Context enrichment, research agents |
| Instant retrieval | /search type: instant | ~200ms | Real-time in-app lookup |
| Deep research | /search type: deep + output_schema | 5-60s | Structured intelligence reports |
| Find similar | /findSimilar | ~1s | Competitor discovery, pattern matching |
| JSON extraction | type: deep with JSON Schema | 5-60s | Typed data pipelines |
| Research pipeline | /research (async) | minutes | Automated competitive analysis |
| Highlights | /contents with highlights | fast | Token-efficient context (4k chars) |
| Category search | category: company/people/research | ~1s | Lead enrichment, talent discovery |
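The capability table maps to a single REST call. A minimal sketch, assuming the conventional Exa request shape (POST to `/search` with an `x-api-key` header); field names beyond `type` and `category`, and the `search` helper itself, are illustrative:

```typescript
// Request shape assumed from the capability table; fields beyond
// query/type/category are conventions, not guaranteed by this document.
type SearchType = "auto" | "instant" | "deep";

interface SearchRequest {
  query: string;
  type: SearchType;
  numResults?: number;
  category?: "company" | "people" | "research";
}

// Pure builder so the payload can be unit-tested without hitting the network.
function buildSearchRequest(
  query: string,
  type: SearchType = "auto",
  category?: SearchRequest["category"]
): SearchRequest {
  return { query, type, numResults: 10, ...(category ? { category } : {}) };
}

// Hypothetical call site (requires a real API key; Node 18+ global fetch).
async function search(req: SearchRequest, apiKey: string): Promise<unknown> {
  const res = await fetch("https://api.exa.ai/search", {
    method: "POST",
    headers: { "content-type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify(req),
  });
  return res.json();
}

const req = buildSearchRequest(
  "series B DePIN infrastructure startups",
  "auto",
  "company"
);
```

Keeping the payload builder pure means the agent pipeline can log and test exactly what it asks for before any network call happens.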
Key Differentiator
The highlights feature extracts only relevant tokens from a webpage — 10x more token-efficient than full-text retrieval. For agent pipelines processing hundreds of pages, this is the difference between viable and expensive.
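The economics can be made concrete with back-of-envelope arithmetic. A sketch assuming ~4 characters per token (a common heuristic, not an Exa guarantee) and the 4k-character highlight figure from the table:

```typescript
// Rough cost comparison: full-page retrieval vs highlights.
const CHARS_PER_TOKEN = 4;          // heuristic, not an Exa guarantee
const HIGHLIGHT_CAP_CHARS = 4_000;  // per the capability table

function contextTokens(pages: number, charsPerPage: number): number {
  return Math.ceil((pages * charsPerPage) / CHARS_PER_TOKEN);
}

// 200 pages at ~40k chars of full text each, vs highlights capped at 4k.
const fullText = contextTokens(200, 40_000);
const highlights = contextTokens(200, HIGHLIGHT_CAP_CHARS);
const savings = fullText / highlights; // → 10x, matching the claim above
```

At a few dollars per million input tokens, that 10x gap is the difference between cents and dollars per pipeline run.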
Hex Architecture Fit
Exa slots in as a secondary (driven) adapter on the output side. Domain logic never touches Exa directly; it requests enrichment through a port.
Domain ──→ ContextEnrichmentPort ──→ ExaContextAdapter ──→ Exa API
Swappable. Testable. Domain stays pure.
```typescript
// domain/ports/output/ContextEnrichmentPort.ts
interface ContextEnrichmentPort {
  enrichNode(nodeId: string, domainLabel: string): Promise<NodeContext>
  findSimilarPatterns(signature: PatternSignature): Promise<Precedent[]>
  getBenchmarks(metricKey: string, category: string): Promise<BenchmarkRange>
}
```
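The adapter behind that port might look like the sketch below. The `NodeContext` fields, the injected `SearchFn`, and the query template are invented here for illustration; the real Exa response shape may differ. Injecting the search function is what makes the "swappable, testable" claim concrete:

```typescript
// Illustrative result shapes; not the actual Exa response schema.
interface NodeContext {
  summary: string;
  sources: { url: string; title: string }[];
}

type SearchFn = (
  query: string
) => Promise<{ url: string; title: string; highlight: string }[]>;

// Implements the enrichNode half of ContextEnrichmentPort. The search
// dependency is injected so tests can stub it and the provider can be
// swapped without touching domain code.
class ExaContextAdapter {
  constructor(private readonly searchFn: SearchFn) {}

  async enrichNode(nodeId: string, domainLabel: string): Promise<NodeContext> {
    const results = await this.searchFn(
      `${domainLabel} definition industry standard`
    );
    return {
      summary: results.map((r) => r.highlight).join(" "),
      sources: results.map(({ url, title }) => ({ url, title })),
    };
  }
}
```

In tests, `searchFn` is a stub returning canned results; in production it wraps the Exa HTTP client. The domain only ever sees `NodeContext`.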
Integration Opportunities
In the Work Charts App
| Integration | What it does | Complexity |
|---|---|---|
| Context Curtain | Background instant search when chart shows anomaly — surfaces "here's why" card | Low |
| Benchmark Ghost Lines | Industry benchmark band overlaid on team data from structured research | Medium |
| Pattern-to-Precedent | Anomaly detected triggers search for historical precedents | Medium |
| Living Legend | Chart legend nodes enriched with cited definitions and standards | Medium |
| Collaborative Intelligence | Find public retrospectives from teams at similar scale | Medium |
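The Context Curtain row above is the smallest of these to prototype. A sketch of its trigger logic; the `Anomaly` shape, the 25% threshold, and `buildCurtainQuery` are all assumptions, not an existing API:

```typescript
// When a chart metric deviates beyond a threshold, build an instant-search
// query for the "here's why" card. All names here are illustrative.
interface Anomaly {
  metric: string;      // e.g. "deploy frequency"
  deltaPct: number;    // signed percent change vs baseline
  periodLabel: string; // e.g. "last sprint"
}

function isAnomaly(deltaPct: number, thresholdPct = 25): boolean {
  return Math.abs(deltaPct) >= thresholdPct;
}

function buildCurtainQuery(a: Anomaly): string | null {
  if (!isAnomaly(a.deltaPct)) return null; // normal variance: no curtain
  const direction = a.deltaPct > 0 ? "spike" : "drop";
  return `why would ${a.metric} ${direction} ${Math.abs(a.deltaPct)}% in ${a.periodLabel}`;
}
```

The returned query would feed the instant-search endpoint in the background, so the card is ready before the user asks.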
In the Dev Process
| Integration | What it does | Complexity |
|---|---|---|
| Competitor Radar | Weekly findSimilar() pipeline — living competitive intelligence DB | Medium |
| Signal Filter | Weekly structured briefing: 3 signals for DePIN, agents, Stackmates | Low |
| Living PRD | Research job before feature spec — ground evidence in real user language | Low |
| PR Context Bot | GitHub Action enriches PRs with 30-day web context for the touched domain | Low |
| Talent Lens | category: people search — semantic people discovery, not scraping | Low |
| Provenance Trail | Every enriched insight has source URL + timestamp — trust primitive | Medium |
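The Provenance Trail primitive is small enough to sketch in full; the field names are assumptions chosen for this example:

```typescript
// Every enriched insight carries its evidence: source URL + retrieval time.
interface Provenanced<T> {
  value: T;
  sourceUrl: string;
  retrievedAt: string; // ISO 8601
}

function withProvenance<T>(
  value: T,
  sourceUrl: string,
  now: Date = new Date()
): Provenanced<T> {
  return { value, sourceUrl, retrievedAt: now.toISOString() };
}
```

Wrapping every enrichment result at the adapter boundary means no insight enters the domain without a citation attached.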
Prioritisation (Sutherland Score)
Highest psychological value per engineering effort:
- Context Curtain — improves perception of understanding, not the chart itself
- Benchmark Ghost Lines — users want to know if their data is good, not just see it
- Living PRD — defensible feature decisions grounded in real language
- Competitor Radar — compounding weekly intelligence for cost of one workflow
Reference Implementation
- exa-labs/company-researcher — Next.js + Anthropic + Exa company analysis tool. Same stack as Stackmates. Fork-ready.
- WebCode Benchmark — Exa's open benchmark for web search quality in coding agents. 82.8 completeness vs 59-74 for competitors.
Context
- AI Data Pipelines — retrieval layer feeds the pipeline
- Context Graphs — search enriches the graph
- Tech Decisions — evaluation process
- Hexagonal Architecture — port/adapter pattern for integration
Links
- Exa API Docs — full API reference
- Exa Company Researcher — reference implementation
- Exa WebCode Benchmark — search quality evaluation for coding agents
Questions
- How does search quality compound when agents make decisions based on retrieved context?
- What is the cost of a wrong retrieval vs no retrieval — and how do you measure groundedness?
- At what point does cached search replace live search without losing trust?
- How does the highlights compression ratio change the economics of agent pipelines?