
AI Search

How do agents retrieve live knowledge from the web?

Search is the retrieval substrate for AI agents — what Stripe is to payments. The right search layer lets agents ground decisions in current reality rather than in stale training data.

Position

| Dimension | Answer |
| --- | --- |
| Job | Retrieve structured, current web knowledge for agent pipelines |
| Our Choice | Exa (Assess) |
| Candidates | Exa, Tavily, Perplexity API, SerpAPI, Brave Search API |
| Decision criteria | Semantic accuracy, structured output, token efficiency, agent-native API |

Exa

A neural search engine built for machine consumption: an end-to-end neural network that understands meaning rather than matching keywords.

Capabilities

| Capability | API | Latency | Best for |
| --- | --- | --- | --- |
| Neural + keyword search | /search, type: auto | ~1s | Context enrichment, research agents |
| Instant retrieval | /search, type: instant | ~200ms | Real-time in-app lookup |
| Deep research | /search, type: deep + output_schema | 5-60s | Structured intelligence reports |
| Find similar | /findSimilar | ~1s | Competitor discovery, pattern matching |
| JSON extraction | type: deep with JSON Schema | 5-60s | Typed data pipelines |
| Research pipeline | /research (async) | minutes | Automated competitive analysis |
| Highlights | /contents with highlights | fast | Token-efficient context (4k chars) |
| Category search | category: company/people/research | ~1s | Lead enrichment, talent discovery |
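To make the table concrete, here is a minimal sketch of assembling a request body for Exa's POST /search endpoint. Field names (query, type, numResults, contents.highlights) follow Exa's documented API, but treat the exact shapes, defaults, and the helper name as assumptions of this sketch rather than an official client:

```typescript
// Sketch: build a request body for Exa's POST /search endpoint.
// The helper name and defaults are assumptions, not Exa's SDK.
interface SearchOptions {
  type?: "auto" | "instant" | "deep";
  numResults?: number;
  highlights?: boolean;
}

function buildSearchRequest(query: string, opts: SearchOptions = {}) {
  return {
    query,
    type: opts.type ?? "auto",          // neural + keyword blend by default
    numResults: opts.numResults ?? 10,
    // Highlights return only the relevant snippets, not full page text
    ...(opts.highlights ? { contents: { highlights: true } } : {}),
  };
}

// Usage: a highlights-enabled research query
const body = buildSearchRequest("agent retrieval benchmarks", {
  type: "auto",
  highlights: true,
});
// Sent as: fetch("https://api.exa.ai/search", { method: "POST",
//   headers: { "x-api-key": EXA_API_KEY, "Content-Type": "application/json" },
//   body: JSON.stringify(body) })
```

The instant and deep types map to the latency tiers in the table above; the same body shape carries all three.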

Key Differentiator

The highlights feature extracts only relevant tokens from a webpage — 10x more token-efficient than full-text retrieval. For agent pipelines processing hundreds of pages, this is the difference between viable and expensive.
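The 10x figure falls out of simple arithmetic. A back-of-envelope sketch, assuming illustrative numbers (~40k chars per full page, the 4k-char highlights cap, ~4 chars per token — all assumptions, not Exa-published figures):

```typescript
// Back-of-envelope token economics for a 200-page agent pipeline.
// All constants are illustrative assumptions.
const CHARS_PER_TOKEN = 4;

function tokensFor(pages: number, charsPerPage: number): number {
  return Math.round((pages * charsPerPage) / CHARS_PER_TOKEN);
}

const fullText = tokensFor(200, 40_000);  // 2,000,000 tokens of context
const highlights = tokensFor(200, 4_000); //   200,000 tokens of context
console.log(fullText / highlights);       // prints 10
```

At typical per-token pricing, that ratio compounds across every pipeline run — which is why highlights decide whether a hundreds-of-pages pipeline is viable.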

Hex Architecture Fit

Exa slots in as an output-side secondary adapter. Domain logic never touches Exa directly — it requests enrichment via a port.

Domain ──→ ContextEnrichmentPort ──→ ExaContextAdapter ──→ Exa API

Swappable. Testable. Domain stays pure.

```typescript
// domain/ports/output/ContextEnrichmentPort.ts
interface ContextEnrichmentPort {
  enrichNode(nodeId: string, domainLabel: string): Promise<NodeContext>
  findSimilarPatterns(signature: PatternSignature): Promise<Precedent[]>
  getBenchmarks(metricKey: string, category: string): Promise<BenchmarkRange>
}
```
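On the other side of the port sits the adapter. A hypothetical sketch of what ExaContextAdapter could look like — the ExaClient shape, the NodeContext fields, and the result mapping are all assumptions for illustration, not Exa's actual SDK surface:

```typescript
// Hypothetical adapter sketch; types and client shape are assumptions.
interface NodeContext { nodeId: string; summary: string; sources: string[] }

interface ExaClient {
  search(query: string, opts: { numResults: number; highlights: boolean }):
    Promise<{ results: { url: string; highlights?: string[] }[] }>;
}

class ExaContextAdapter {
  constructor(private readonly exa: ExaClient) {}

  async enrichNode(nodeId: string, domainLabel: string): Promise<NodeContext> {
    const { results } = await this.exa.search(domainLabel, {
      numResults: 3,
      highlights: true, // token-efficient snippets instead of full pages
    });
    return {
      nodeId,
      summary: results.flatMap(r => r.highlights ?? []).join(" "),
      sources: results.map(r => r.url), // provenance: every insight keeps its URL
    };
  }
}
```

Because the domain only sees ContextEnrichmentPort, swapping Exa for Tavily or a cached store means writing one new adapter — and tests can inject a stub ExaClient without touching the network.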

Integration Opportunities

In the Work Charts App

| Integration | What it does | Complexity |
| --- | --- | --- |
| Context Curtain | Background instant search when chart shows anomaly — surfaces "here's why" card | Low |
| Benchmark Ghost Lines | Industry benchmark band overlaid on team data from structured research | Medium |
| Pattern-to-Precedent | Anomaly detected triggers search for historical precedents | Medium |
| Living Legend | Chart legend nodes enriched with cited definitions and standards | Medium |
| Collaborative Intelligence | Find public retrospectives from teams at similar scale | Medium |
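The Context Curtain is the simplest of these to sketch: translate a detected anomaly into an instant search so the "here's why" card can load in the background. The Anomaly shape and query phrasing below are assumptions, not a spec:

```typescript
// Context Curtain sketch: anomaly -> instant search request.
// The Anomaly shape and query wording are illustrative assumptions.
interface Anomaly {
  metric: string;                 // e.g. "deploy frequency"
  direction: "spike" | "drop";
  period: string;                 // e.g. "last sprint"
}

function anomalyToInstantSearch(a: Anomaly) {
  return {
    query: `why would ${a.metric} ${a.direction} in ${a.period}`,
    type: "instant" as const,     // ~200ms tier keeps the card feel real-time
    numResults: 3,
  };
}
```

Keeping this a pure function means the chart code stays testable; only a thin caller at the edge ever touches the Exa API.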

In the Dev Process

| Integration | What it does | Complexity |
| --- | --- | --- |
| Competitor Radar | Weekly findSimilar() pipeline — living competitive intelligence DB | Medium |
| Signal Filter | Weekly structured briefing: 3 signals for DePIN, agents, Stackmates | Low |
| Living PRD | Research job before feature spec — ground evidence in real user language | Low |
| PR Context Bot | GitHub Action enriches PRs with 30-day web context for the touched domain | Low |
| Talent Lens | category: people search — semantic people discovery, not scraping | Low |
| Provenance Trail | Every enriched insight has source URL + timestamp — trust primitive | Medium |
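For the PR Context Bot, the 30-day window maps to Exa's startPublishedDate filter (an ISO 8601 date). A minimal sketch — the helper name and query phrasing are assumptions:

```typescript
// PR Context Bot sketch: build a date-bounded Exa query for a PR's domain.
// startPublishedDate is a documented Exa parameter; the rest is illustrative.
function thirtyDayContextQuery(domain: string, now: Date = new Date()) {
  const start = new Date(now.getTime() - 30 * 24 * 60 * 60 * 1000);
  return {
    query: `recent developments in ${domain}`,
    startPublishedDate: start.toISOString().slice(0, 10), // YYYY-MM-DD
    numResults: 5,
  };
}
```

A GitHub Action would call this per touched domain and post the top results as a PR comment, each with its source URL — which is the same primitive the Provenance Trail row relies on.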

Prioritisation (Sutherland Score)

Highest psychological value per engineering effort:

  1. Context Curtain — improves perception of understanding, not the chart itself
  2. Benchmark Ghost Lines — users want to know if their data is good, not just see it
  3. Living PRD — defensible feature decisions grounded in real language
  4. Competitor Radar — compounding weekly intelligence for cost of one workflow

Reference Implementation

  • exa-labs/company-researcher — Next.js + Anthropic + Exa company analysis tool. Same stack as Stackmates. Fork-ready.
  • WebCode Benchmark — Exa's open benchmark for web search quality in coding agents. 82.8 completeness vs 59-74 for competitors.

Context

Questions

How does search quality compound when agents make decisions based on retrieved context?

  • What is the cost of a wrong retrieval vs no retrieval — and how do you measure groundedness?
  • At what point does cached search replace live search without losing trust?
  • How does the highlights compression ratio change the economics of agent pipelines?