
AI MCP Tools

Which tools should your agents carry — and what does each one cost to hold?

MCP tools expand what agents can do. They also expand what agents consume. Research shows MCP tool definitions inflate input tokens by 3x to 236x depending on the toolset. Loading everything is prohibitively expensive. Loading nothing is crippling. The protocol below finds the balance — maximum effectiveness at optimum efficiency.

Adoption Radar

Inspired by ThoughtWorks Tech Radar. Status reflects current assessment for AI agent workflows.

| Ring | Meaning |
|---|---|
| Adopt | Use in production workflows. Proven, low risk. |
| Trial | Use on real projects with eyes open. Active testing. |
| Assess | Explore. Understand the trade-offs. Don't depend on it yet. |
| Hold | Wait. Immature, unstable, or superseded. |

Current Radar (March 2026)

In use — tools we currently have configured and are actively using or testing.

| Tool | Category | Ring | Trajectory | Token Cost | Rationale |
|---|---|---|---|---|---|
| Perplexity MCP | Search + Research | Adopt | Stable | Medium (~2-5K/query) | Primary research tool. Three modes: search, reason, deep_research. Replaces manual web search |
| GitHub MCP | Code + Repos | Trial | Rising | Low (~500-1K/call) | PR management, issue tracking, code search. Overlaps with gh CLI — evaluate which is leaner |
| Supabase MCP | Database + Diagrams | Trial | Rising | Low (~300-800/call) | Direct DB access + built-in schema visualizer. Both teams have access |
| Context7 | Documentation | Trial | Rising | Medium (~1-3K/query) | Library docs lookup. Prevents hallucinated API calls. High value for unfamiliar frameworks |
| Claude in Chrome | Browser | Trial | Stable | High (~10-20K/read) | Full browser control. Screenshots expensive. See Browser Tools |
| Pencil.dev | Design + Code | Trial | Rising | Medium (~1-3K/call) | AI-native design-to-code. MCP server lets agents draw on canvas. .pen = JSON = git-diffable |
| Vercel MCP | Deploy + Hosting | Assess | Stable | Low (~500/call) | Deploy previews, logs. Useful when debugging deploys |
| Indeed MCP | Jobs + Hiring | Assess | Unknown | Low (~500/call) | Job search, company data. No current workflow demands it |
| Playwright MCP | Browser Testing | Hold | | Medium | Agent Browser + Claude in Chrome cover this |

On the radar — tools production teams are using that we should evaluate. Research from 1,400+ company MCP deployments shows these are the most adopted by startups and agencies running 6-15 servers.

| Tool | Category | Ring | Why Teams Use It | Our Need |
|---|---|---|---|---|
| Firecrawl | Web Scraping | Assess | Structured data extraction from any URL. 96% success rate. JavaScript rendering. Top scraping MCP | ETL pipeline — NZ business enrichment, competitor research |
| Exa Search | Semantic Search | Assess | Meaning-based search, not keyword. Company research, code context, deep research mode | Intelligence — find what Perplexity misses, semantic over keyword |
| Tavily | Search | Assess | AI-optimised search API. Faster than Perplexity for simple lookups. Used alongside Firecrawl | Compare against Perplexity — may be leaner for quick searches |
| Linear MCP | Project Mgmt | Assess | Issue tracking, sprint management. MCP lets agents create/update issues directly | Agent Project Mgmt PRD — issue tracking is top priority |
| Zapier MCP | Automation | Assess | 8,000+ app integrations. Route findings, sync data, trigger notifications | Workflow automation — connect tools without custom code |
| Slack MCP | Comms | Assess | Team notifications, research delivery, collaborative workflows | Engineering comms — alternative to Convex for notifications |
| Postgres MCP Pro | Database | Assess | Direct Postgres queries via MCP. More mature than Supabase MCP for complex queries | Engineering — compare against Supabase MCP + Drizzle |
| MongoDB MCP | Database | Hold | Natural language to aggregation pipelines. Valuable for document stores | Not needed — we're Postgres/Convex |
| Qdrant MCP | Vector DB | Hold | Vector search for RAG, semantic memory, embeddings | Future — when agent memory needs vector search |

By Agent Team

Which tools does each team need loaded? Loading unused tools wastes tokens. This matrix drives .mcp.json configuration per project.

Production teams run 6-15 MCP servers per workspace. 81% of adopters are companies under 200 people — our size. The pattern: 2-3 core tools always loaded, 2-4 role-specific tools, everything else off.
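A curated team profile is just a small `.mcp.json` per project. A minimal sketch, assuming Claude Code's `mcpServers` layout; the package names, commands, and environment variables here are illustrative placeholders, not our actual server config:

```json
{
  "mcpServers": {
    "perplexity": {
      "command": "npx",
      "args": ["-y", "perplexity-mcp"],
      "env": { "PERPLEXITY_API_KEY": "${PERPLEXITY_API_KEY}" }
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```

Each team's file lists only its Always Load servers; Trial servers are added for the duration of a trial and removed again at the quarterly shuffle.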

Dream Team (orchestrator)

Jobs: PRD writing, strategy, priorities, commissioning, template improvement

| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Research for PRDs, industry analysis, competitive scanning | Pencil.dev | Wire diagrams for A&IDs, system architecture visualisation |
| GitHub | PRD management, cross-repo coordination, PR creation | Exa Search | Semantic search finds connections Perplexity misses |
| Context7 | Accurate framework docs when specifying technical PRDs | | |

Don't load: Supabase (no direct DB need), Indeed (no hiring workflow), Playwright (no testing).

Engineering (builder)

Jobs: Code, deploy, test, fix, schema management, API development

| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| GitHub | PR management, code search, issue tracking | Postgres MCP Pro | Compare against Supabase MCP for complex query patterns |
| Supabase | Schema inspection, data queries, ERD visualisation | Linear MCP | If we adopt Linear for issue tracking (Agent Project Mgmt PRD) |
| Context7 | Prevents hallucinated APIs — saves rework cycles | Vercel | Deploy debugging — measure frequency before promoting |

Don't load: Perplexity (use sparingly — research isn't the core job), Indeed, Pencil.dev (design team's tool).

Intelligence (research + data)

Jobs: Deep research, data acquisition, ETL pipelines, trust scoring, enrichment

| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Three search modes cover the research spectrum | Firecrawl | Structured extraction for NZ business enrichment pipeline |
| Supabase | Data queries, pipeline inspection, trust score analysis | Exa Search | Semantic company research — deeper than keyword search |
| Tavily | Quick lookups where Perplexity is overkill (save tokens) | | |

Don't load: Vercel, Indeed, Pencil.dev, Context7 (not coding).

What crack teams do: Intelligence teams pair Firecrawl + Exa Search + database MCP as a three-tool stack. Firecrawl extracts structured data from URLs. Exa finds the URLs worth extracting. Database MCP stores and queries results. The agent orchestrates: discover → extract → store → analyse. This maps directly to our ETL pipeline (NZBN → Crawl4AI → trust scoring).

Marketing (growth)

Jobs: Content amplification, LinkedIn, landing pages, SEO, campaign analytics

| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Research for content, competitor analysis, trend scanning | Pencil.dev | Landing page design directly in IDE — AI draws, code exports |
| Claude in Chrome | Page verification, UX testing, competitor site analysis | Firecrawl | Extract competitor landing page structure for analysis |

Don't load: Supabase (no DB need), GitHub (not coding), Linear, Postgres.

What crack teams do: Marketing teams use Firecrawl to extract competitor landing pages into structured data (headlines, CTAs, social proof patterns), then feed that to the LLM for differentiation analysis. Our /landing-page skill could consume this.

Sales (CRM + outreach)

Jobs: Prospect research, RFP writing, pipeline management, deal qualification

| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Prospect and company research before outreach | Firecrawl | Extract target company details from websites |
| Supabase | CRM data queries, deal pipeline, contact lookup | Exa Search | Company intelligence — org charts, funding, tech stack |

Don't load: GitHub, Vercel, Context7, Linear, Playwright.

What crack teams do: Sales intelligence teams configure Exa Search + Firecrawl + database MCP as a rapid account research stack. Agent receives "research this target account" → Exa finds company intelligence → Firecrawl extracts pricing/product details from their site → database stores the enriched profile. Our Sales Dev PRD targets exactly this workflow.

Commissioning (QA)

Jobs: Verify features against PRD, evidence capture, regression checking

| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Claude in Chrome | Full browser verification, GIF evidence, session reuse | Linear MCP | If issue tracking moves to Linear — log findings directly |
| GitHub | Check PR status, verify deployments, read specs | Supabase | Verify data integrity, check schema matches spec |

Don't load: Perplexity (not researching), Indeed, Pencil.dev, Firecrawl.

Loading Rule

If a tool isn't in your Always Load or Trial column, don't load it. Every schema in context is tokens not spent on reasoning.

The team profiles above are starting positions. The governance protocol defines how tools move between columns based on measured value.

Decision Checklist

Run every MCP tool candidate through these gates before moving it to Trial or Adopt.

1. Job Fit

Does this tool do a job a team actually demands?

  • Named job — Maps to a specific team's core job (not "might be useful someday")
  • No overlap — No existing tool already does this job adequately
  • Frequency — Used at least weekly, not once-a-quarter
  • Alternatives — Compared against non-MCP alternatives (CLI, API, manual)

2. Token Economics

Can you afford it at scale?

  • Schema cost — Tool definition adds less than 1K tokens to context
  • Per-call cost — Average response under 3K tokens
  • Session budget — Tool doesn't consume more than 15% of session tokens
  • Dynamic loading — Can be loaded on-demand rather than always-on

3. Reliability

Does it work when you need it?

  • Uptime — Available 99%+ during trial period
  • Error handling — Returns structured errors, not silent failures
  • Latency — Responds in under 5 seconds for typical queries
  • Determinism — Same input produces consistent output quality

4. Security

What are you exposing?

  • Credential scope — API keys have minimum necessary permissions
  • Data boundary — Tool can't access data outside its domain
  • Audit trail — Tool invocations are logged
  • Rotation — Credentials can be rotated without config rebuild

5. Composability

Does it play well with others?

  • MCP standard — Implements MCP protocol correctly (stdio or HTTP)
  • Schema quality — Tool descriptions are clear enough for the model to use correctly
  • No side effects — Read operations don't mutate state
  • Incremental adoption — Can add/remove without affecting other tools

6. Longevity

Will this exist in a year?

  • Active maintenance — Updated in the last 3 months
  • Community or vendor — Backed by a company or active open-source community
  • Standards alignment — Uses OAuth 2.1, MCP Server Cards, or equivalent
  • Migration path — If it dies, data/config isn't locked in

Tool Profiles

Perplexity

Ring: Adopt | Category: Search + Research

| Dimension | Assessment |
|---|---|
| What it does | Web search with AI synthesis. Three modes: search (fast), reason (analytical), deep_research (thorough) |
| Token cost | ~2-5K per search query including response. Deep research can be 10K+ |
| Why we use it | Replaces manual browsing for research tasks. Returns synthesised answers with citations |
| Risk | API key cost scales with usage. Deep research mode is expensive |
| Alternative | WebSearch tool (built-in), manual browsing |

GitHub

Ring: Trial | Category: Code + Repos

| Dimension | Assessment |
|---|---|
| What it does | PR management, issue tracking, code search, file operations via GitHub API |
| Token cost | ~500-1K per call. PR reviews can be 3K+ with diffs |
| Why we use it | PR creation, issue management, cross-repo code search |
| Risk | Overlaps with gh CLI which is already available via Bash. Token overhead may not justify convenience |
| Alternative | gh CLI (zero MCP overhead), direct API calls |
| Trial question | Is the MCP version measurably faster than gh CLI for our common tasks? |

Supabase

Ring: Trial | Category: Database + Diagrams

| Dimension | Assessment |
|---|---|
| What it does | Direct database access — schema inspection, queries, data management. Also has a built-in schema visualizer for entity relationship diagrams |
| Token cost | ~300-800 per call. Schema introspection can be higher |
| Why we use it | Engineering team inspects schemas, runs queries, manages data. Diagram interface shows table relationships visually — a ready-made wire diagram for data architecture |
| Risk | HTTP endpoint exposes database. Permission model needs review |
| Alternative | Direct SQL via CLI, Drizzle ORM queries |
| Bonus | Schema diagrams are available to both dream team and engineering — shared visual of the data model without any extra tooling |
| Trial question | Does MCP access add value over the existing Drizzle toolchain? Can the diagram interface replace custom ERD drawing? |

Context7

Ring: Trial | Category: Documentation

| Dimension | Assessment |
|---|---|
| What it does | Fetches up-to-date library documentation and code examples |
| Token cost | ~1-3K per query. Returns focused documentation snippets |
| Why we use it | Prevents hallucinated API calls by grounding in actual docs |
| Risk | Library coverage varies. May return outdated docs for niche libraries |
| Alternative | WebFetch to official docs, manual documentation reading |
| Trial question | How often does Context7 prevent a hallucination that would have cost a rework cycle? |

Claude in Chrome

Ring: Trial | Category: Browser Automation

See AI Browser Tools for full evaluation.

Vercel

Ring: Assess | Category: Deploy + Hosting

| Dimension | Assessment |
|---|---|
| What it does | Deploy management, build logs, runtime logs, project config |
| Token cost | ~500 per call |
| Why we use it | Debug failed deployments, check preview URLs |
| Risk | Low frequency use. Schema cost may exceed value for occasional deploy debugging |
| Trial question | How many times per week do we actually need programmatic Vercel access? |

Indeed

Ring: Assess | Category: Jobs + Hiring

| Dimension | Assessment |
|---|---|
| What it does | Job search, company data, resume management |
| Token cost | ~500 per call |
| Risk | No current workflow demands this. Loading it wastes context |
| Decision | Remove from default config until a hiring or competitive research workflow demands it |

Pencil

Ring: Trial | Category: Design + Code

Pencil.dev — "Design on canvas. Land in code." AI-native vector design that runs inside your IDE.

| Dimension | Assessment |
|---|---|
| What it does | AI-powered vector design tool inside VS Code/Cursor. Agents interact with canvas via MCP server. Generates dashboards, landing pages, UI systems, component libraries |
| MCP native | Yes — MCP server gives AI agents full write access to canvas. Sticky notes on canvas become runnable prompts — canvas as agent workspace |
| File format | .pen files = pure JSON. Human-readable, machine-readable, git-diffable. No binary blobs. Designs are code artifacts |
| Code output | HTML, CSS, React — pixel-perfect. Supports Shadcn UI, Lunaris, Halo, Nitro design systems |
| Figma bridge | Copy-paste from Figma preserving layouts and styles |
| Token cost | Estimated ~1-3K per canvas operation. Full canvas read could be higher (JSON structure) |
| Why it matters | Solves two problems: design-to-code AND visual system mapping. The Drawing Tool PRD thesis (git-native, own the visual language) fulfilled by an external tool instead of custom build |
| Risk | New tool — ecosystem maturity unknown. .pen format is proprietary JSON. IDE-only (no web version confirmed) |

Trial plan:

| # | Test | Success Criteria | Duration |
|---|---|---|---|
| 1 | Install in VS Code, create first .pen file | Working canvas in under 15 minutes | Day 1 |
| 2 | Create agent-team-to-MCP-tools wire diagram | Diagram communicates tool routing clearly | Day 1-2 |
| 3 | Test MCP server — can Claude Code draw on canvas? | Agent creates or modifies a design via MCP | Week 1 |
| 4 | Create A&ID-style diagram (agents, instruments, feedback) | Wire diagram quality matches or exceeds Mermaid/tldraw | Week 1-2 |
| 5 | Export to SVG/PNG, embed in docs page | Renders correctly in Docusaurus build | Week 2 |
| 6 | Measure token cost per session | MCP overhead within 15% budget target | Week 2 |

Exit criteria: If A&ID wire diagrams take longer than Mermaid equivalent, or MCP server is unreliable, demote to Assess for UI-only use. If it delivers clear wire diagrams AND agent-driven design, promote to Adopt for the design team.

Playwright

Ring: Hold | Category: Browser Testing

| Dimension | Assessment |
|---|---|
| What it does | Programmatic browser testing via Playwright |
| Risk | Agent Browser achieves similar goals with better token efficiency. Claude in Chrome covers dev workflow |
| Decision | Hold until Agent Browser proves insufficient for testing needs |

On the Radar

Tools we're not using yet but production teams have validated. Profiles here to inform trial decisions.

Firecrawl

Ring: Assess | Category: Web Scraping + Data Extraction

| Dimension | Assessment |
|---|---|
| What it does | Extracts structured data from any URL. JavaScript rendering, anti-bot handling, 96% success rate across sites. Returns clean markdown or structured JSON |
| Why teams use it | The scraping tool in production MCP stacks. Intelligence teams pair it with semantic search for discover → extract → analyse workflows |
| Our need | ETL pipeline: NZ business enrichment (Crawl4AI does this locally, Firecrawl adds hosted reliability). Competitor landing page extraction for marketing |
| Token cost | Medium — structured extraction returns focused data, not raw HTML |
| Risk | Paid API. 4% of high-value sites (LinkedIn, enterprise SaaS) remain inaccessible. We already have Crawl4AI in the ETL pipeline |
| Trial trigger | When Crawl4AI reliability drops below 90% on target sites, or when we need hosted scraping for CI pipelines |

Exa Search

Ring: Assess | Category: Semantic Search + Intelligence

| Dimension | Assessment |
|---|---|
| What it does | Meaning-based search (not keyword). Specialised tools: web search, code context, company research, people search, deep research mode |
| Why teams use it | Finds what keyword search misses. "Companies doing X in Y market" returns relevant results even without exact phrase matches |
| Our need | Intelligence — venture validation, competitor discovery, prospect research. Complements Perplexity which is keyword-biased |
| Token cost | Medium (~1-3K per query). Deep research mode is higher |
| Risk | Overlap with Perplexity. Need to measure whether semantic search finds materially different results |
| Trial trigger | When Perplexity consistently misses relevant companies or research during venture validation or sales intelligence |

Tavily

Ring: Assess | Category: AI Search

| Dimension | Assessment |
|---|---|
| What it does | AI-optimised search API. Faster and cheaper than Perplexity for simple factual lookups |
| Why teams use it | Quick context enrichment without the overhead of a full research tool. Pairs with Firecrawl in scraping workflows |
| Our need | Token savings — if 60% of our Perplexity calls are simple lookups, Tavily could handle those at lower cost |
| Trial trigger | When we have token consumption data showing Perplexity overuse on simple queries |

Linear MCP

Ring: Assess | Category: Project Management

| Dimension | Assessment |
|---|---|
| What it does | Issue tracking, sprint management, project organisation. Agents create/update issues directly via MCP |
| Why teams use it | Most adopted PM tool in MCP ecosystem. Notion agent connector. Natural language to structured issues |
| Our need | Agent Project Mgmt PRD — issue tracking is the #1 priority. If we adopt Linear, the MCP server gives agents direct access |
| Trial trigger | When Agent Project Mgmt PRD reaches L1 and we choose an issue tracking platform |

Zapier MCP

Ring: Assess | Category: Workflow Automation

| Dimension | Assessment |
|---|---|
| What it does | 8,000+ app integrations via MCP. Agents trigger cross-app workflows without custom code |
| Why teams use it | Glue layer — route research findings to teams, sync data between platforms, trigger notifications |
| Our need | Low priority until we have more tools to connect. Current Convex comms system handles inter-agent messaging |
| Trial trigger | When we need to connect 3+ external services that don't have dedicated MCP servers |

Slack MCP

Ring: Assess | Category: Team Comms

| Dimension | Assessment |
|---|---|
| What it does | Channel management, messaging, thread creation. Agents deliver research findings and trigger team notifications |
| Our need | Alternative or supplement to Convex for engineering comms. Agents could post commissioning results directly to Slack |
| Trial trigger | When the team is actively using Slack and manual notification routing becomes friction |

Token Economics

The Cost of Carrying Tools

Every MCP tool schema loaded into context consumes tokens before you've asked a single question.

```
Session token budget = context_window x 0.6  (reserve 40% for reasoning)
MCP overhead = sum(tool_schema_tokens) + sum(tool_result_tokens)
Efficiency = task_completion_rate / total_tokens_consumed

Target: MCP schema overhead < 15% of session budget
```
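The budget rule above can be sketched in a few lines. The window size and token counts below are illustrative, not measurements:

```python
# Session budget = 60% of the context window; reserve 40% for reasoning.
def session_budget(context_window: int, reasoning_reserve: float = 0.4) -> int:
    """Tokens available for everything except reasoning."""
    return int(context_window * (1 - reasoning_reserve))

def mcp_overhead(schema_tokens: list[int], result_tokens: list[int]) -> int:
    """Total MCP cost: loaded schemas plus tool results."""
    return sum(schema_tokens) + sum(result_tokens)

def within_schema_target(schema_tokens: list[int], context_window: int,
                         target: float = 0.15) -> bool:
    """Check the target: schema overhead under 15% of session budget."""
    return sum(schema_tokens) < target * session_budget(context_window)

# A 200K window leaves a 120K session budget; the 15% target allows
# 18K tokens of schemas, so three tools at ~6K total fit comfortably.
print(session_budget(200_000))                           # 120000
print(within_schema_target([900, 1200, 4000], 200_000))  # True
```

Three schemas at 6.1K tokens pass; a single 20K-token schema would already blow the target on the same window.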

Measurement Protocol

Before promoting a tool from Assess to Trial:

  1. Baseline — Run 5 representative tasks WITHOUT the tool. Record tokens consumed
  2. With tool — Run same 5 tasks WITH the tool. Record tokens consumed
  3. Calculate — Delta tokens / tasks completed = cost per task
  4. Compare — Is the tool's cost justified by speed, quality, or capability gain?
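Steps 1-3 reduce to one number: extra tokens per completed task. A minimal sketch with illustrative figures (not real measurements):

```python
# Compare 5 baseline runs (no tool) against the same 5 runs with the tool.
def cost_per_task(baseline_tokens: list[int], with_tool_tokens: list[int],
                  tasks_completed: int) -> float:
    """Extra tokens the tool cost, per completed task."""
    delta = sum(with_tool_tokens) - sum(baseline_tokens)
    return delta / tasks_completed

baseline = [8_000, 12_000, 9_500, 11_000, 7_500]     # 48,000 total
with_tool = [10_000, 13_000, 11_500, 12_000, 9_500]  # 56,000 total
print(cost_per_task(baseline, with_tool, tasks_completed=5))  # 1600.0
```

Step 4 is then a judgment call: is 1,600 extra tokens per task paid back in speed, quality, or capability?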

Dynamic Loading

Research shows dynamic tool loading reduces token usage by 96% compared to loading all tools statically. The trade-off: 2-3x more tool calls (the model must discover tools before using them).

When to load statically (always-on): Tools used in >50% of sessions for that team. When to load dynamically (on-demand): Tools used occasionally or for specific task types.

| Loading Strategy | Token Cost | Tool Calls | Best For |
|---|---|---|---|
| Static (all tools loaded) | High — every schema in context | Fewer — model sees tools immediately | Small toolsets (<5 tools) |
| Dynamic (load on demand) | Low — only active tool schemas | More — discovery + use | Large toolsets (>10 tools) |
| Team profiles (curated sets) | Medium — right tools per role | Normal | Our approach — see team matrix above |

Governance Protocol

DISCOVER → EVALUATE → TRIAL → ADOPT / HOLD → MONITOR → SHUFFLE

Discover

  • Check MCP Registry (~2,000 servers)
  • Check GitHub MCP Registry
  • Team requests — "I need a tool that does X"
  • Pain signals — repeated manual work that a tool could automate

Evaluate

Run the 6-gate decision checklist. Score each gate pass/fail. A tool needs 4/6 gates passing to enter Trial.
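The gate scoring can be expressed directly; the gate names follow the checklist above, and the example candidate's pass/fail values are illustrative:

```python
# Six gates, scored pass/fail; 4 of 6 must pass to enter Trial.
GATES = ["job_fit", "token_economics", "reliability",
         "security", "composability", "longevity"]

def ready_for_trial(results: dict[str, bool], required: int = 4) -> bool:
    passed = sum(results[g] for g in GATES)
    return passed >= required

candidate = {"job_fit": True, "token_economics": True, "reliability": True,
             "security": True, "composability": False, "longevity": False}
print(ready_for_trial(candidate))  # True: 4 of 6 gates pass
```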

Trial

  • Duration: 2 weeks minimum
  • Scope: One team, real tasks (not toy examples)
  • Measure: Token cost, task completion rate, error rate, time saved
  • Exit criteria: Measurable improvement on at least one dimension

Adopt or Hold

  • Adopt: Proven value, acceptable cost, team wants to keep it
  • Hold: Didn't justify its token cost, unreliable, or superseded

Monitor

  • Monthly review: Is this tool still earning its context cost?
  • Token spend trending up without corresponding value? Demote to Trial
  • Better alternative emerged? Run comparative trial

Shuffle

  • Quarterly radar refresh
  • Remove tools that haven't been used in 30 days
  • Promote tools that consistently deliver value
  • Update team matrices when roles or jobs change
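The 30-day rule is easy to automate if tool invocations are logged with a last-used date. A sketch with illustrative dates:

```python
# Flag tools with no recorded use in the last 30 days for removal
# at the quarterly shuffle.
from datetime import date, timedelta

def stale_tools(last_used: dict[str, date], today: date,
                max_idle_days: int = 30) -> list[str]:
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(t for t, d in last_used.items() if d < cutoff)

usage = {"perplexity": date(2026, 3, 10),
         "indeed": date(2026, 1, 5),
         "vercel": date(2026, 2, 1)}
print(stale_tools(usage, today=date(2026, 3, 15)))  # ['indeed', 'vercel']
```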

Decision Log

| Date | Decision | Ring | Rationale | Learned |
|---|---|---|---|---|
| 2026-03 | Perplexity MCP as primary research tool | Adopt | Three search modes cover research spectrum. Citations improve trust | Deep research mode expensive — use search for quick lookups |
| 2026-03 | GitHub MCP in trial alongside gh CLI | Trial | Need to measure whether MCP convenience justifies schema overhead vs CLI | |
| 2026-03 | Supabase MCP for engineering DB access | Trial | HTTP endpoint is convenient but security model needs review | |
| 2026-03 | Context7 for library documentation | Trial | Prevents hallucinated APIs. Value depends on framework diversity | |
| 2026-03 | Pencil.dev for design + wire diagrams | Trial | MCP-native, .pen = JSON in git, agents draw on canvas. Trial for A&ID wire diagrams | |
| 2026-03 | Firecrawl, Exa, Tavily on radar | Assess | Production teams validate these as the intelligence stack. Trial when ETL or sales pipeline demands | Research: 1,400 MCP deployments show scraping + semantic search + DB as the winning combo |
| 2026-03 | Linear MCP on radar | Assess | Most adopted PM tool in MCP ecosystem. Watch for Agent Project Mgmt PRD decision | |
| 2026-03 | Indeed MCP not loaded by default | Assess | No current workflow demands it. Remove from active config | Don't load tools without a named job |
| 2026-03 | Playwright MCP on hold | Hold | Agent Browser + Claude in Chrome cover browser testing needs | |

Registries

| Registry | Purpose | URL |
|---|---|---|
| Official MCP Registry | Canonical server discovery (~2,000 entries) | registry.modelcontextprotocol.io |
| GitHub MCP Registry | GitHub-integrated discovery | github.blog |
| MKINF | Community directory | hub.mkinf.io |
| MCP Server Cards | .well-known discovery standard | modelcontextprotocol.io/development/roadmap |

Drawing Tools

Wire diagrams make this system visible. See the Drawing Tool PRD for the long-term vision.

| Tool | Best For | Integration | MCP Native | Cost |
|---|---|---|---|---|
| Pencil.dev | AI-driven design, UI systems, agent workspace diagrams | IDE (VS Code/Cursor), .pen = JSON, git-diffable | Yes — agents draw on canvas via MCP | Assess |
| Supabase Schema Visualizer | Database ERDs, table relationships | Built into Supabase dashboard — both teams have access | Yes (via Supabase MCP) | Free |
| Mermaid | Inline flow diagrams in docs | Native Docusaurus plugin — renders in .md files | No | Free |
| tldraw | Rich A&ID diagrams, AI sketch-to-real | Export SVG to /static/img/ | No | Free (OSS) |
| Excalidraw | Quick team sketches, whiteboarding | Export SVG/PNG | No | Free (OSS) |

Pencil is the one to watch. It's MCP-native — agents can programmatically create and modify designs on canvas. If it handles wire diagrams (not just UI components), it could replace tldraw and Excalidraw for our use case. The .pen format being JSON means designs are code artifacts, not binary blobs — the exact thesis in the Drawing Tool PRD, without building a custom engine.

Start with what you have. Supabase diagrams show the data model. Mermaid shows the process flow. Trial Pencil for the A&ID-style agent/instrument wire diagrams.

The agent team matrix above is the first diagram that needs drawing — which tools flow to which teams for which jobs.

Context

Questions

Which tool in your Adopt ring would you demote if you measured its actual token cost per task?

  • If loading all tools costs 236x more tokens, what's the cost of the tool you loaded "just in case"?
  • When two tools overlap (GitHub MCP vs gh CLI), which one do you measure against — and what if the cheaper one is good enough?
  • What would your radar look like if you scored tools by tasks-completed-per-token instead of features-available?
  • Which team's "Don't Load" column reveals the most about what that team actually does?