AI MCP Tools
Which tools should your agents carry — and what does each one cost to hold?
MCP tools expand what agents can do. They also expand what agents consume. Research shows MCP tool definitions inflate input tokens by 3x to 236x depending on the toolset. Loading everything is prohibitively expensive. Loading nothing is crippling. The protocol below finds the balance — maximum effectiveness at optimum efficiency.
Adoption Radar
Inspired by ThoughtWorks Tech Radar. Status reflects current assessment for AI agent workflows.
| Ring | Meaning |
|---|---|
| Adopt | Use in production workflows. Proven, low risk. |
| Trial | Use on real projects with eyes open. Active testing. |
| Assess | Explore. Understand the trade-offs. Don't depend on it yet. |
| Hold | Wait. Immature, unstable, or superseded. |
Current Radar (March 2026)
In use — tools we currently have configured and are actively using or testing.
| Tool | Category | Ring | Trajectory | Token Cost | Rationale |
|---|---|---|---|---|---|
| Perplexity MCP | Search + Research | Adopt | Stable | Medium (~2-5K/query) | Primary research tool. Three modes: search, reason, deep_research. Replaces manual web search |
| GitHub MCP | Code + Repos | Trial | Rising | Low (~500-1K/call) | PR management, issue tracking, code search. Overlaps with gh CLI — evaluate which is leaner |
| Supabase MCP | Database + Diagrams | Trial | Rising | Low (~300-800/call) | Direct DB access + built-in schema visualizer. Both teams have access |
| Context7 | Documentation | Trial | Rising | Medium (~1-3K/query) | Library docs lookup. Prevents hallucinated API calls. High value for unfamiliar frameworks |
| Claude in Chrome | Browser | Trial | Stable | High (~10-20K/read) | Full browser control. Screenshots expensive. See Browser Tools |
| Pencil.dev | Design + Code | Trial | Rising | Medium (~1-3K/call) | AI-native design-to-code. MCP server lets agents draw on canvas. .pen = JSON = git-diffable |
| Vercel MCP | Deploy + Hosting | Assess | Stable | Low (~500/call) | Deploy previews, logs. Useful when debugging deploys |
| Indeed MCP | Jobs + Hiring | Assess | Unknown | Low (~500/call) | Job search, company data. No current workflow demands it |
| Playwright MCP | Browser Testing | Hold | — | Medium | Agent Browser + Claude in Chrome cover this |
On the radar — tools production teams are using that we should evaluate. Research from 1,400+ company MCP deployments shows these are the most adopted by startups and agencies running 6-15 servers.
| Tool | Category | Ring | Why Teams Use It | Our Need |
|---|---|---|---|---|
| Firecrawl | Web Scraping | Assess | Structured data extraction from any URL. 96% success rate. JavaScript rendering. Top scraping MCP | ETL pipeline — NZ business enrichment, competitor research |
| Exa Search | Semantic Search | Assess | Meaning-based search, not keyword. Company research, code context, deep research mode | Intelligence — find what Perplexity misses, semantic over keyword |
| Tavily | Search | Assess | AI-optimised search API. Faster than Perplexity for simple lookups. Used alongside Firecrawl | Compare against Perplexity — may be leaner for quick searches |
| Linear MCP | Project Mgmt | Assess | Issue tracking, sprint management. MCP lets agents create/update issues directly | Agent Project Mgmt PRD — issue tracking is top priority |
| Zapier MCP | Automation | Assess | 8,000+ app integrations. Route findings, sync data, trigger notifications | Workflow automation — connect tools without custom code |
| Slack MCP | Comms | Assess | Team notifications, research delivery, collaborative workflows | Engineering comms — alternative to Convex for notifications |
| Postgres MCP Pro | Database | Assess | Direct Postgres queries via MCP. More mature than Supabase MCP for complex queries | Engineering — compare against Supabase MCP + Drizzle |
| MongoDB MCP | Database | Hold | Natural language to aggregation pipelines. Valuable for document stores | Not needed — we're Postgres/Convex |
| Qdrant MCP | Vector DB | Hold | Vector search for RAG, semantic memory, embeddings | Future — when agent memory needs vector search |
By Agent Team
Which tools does each team need loaded? Loading unused tools wastes tokens. This matrix drives .mcp.json configuration per project.
Production teams run 6-15 MCP servers per workspace. 81% of adopters are companies under 200 people — our size. The pattern: 2-3 core tools always loaded, 2-4 role-specific tools, everything else off.
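As a sketch, one team's Always Load set maps to a per-project `.mcp.json` like the following. The `mcpServers` shape follows Claude Code's project config convention; the package names and env var names here are placeholders, not real identifiers:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "<github-mcp-package>"],
      "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
    },
    "supabase": {
      "command": "npx",
      "args": ["-y", "<supabase-mcp-package>"],
      "env": { "SUPABASE_ACCESS_TOKEN": "${SUPABASE_ACCESS_TOKEN}" }
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "<context7-mcp-package>"]
    }
  }
}
```

Tools outside a team's Always Load and Trial columns simply don't appear in the file, so their schemas never enter context.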
Dream Team (orchestrator)
Jobs: PRD writing, strategy, priorities, commissioning, template improvement
| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Research for PRDs, industry analysis, competitive scanning | Pencil.dev | Wire diagrams for A&IDs, system architecture visualisation |
| GitHub | PRD management, cross-repo coordination, PR creation | Exa Search | Semantic search finds connections Perplexity misses |
| Context7 | Accurate framework docs when specifying technical PRDs | — | — |
Don't load: Supabase (no direct DB need), Indeed (no hiring workflow), Playwright (no testing).
Engineering (builder)
Jobs: Code, deploy, test, fix, schema management, API development
| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| GitHub | PR management, code search, issue tracking | Postgres MCP Pro | Compare against Supabase MCP for complex query patterns |
| Supabase | Schema inspection, data queries, ERD visualisation | Linear MCP | If we adopt Linear for issue tracking (Agent Project Mgmt PRD) |
| Context7 | Prevents hallucinated APIs — saves rework cycles | Vercel | Deploy debugging — measure frequency before promoting |
Don't load: Perplexity (use sparingly — research isn't the core job), Indeed, Pencil.dev (design team's tool).
Intelligence (research + data)
Jobs: Deep research, data acquisition, ETL pipelines, trust scoring, enrichment
| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Three search modes cover the research spectrum | Firecrawl | Structured extraction for NZ business enrichment pipeline |
| Supabase | Data queries, pipeline inspection, trust score analysis | Exa Search | Semantic company research — deeper than keyword search |
| — | — | Tavily | Quick lookups where Perplexity is overkill (save tokens) |
Don't load: Vercel, Indeed, Pencil.dev, Context7 (not coding).
What crack teams do: Intelligence teams pair Firecrawl + Exa Search + database MCP as a three-tool stack. Firecrawl extracts structured data from URLs. Exa finds the URLs worth extracting. Database MCP stores and queries results. The agent orchestrates: discover → extract → store → analyse. This maps directly to our ETL pipeline (NZBN → Crawl4AI → trust scoring).
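The discover → extract → store → analyse loop can be sketched as plain orchestration code. This is illustrative only: `exa_search`, `firecrawl_extract`, and `db_insert` are stand-ins for the corresponding MCP tool calls, not real APIs.

```python
# Sketch of the three-tool stack. The tool functions are injected so the
# orchestration logic stays independent of any specific MCP client.

def enrich_companies(query, exa_search, firecrawl_extract, db_insert):
    """Run one research query through discover -> extract -> store."""
    records = []
    for hit in exa_search(query):               # 1. discover URLs worth extracting
        record = firecrawl_extract(hit["url"])  # 2. extract structured data
        db_insert("companies", record)          # 3. store for later analysis
        records.append(record)
    return records                              # 4. analyse downstream

# Stub tools to show the data flow end to end:
stored = []
rows = enrich_companies(
    "NZ logistics companies",
    exa_search=lambda q: [{"url": "https://example.com"}],
    firecrawl_extract=lambda url: {"url": url, "name": "Example Co"},
    db_insert=lambda table, rec: stored.append((table, rec)),
)
```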
Marketing (growth)
Jobs: Content amplification, LinkedIn, landing pages, SEO, campaign analytics
| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Research for content, competitor analysis, trend scanning | Pencil.dev | Landing page design directly in IDE — AI draws, code exports |
| Claude in Chrome | Page verification, UX testing, competitor site analysis | Firecrawl | Extract competitor landing page structure for analysis |
Don't load: Supabase (no DB need), GitHub (not coding), Linear, Postgres.
What crack teams do: Marketing teams use Firecrawl to extract competitor landing pages into structured data (headlines, CTAs, social proof patterns), then feed that to the LLM for differentiation analysis. Our /landing-page skill could consume this.
Sales (CRM + outreach)
Jobs: Prospect research, RFP writing, pipeline management, deal qualification
| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Perplexity | Prospect and company research before outreach | Firecrawl | Extract target company details from websites |
| Supabase | CRM data queries, deal pipeline, contact lookup | Exa Search | Company intelligence — org charts, funding, tech stack |
Don't load: GitHub, Vercel, Context7, Linear, Playwright.
What crack teams do: Sales intelligence teams configure Exa Search + Firecrawl + database MCP as a rapid account research stack. Agent receives "research this target account" → Exa finds company intelligence → Firecrawl extracts pricing/product details from their site → database stores the enriched profile. Our Sales Dev PRD targets exactly this workflow.
Commissioning (QA)
Jobs: Verify features against PRD, evidence capture, regression checking
| Always Load | Why | Trial Next | Why |
|---|---|---|---|
| Claude in Chrome | Full browser verification, GIF evidence, session reuse | Linear MCP | If issue tracking moves to Linear — log findings directly |
| GitHub | Check PR status, verify deployments, read specs | Supabase | Verify data integrity, check schema matches spec |
Don't load: Perplexity (not researching), Indeed, Pencil.dev, Firecrawl.
Loading Rule
If a tool isn't in your Always Load or Trial column, don't load it. Every schema in context is tokens not spent on reasoning.
The team profiles above are starting positions. The governance protocol defines how tools move between columns based on measured value.
Decision Checklist
Run every MCP tool candidate through these gates before moving it to Trial or Adopt.
1. Job Fit
Does this tool solve a demanded skill?
- Named job — Maps to a specific team's core job (not "might be useful someday")
- No overlap — No existing tool already does this job adequately
- Frequency — Used at least weekly, not once-a-quarter
- Alternatives — Compared against non-MCP alternatives (CLI, API, manual)
2. Token Economics
Can you afford it at scale?
- Schema cost — Tool definition adds less than 1K tokens to context
- Per-call cost — Average response under 3K tokens
- Session budget — Tool doesn't consume more than 15% of session tokens
- Dynamic loading — Can be loaded on-demand rather than always-on
3. Reliability
Does it work when you need it?
- Uptime — Available 99%+ during trial period
- Error handling — Returns structured errors, not silent failures
- Latency — Responds in under 5 seconds for typical queries
- Determinism — Same input produces consistent output quality
4. Security
What are you exposing?
- Credential scope — API keys have minimum necessary permissions
- Data boundary — Tool can't access data outside its domain
- Audit trail — Tool invocations are logged
- Rotation — Credentials can be rotated without config rebuild
5. Composability
Does it play well with others?
- MCP standard — Implements MCP protocol correctly (stdio or HTTP)
- Schema quality — Tool descriptions are clear enough for the model to use correctly
- No side effects — Read operations don't mutate state
- Incremental adoption — Can add/remove without affecting other tools
6. Longevity
Will this exist in a year?
- Active maintenance — Updated in the last 3 months
- Community or vendor — Backed by a company or active open-source community
- Standards alignment — Uses OAuth 2.1, MCP Server Cards, or equivalent
- Migration path — If it dies, data/config isn't locked in
Tool Profiles
Perplexity
Ring: Adopt | Category: Search + Research
| Dimension | Assessment |
|---|---|
| What it does | Web search with AI synthesis. Three modes: search (fast), reason (analytical), deep_research (thorough) |
| Token cost | ~2-5K per search query including response. Deep research can be 10K+ |
| Why we use it | Replaces manual browsing for research tasks. Returns synthesized answers with citations |
| Risk | API key cost scales with usage. Deep research mode is expensive |
| Alternative | WebSearch tool (built-in), manual browsing |
GitHub
Ring: Trial | Category: Code + Repos
| Dimension | Assessment |
|---|---|
| What it does | PR management, issue tracking, code search, file operations via GitHub API |
| Token cost | ~500-1K per call. PR reviews can be 3K+ with diffs |
| Why we use it | PR creation, issue management, cross-repo code search |
| Risk | Overlaps with gh CLI which is already available via Bash. Token overhead may not justify convenience |
| Alternative | gh CLI (zero MCP overhead), direct API calls |
| Trial question | Is the MCP version measurably faster than gh CLI for our common tasks? |
Supabase
Ring: Trial | Category: Database + Diagrams
| Dimension | Assessment |
|---|---|
| What it does | Direct database access — schema inspection, queries, data management. Also has a built-in schema visualizer for entity relationship diagrams |
| Token cost | ~300-800 per call. Schema introspection can be higher |
| Why we use it | Engineering team inspects schemas, runs queries, manages data. Diagram interface shows table relationships visually — a ready-made wire diagram for data architecture |
| Risk | HTTP endpoint exposes database. Permission model needs review |
| Alternative | Direct SQL via CLI, Drizzle ORM queries |
| Bonus | Schema diagrams are available to both dream team and engineering — shared visual of the data model without any extra tooling |
| Trial question | Does MCP access add value over the existing Drizzle toolchain? Can the diagram interface replace custom ERD drawing? |
Context7
Ring: Trial | Category: Documentation
| Dimension | Assessment |
|---|---|
| What it does | Fetches up-to-date library documentation and code examples |
| Token cost | ~1-3K per query. Returns focused documentation snippets |
| Why we use it | Prevents hallucinated API calls by grounding in actual docs |
| Risk | Library coverage varies. May return outdated docs for niche libraries |
| Alternative | WebFetch to official docs, manual documentation reading |
| Trial question | How often does Context7 prevent a hallucination that would have cost a rework cycle? |
Claude in Chrome
Ring: Trial | Category: Browser Automation
See AI Browser Tools for full evaluation.
Vercel
Ring: Assess | Category: Deploy + Hosting
| Dimension | Assessment |
|---|---|
| What it does | Deploy management, build logs, runtime logs, project config |
| Token cost | ~500 per call |
| Why we use it | Debug failed deployments, check preview URLs |
| Risk | Low frequency use. Schema cost may exceed value for occasional deploy debugging |
| Trial question | How many times per week do we actually need programmatic Vercel access? |
Indeed
Ring: Assess | Category: Jobs + Hiring
| Dimension | Assessment |
|---|---|
| What it does | Job search, company data, resume management |
| Token cost | ~500 per call |
| Risk | No current workflow demands this. Loading it wastes context |
| Decision | Remove from default config until a hiring or competitive research workflow demands it |
Pencil
Ring: Trial | Category: Design + Code
Pencil.dev — "Design on canvas. Land in code." AI-native vector design that runs inside your IDE.
| Dimension | Assessment |
|---|---|
| What it does | AI-powered vector design tool inside VS Code/Cursor. Agents interact with canvas via MCP server. Generates dashboards, landing pages, UI systems, component libraries |
| MCP native | Yes — MCP server gives AI agents full write access to canvas. Sticky notes on canvas become runnable prompts — canvas as agent workspace |
| File format | .pen files = pure JSON. Human-readable, machine-readable, git-diffable. No binary blobs. Designs are code artifacts |
| Code output | HTML, CSS, React — pixel-perfect. Supports Shadcn UI, Lunaris, Halo, Nitro design systems |
| Figma bridge | Copy-paste from Figma preserving layouts and styles |
| Token cost | Estimated ~1-3K per canvas operation. Full canvas read could be higher (JSON structure) |
| Why it matters | Solves two problems: design-to-code AND visual system mapping. The Drawing Tool PRD thesis (git-native, own the visual language) fulfilled by an external tool instead of custom build |
| Risk | New tool — ecosystem maturity unknown. .pen format is proprietary JSON. IDE-only (no web version confirmed) |
Trial plan:
| # | Test | Success Criteria | Duration |
|---|---|---|---|
| 1 | Install in VS Code, create first .pen file | Working canvas in under 15 minutes | Day 1 |
| 2 | Create agent-team-to-MCP-tools wire diagram | Diagram communicates tool routing clearly | Day 1-2 |
| 3 | Test MCP server — can Claude Code draw on canvas? | Agent creates or modifies a design via MCP | Week 1 |
| 4 | Create A&ID-style diagram (agents, instruments, feedback) | Wire diagram quality matches or exceeds Mermaid/tldraw | Week 1-2 |
| 5 | Export to SVG/PNG, embed in docs page | Renders correctly in Docusaurus build | Week 2 |
| 6 | Measure token cost per session | MCP overhead within 15% budget target | Week 2 |
Exit criteria: If A&ID wire diagrams take longer than the Mermaid equivalent, or the MCP server is unreliable, demote to Assess for UI-only use. If it delivers clear wire diagrams AND agent-driven design, promote to Adopt for the design team.
Playwright
Ring: Hold | Category: Browser Testing
| Dimension | Assessment |
|---|---|
| What it does | Programmatic browser testing via Playwright |
| Risk | Agent Browser achieves similar goals with better token efficiency. Claude in Chrome covers dev workflow |
| Decision | Hold until Agent Browser proves insufficient for testing needs |
On the Radar
Tools we're not using yet but production teams have validated. Profiles here to inform trial decisions.
Firecrawl
Ring: Assess | Category: Web Scraping + Data Extraction
| Dimension | Assessment |
|---|---|
| What it does | Extracts structured data from any URL. JavaScript rendering, anti-bot handling, 96% success rate across sites. Returns clean markdown or structured JSON |
| Why teams use it | The scraping tool in production MCP stacks. Intelligence teams pair it with semantic search for discover → extract → analyse workflows |
| Our need | ETL pipeline: NZ business enrichment (Crawl4AI does this locally, Firecrawl adds hosted reliability). Competitor landing page extraction for marketing |
| Token cost | Medium — structured extraction returns focused data, not raw HTML |
| Risk | Paid API. 4% of high-value sites (LinkedIn, enterprise SaaS) remain inaccessible. We already have Crawl4AI in the ETL pipeline |
| Trial trigger | When Crawl4AI reliability drops below 90% on target sites, or when we need hosted scraping for CI pipelines |
Exa Search
Ring: Assess | Category: Semantic Search + Intelligence
| Dimension | Assessment |
|---|---|
| What it does | Meaning-based search (not keyword). Specialised tools: web search, code context, company research, people search, deep research mode |
| Why teams use it | Finds what keyword search misses. "Companies doing X in Y market" returns relevant results even without exact phrase matches |
| Our need | Intelligence — venture validation, competitor discovery, prospect research. Complements Perplexity which is keyword-biased |
| Token cost | Medium (~1-3K per query). Deep research mode is higher |
| Risk | Overlap with Perplexity. Need to measure whether semantic search finds materially different results |
| Trial trigger | When Perplexity consistently misses relevant companies or research during venture validation or sales intelligence |
Tavily
Ring: Assess | Category: AI Search
| Dimension | Assessment |
|---|---|
| What it does | AI-optimised search API. Faster and cheaper than Perplexity for simple factual lookups |
| Why teams use it | Quick context enrichment without the overhead of a full research tool. Pairs with Firecrawl in scraping workflows |
| Our need | Token savings — if 60% of our Perplexity calls are simple lookups, Tavily could handle those at lower cost |
| Trial trigger | When we have token consumption data showing Perplexity overuse on simple queries |
Linear MCP
Ring: Assess | Category: Project Management
| Dimension | Assessment |
|---|---|
| What it does | Issue tracking, sprint management, project organisation. Agents create/update issues directly via MCP |
| Why teams use it | Most adopted PM tool in MCP ecosystem. Notion agent connector. Natural language to structured issues |
| Our need | Agent Project Mgmt PRD — issue tracking is the #1 priority. If we adopt Linear, the MCP server gives agents direct access |
| Trial trigger | When Agent Project Mgmt PRD reaches L1 and we choose an issue tracking platform |
Zapier MCP
Ring: Assess | Category: Workflow Automation
| Dimension | Assessment |
|---|---|
| What it does | 8,000+ app integrations via MCP. Agents trigger cross-app workflows without custom code |
| Why teams use it | Glue layer — route research findings to teams, sync data between platforms, trigger notifications |
| Our need | Low priority until we have more tools to connect. Current Convex comms system handles inter-agent messaging |
| Trial trigger | When we need to connect 3+ external services that don't have dedicated MCP servers |
Slack MCP
Ring: Assess | Category: Team Comms
| Dimension | Assessment |
|---|---|
| What it does | Channel management, messaging, thread creation. Agents deliver research findings and trigger team notifications |
| Our need | Alternative or supplement to Convex for engineering comms. Agents could post commissioning results directly to Slack |
| Trial trigger | When the team is actively using Slack and manual notification routing becomes friction |
Token Economics
The Cost of Carrying Tools
Every MCP tool schema loaded into context consumes tokens before you've asked a single question.
```text
Session token budget = context_window x 0.6   (reserve 40% for reasoning)
MCP overhead = sum(tool_schema_tokens) + sum(tool_result_tokens)
Efficiency = task_completion_rate / total_tokens_consumed
```

Target: MCP schema overhead < 15% of session budget
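A worked example of the formulas above, assuming a 200K-token context window (an illustrative figure) and hypothetical per-tool schema sizes in the range the radar estimates:

```python
# Budget math: 60% of the window is the session budget, and MCP schema
# overhead should stay under 15% of that budget.

context_window = 200_000
session_budget = int(context_window * 0.6)    # reserve 40% for reasoning
schema_budget = int(session_budget * 0.15)    # 15% schema overhead target

# Hypothetical schema sizes for a four-tool team profile:
schemas = {"perplexity": 900, "github": 1_100, "supabase": 700, "context7": 800}
overhead = sum(schemas.values())

print(session_budget)             # 120000
print(schema_budget)              # 18000
print(overhead <= schema_budget)  # True: this profile fits the budget
```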
Measurement Protocol
Before promoting a tool from Assess to Trial:
- Baseline — Run 5 representative tasks WITHOUT the tool. Record tokens consumed
- With tool — Run same 5 tasks WITH the tool. Record tokens consumed
- Calculate — Delta tokens / tasks completed = cost per task
- Compare — Is the tool's cost justified by speed, quality, or capability gain?
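The protocol above reduces to simple arithmetic: token delta divided by tasks completed. The numbers below are illustrative, not measured:

```python
# Cost per task = (tokens with tool - tokens without) / tasks completed.

def cost_per_task(baseline_tokens, with_tool_tokens, tasks_completed):
    """Extra tokens the tool costs for each completed task."""
    return (with_tool_tokens - baseline_tokens) / tasks_completed

# 5 representative tasks, run without and then with the candidate tool:
delta = cost_per_task(baseline_tokens=42_000, with_tool_tokens=55_500,
                      tasks_completed=5)
print(delta)  # 2700.0 extra tokens per task
```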
Dynamic Loading
Research shows dynamic tool loading reduces token usage by 96% compared to loading all tools statically. The trade-off: 2-3x more tool calls (the model must discover tools before using them).
- Load statically (always-on): tools used in >50% of sessions for that team
- Load dynamically (on-demand): tools used occasionally or for specific task types
| Loading Strategy | Token Cost | Tool Calls | Best For |
|---|---|---|---|
| Static (all tools loaded) | High — every schema in context | Fewer — model sees tools immediately | Small toolsets (<5 tools) |
| Dynamic (load on demand) | Low — only active tool schemas | More — discovery + use | Large toolsets (>10 tools) |
| Team profiles (curated sets) | Medium — right tools per role | Normal | Our approach — see team matrix above |
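The ">50% of sessions" rule translates into a one-line strategy chooser. The usage rates here are hypothetical; in practice they would come from session logs:

```python
# Pick static vs dynamic loading from a tool's observed usage rate.

def loading_strategy(usage_rate):
    """usage_rate: fraction of the team's sessions that used the tool."""
    return "static" if usage_rate > 0.5 else "dynamic"

usage = {"github": 0.9, "context7": 0.7, "vercel": 0.1, "firecrawl": 0.3}
plan = {tool: loading_strategy(rate) for tool, rate in usage.items()}
print(plan)
```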
Governance Protocol
DISCOVER → EVALUATE → TRIAL → ADOPT / HOLD → MONITOR → SHUFFLE
Discover
- Check MCP Registry (~2,000 servers)
- Check GitHub MCP Registry
- Team requests — "I need a tool that does X"
- Pain signals — repeated manual work that a tool could automate
Evaluate
Run the 6-gate decision checklist. Score each gate pass/fail. A tool needs 4/6 gates passing to enter Trial.
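The gate check is mechanical enough to sketch in code. Gate names mirror the checklist above; the candidate's results are illustrative:

```python
# A tool needs 4 of 6 gates passing to enter Trial.

GATES = ("job_fit", "token_economics", "reliability",
         "security", "composability", "longevity")

def enters_trial(results):
    """results maps each gate name to True (pass) or False (fail)."""
    passed = sum(bool(results.get(gate)) for gate in GATES)
    return passed >= 4

candidate = {"job_fit": True, "token_economics": True, "reliability": True,
             "security": False, "composability": True, "longevity": False}
print(enters_trial(candidate))  # True: 4/6 gates pass
```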
Trial
- Duration: 2 weeks minimum
- Scope: One team, real tasks (not toy examples)
- Measure: Token cost, task completion rate, error rate, time saved
- Exit criteria: Measurable improvement on at least one dimension
Adopt or Hold
- Adopt: Proven value, acceptable cost, team wants to keep it
- Hold: Didn't justify its token cost, unreliable, or superseded
Monitor
- Monthly review: Is this tool still earning its context cost?
- Token spend trending up without corresponding value? Demote to Trial
- Better alternative emerged? Run comparative trial
Shuffle
- Quarterly radar refresh
- Remove tools that haven't been used in 30 days
- Promote tools that consistently deliver value
- Update team matrices when roles or jobs change
Decision Log
| Date | Decision | Ring | Rationale | Learned |
|---|---|---|---|---|
| 2026-03 | Perplexity MCP as primary research tool | Adopt | Three search modes cover research spectrum. Citations improve trust | Deep research mode expensive — use search for quick lookups |
| 2026-03 | GitHub MCP in trial alongside gh CLI | Trial | Need to measure whether MCP convenience justifies schema overhead vs CLI | — |
| 2026-03 | Supabase MCP for engineering DB access | Trial | HTTP endpoint is convenient but security model needs review | — |
| 2026-03 | Context7 for library documentation | Trial | Prevents hallucinated APIs. Value depends on framework diversity | — |
| 2026-03 | Pencil.dev for design + wire diagrams | Trial | MCP-native, .pen = JSON in git, agents draw on canvas. Trial for A&ID wire diagrams | — |
| 2026-03 | Firecrawl, Exa, Tavily on radar | Assess | Production teams validate these as the intelligence stack. Trial when ETL or sales pipeline demands | Research: 1,400 MCP deployments show scraping + semantic search + DB as the winning combo |
| 2026-03 | Linear MCP on radar | Assess | Most adopted PM tool in MCP ecosystem. Watch for Agent Project Mgmt PRD decision | — |
| 2026-03 | Indeed MCP not loaded by default | Assess | No current workflow demands it. Remove from active config | Don't load tools without a named job |
| 2026-03 | Playwright MCP on hold | Hold | Agent Browser + Claude in Chrome cover browser testing needs | — |
Registries
| Registry | Purpose | URL |
|---|---|---|
| Official MCP Registry | Canonical server discovery (~2,000 entries) | registry.modelcontextprotocol.io |
| GitHub MCP Registry | GitHub-integrated discovery | github.blog |
| MKINF | Community directory | hub.mkinf.io |
| MCP Server Cards | .well-known discovery standard | modelcontextprotocol.io/development/roadmap |
Drawing Tools
Wire diagrams make this system visible. See the Drawing Tool PRD for the long-term vision.
| Tool | Best For | Integration | MCP Native | Cost |
|---|---|---|---|---|
| Pencil.dev | AI-driven design, UI systems, agent workspace diagrams | IDE (VS Code/Cursor), .pen = JSON, git-diffable | Yes — agents draw on canvas via MCP | Unknown |
| Supabase Schema Visualizer | Database ERDs, table relationships | Built into Supabase dashboard — both teams have access | Yes (via Supabase MCP) | Free |
| Mermaid | Inline flow diagrams in docs | Native Docusaurus plugin — renders in .md files | No | Free |
| tldraw | Rich A&ID diagrams, AI sketch-to-real | Export SVG to /static/img/ | No | Free (OSS) |
| Excalidraw | Quick team sketches, whiteboarding | Export SVG/PNG | No | Free (OSS) |
Pencil is the one to watch. It's MCP-native — agents can programmatically create and modify designs on canvas. If it handles wire diagrams (not just UI components), it could replace tldraw and Excalidraw for our use case. The .pen format being JSON means designs are code artifacts, not binary blobs — the exact thesis in the Drawing Tool PRD, without building a custom engine.
Start with what you have. Supabase diagrams show the data model. Mermaid shows the process flow. Trial Pencil for the A&ID-style agent/instrument wire diagrams.
The agent team matrix above is the first diagram that needs drawing — which tools flow to which teams for which jobs.
Context
- MCP Protocol — Protocol specification and server list
- AI Browser Tools — Browser tool radar (same pattern)
- AI Coding Config — Agent configuration standards
- Agent Platform — Agent identity, memory, dispatch
- Matrix Thinking — The grid that reveals gaps
- Drawing Tool PRD — Visual language for system diagrams
- Token Optimization Research — MCP token consumption benchmarks
Links
- MCP Registry — Official server discovery
- GitHub MCP Registry — GitHub-hosted discovery
- Dynamic Toolsets — 96% token reduction approach
- MCP Token Research — Academic benchmarks on MCP token inflation
- ThoughtWorks Tech Radar — The radar pattern we follow
- We Analyzed 1,400 MCP Servers — 81% from companies under 200 people, remote servers 4x growth
- MCP Adoption Statistics — 232% growth, 80% of top servers offer remote deployment
- State of Postgres MCP Servers — Security vulnerabilities and best practices
- Best MCP Servers for Cursor — Production setup patterns
- Firecrawl — Structured web scraping via MCP
- Exa Search — Semantic search with company intelligence
- Tavily — AI-optimised search API
- Pencil.dev — AI-native design-to-code, MCP server, IDE-integrated
- Pencil MCP Integration — How agents interact with the canvas
- tldraw Make Real — AI-assisted diagram creation
- Docusaurus Mermaid — Native diagram support
- awesome-mcp-servers — Community curated list
Questions
- Which tool in your Adopt ring would you demote if you measured its actual token cost per task?
- If loading all tools costs 236x more tokens, what's the cost of the tool you loaded "just in case"?
- When two tools overlap (GitHub MCP vs gh CLI), which one do you measure against — and what if the cheaper one is good enough?
- What would your radar look like if you scored tools by tasks-completed-per-token instead of features-available?
- Which team's "Don't Load" column reveals the most about what that team actually does?