# Tools
Which tool handles which modality?
The answer depends on the layer: what model generates the output, what framework orchestrates the pipeline, which MCP server connects it to external data, and which CLI drives it from the terminal.
## Modality Stack Matrix
| Modality | Model | Framework | MCP | CLI |
|---|---|---|---|---|
| Text | Claude, GPT-4 | Claude Code, Cursor | Perplexity, Context7, GitHub | drmg, gh |
| Vision | Claude, GPT-4V, Gemini | Claude Code | Claude in Chrome | — |
| Voice | Whisper, ElevenLabs, Deepgram | Claude Code (pipeline) | — | — |
| Audio | Suno, Udio, ElevenLabs | — | — | — |
| Video | Runway, Kling, Sora | — | — | — |
| 3D | Tripo, Meshy, Rodin | — | — | — |
| Code | Claude, GPT-4, Gemini | Claude Code, Codex | GitHub, Supabase, Context7 | drmg, gh |
**Reading the matrix:** Empty cells are gaps — no current tool or no integration exists for that combination. For the full tool list per modality (every provider, quality ratings, use cases), see AI Modalities. For the MCP server adoption radar, see MCP Tools.
## Three Layers
Knowing which layer a tool lives at stops you from comparing incompatible things.
| Layer | What it is | Choose when |
|---|---|---|
| Model | Generates the output — text, image, speech, video, 3D | You're choosing what produces the capability |
| Framework | Orchestrates agents, tools, and pipelines | You're choosing how to coordinate and extend models |
| Protocol | Connects frameworks to external tools and data | You're choosing how agents and tools communicate |
A production setup uses all three: Claude Code (framework) + Claude (model) + MCP servers (protocol).
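As a concrete example of the protocol layer in that stack, Claude Code can read MCP server definitions from a project-level `.mcp.json`. The entry below is a sketch only — the GitHub server package name and the token environment variable are assumptions to verify against current MCP documentation, and the token value is a placeholder:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    }
  }
}
```

With this in place, the framework (Claude Code) launches the server process and the model (Claude) calls its tools over MCP — the three layers stay separate but composed.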
## What's Here
### MCP Servers
Protocol reference and server catalog. Twenty servers across six categories: filesystem, database, web, memory, APIs, execution. Start here to understand MCP before choosing servers.
### MCP Tools
Adoption radar for curated MCP servers — which to use, which to trial, which to hold. Organized by team, with token cost estimates, decision checklists, and a governance protocol.
### CLI Tools
Design patterns for agent-grade CLIs — structured I/O, dry-run safety, runtime introspection, input hardening. Use this to evaluate any CLI before wiring it to an agent.
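These patterns are easier to evaluate with a concrete shape in hand. A minimal Python sketch follows — the tool name `renametool` and its flags are illustrative, not any real CLI. It emits one JSON object per invocation (structured I/O), gates execution behind `--dry-run` (dry-run safety), and rejects path traversal (input hardening):

```python
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="renametool")
    parser.add_argument("path", help="file to act on")
    parser.add_argument("--dry-run", action="store_true",
                        help="report the planned action without executing it")
    return parser


def run(argv: list) -> dict:
    args = build_parser().parse_args(argv)
    # Input hardening: reject paths that try to escape the working tree.
    if ".." in args.path.split("/"):
        return {"ok": False, "error": "path traversal rejected"}
    # Dry-run safety: the plan is always reported; execution is gated.
    return {"ok": True,
            "action": {"op": "rename", "target": args.path,
                       "executed": not args.dry_run}}


def main() -> int:
    result = run(sys.argv[1:])
    print(json.dumps(result))  # structured I/O: one JSON object per call
    return 0 if result["ok"] else 1
```

An agent driving this CLI can parse every response the same way — success, plan, or refusal all arrive as JSON on stdout, with the exit code as a secondary signal.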
### Apps
AI-powered applications by modality.
## Context
- AI Modalities — What each modality can do and which models provide it
- Models — LLM provider comparison (text modality depth)
- Agent Protocols — MCP, A2A, Verifiable Intent at the protocol layer
- Agents — When to use subagents and agent teams
## Questions
- Which cell in the modality matrix above represents the highest-value gap for your current workflow?
- If you had to wire one new modality into your agent stack this week, which row would deliver the most value — and what's blocking the MCP column from filling in?
- When the model IS the matrix (omnimodal, any-in to any-out), which layer becomes redundant — framework, protocol, or CLI?
- How do you decide when a direct API call beats an MCP server for the same capability?