
Audio & Music Generation

Text-to-music, sound effects, and audio production AI.

Capability Matrix

Provider	Model	Output Type	Quality	Open Source
Suno	v4	Full songs	Very Good	No
Udio	—	Full songs	Very Good	No
Stability	Stable Audio	Music/SFX	Good	Partial
ElevenLabs	Sound Effects	SFX	Very Good	No
Meta	AudioCraft	Music	Good	Yes

Use Cases

Application	Best For
Background music	Suno, Udio
Sound effects	ElevenLabs, Stable Audio
Jingles/podcasts	Suno
Game audio	Stable Audio, AudioCraft

Tool Selection

Need	1st Choice	2nd Choice	Why
Music generation	Suno	Udio	Suno for pop, Udio for experimental
Sound effects	ElevenLabs	Stable Audio	Quality + variety
Stem separation	Demucs	—	Open source, reliable
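The Tool Selection table reads as a fallback rule: use the first choice, and fall back to the second only when the first is unavailable to you. Below is a minimal Python sketch of that rule. It assumes a hypothetical `pick_tool` helper and a caller-supplied set of tools you actually have access to. The tool names come from the table, and nothing here calls a real vendor API.

```python
from __future__ import annotations

# Hypothetical encoding of the Tool Selection table: need -> (1st choice, 2nd choice).
# Tool names come from the table; the helper and "available" set are illustrative assumptions.
TOOL_SELECTION = {
    "music generation": ("Suno", "Udio"),
    "sound effects": ("ElevenLabs", "Stable Audio"),
    "stem separation": ("Demucs", None),  # the table lists no second choice
}


def pick_tool(need: str, available: set[str]) -> str | None:
    """Return the first-choice tool for `need` if available, else the
    second choice if available, else None when neither can be used."""
    first, second = TOOL_SELECTION[need.lower()]
    for tool in (first, second):
        if tool is not None and tool in available:
            return tool
    return None


if __name__ == "__main__":
    # ElevenLabs is not available here, so the second choice is returned.
    print(pick_tool("Sound effects", {"Stable Audio"}))  # Stable Audio
```

Keeping the table as data rather than branching logic means a new row (for example, a second choice for stem separation) is a one-line change.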

Stack

Layer	Tool
Model	Suno, Udio (music); ElevenLabs, Stability (SFX); Demucs (stem separation)
Framework	— (direct API; no framework integration yet)
MCP	—
CLI	Demucs (stem separation only); none for generation

Audio generation tools are used through a direct API or a web UI. No framework or MCP integration exists for this modality yet, so the gap shown in the modality matrix is real, not an omission.

Context

AI Modalities — All capability types
AI Tools — Where the framework and MCP gaps are
Voice — Speech synthesis

Questions

When AI can generate a full song from a text prompt — what's left that requires a musician?

Does genre matter when the generation model can blend any two styles on demand?
Which audio use case has the shortest path to replacing the human entirely — background music, sound effects, or jingles?
What happens to music licensing when generation is cheaper than licensing?
