AI models for text-to-video, image-to-video, and video editing.
Capability Matrix
| Provider | Model | Max Length | Quality | Stage |
|---|---|---|---|---|
| OpenAI | Sora | ~1 min | Excellent | Limited access |
| Runway | Gen-3 | ~10 sec | Very Good | Available |
| Kling | Kling 1.6 | ~10 sec | Very Good | Available |
| Pika | 2.0 | ~5 sec | Good | Available |
| Luma | Dream Machine | ~5 sec | Good | Available |
| Minimax | Hailuo | ~6 sec | Good | Available |
Use Cases
| Application | Current Viability |
|---|---|
| Short social clips | Production-ready |
| Product demos | Production-ready |
| B-roll footage | Production-ready |
| Music videos | Emerging |
| Long-form content | Not yet viable |
Decision Guide
| Need | 1st Choice | 2nd Choice | Why |
|---|---|---|---|
| Creative/narrative | Runway | Kling | Quality + control |
| Promo/ads | Arcads | Reel Farm | UGC-style conversion |
| Experimental/long | Sora | Veo3 | Emerging, watch quality |
Stack
| Layer | Tool |
|---|---|
| Model | Runway Gen-3, Kling 1.6, Sora (limited), Pika 2.0 |
| Framework | — (direct API; no framework integration yet) |
| MCP | — |
| CLI | — |
Video generation has no framework or MCP integration today; every tool requires direct API calls or its web UI. Image-to-video with Runway or Kling is the most production-ready route; a minimal client sketch follows below. See Visual Prompting for technique.
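Absent framework support, integration means writing a thin client around the provider's REST API. The sketch below shows the typical submit-then-poll shape of an asynchronous image-to-video job. The base URL, endpoint paths, field names, and the `VIDEO_API_KEY` variable are illustrative placeholders, not any provider's actual contract; consult Runway's or Kling's API reference for the real schema.

```python
import os
import time

import requests

# Placeholder endpoint and field names for illustration only; swap in the
# real contract from the provider's API docs (e.g. Runway or Kling).
API_BASE = "https://api.example-video-provider.com/v1"
API_KEY = os.environ["VIDEO_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def generate_clip(image_url: str, prompt: str, duration_s: int = 5) -> str:
    """Submit an image-to-video job and block until a video URL is ready."""
    # Most providers run generation asynchronously: submission returns a
    # task id, and the finished video is retrieved by polling.
    resp = requests.post(
        f"{API_BASE}/image_to_video",
        headers=HEADERS,
        json={"image_url": image_url, "prompt": prompt, "duration": duration_s},
        timeout=30,
    )
    resp.raise_for_status()
    task_id = resp.json()["task_id"]

    while True:
        status = requests.get(
            f"{API_BASE}/tasks/{task_id}", headers=HEADERS, timeout=30
        ).json()
        if status["status"] == "succeeded":
            return status["video_url"]
        if status["status"] == "failed":
            raise RuntimeError(f"Generation failed: {status.get('error')}")
        time.sleep(5)  # clips take tens of seconds to minutes to render


if __name__ == "__main__":
    url = generate_clip(
        image_url="https://example.com/product-shot.png",
        prompt="slow dolly-in, soft studio lighting, product rotates 90 degrees",
    )
    print("Video ready:", url)
```

The submit-then-poll pattern matters in practice: render times run from tens of seconds to several minutes, so a synchronous request will time out, and production code should add retries and a polling deadline on top of this sketch.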
Context
When AI video hits cinematic quality at 60 seconds, what happens to stock footage, B-roll houses, and junior videographers?
Questions
- Which video use case crosses from "production-ready" to "indistinguishable from human-shot" first?
- At what clip length does consistency across frames become the bottleneck rather than quality per frame?
- Is the real moat in video generation the model, or the prompting technique that gets usable output on the first try?