Text-to-video, image-to-video, and video-editing AI.
Capability Matrix
| Provider | Model | Max Length | Quality | Stage |
|---|---|---|---|---|
| OpenAI | Sora | ~1 min | Excellent | Limited access |
| Runway | Gen-3 | ~10 sec | Very Good | Available |
| Kling | Kling 1.6 | ~10 sec | Very Good | Available |
| Pika | 2.0 | ~5 sec | Good | Available |
| Luma | Dream Machine | ~5 sec | Good | Available |
| Minimax | Hailuo | ~6 sec | Good | Available |
Use Cases
| Application | Current Viability |
|---|---|
| Short social clips | Production-ready |
| Product demos | Production-ready |
| B-roll footage | Production-ready |
| Music videos | Emerging |
| Long-form content | Not yet viable |
Recommendations
| Need | 1st Choice | 2nd Choice | Why |
|---|---|---|---|
| Creative/narrative | Runway | Kling | Quality + control |
| Promo/ads | Arcads | Reel Farm | UGC-style conversion |
| Experimental/long | Sora | Veo 3 | Emerging, watch quality |
Questions
- When AI video hits cinematic quality at 60 seconds, what happens to stock footage, B-roll houses, and junior videographers?
- Which video use case crosses from "production-ready" to "indistinguishable from human-shot" first?
- At what clip length does consistency across frames become the bottleneck rather than quality per frame?
- Is the real moat in video generation the model, or the prompting technique that gets usable output on the first try?