Video Prompts
Direct the shot, not the story.
Cinematographic language transfers directly. Shot type, camera motion, lighting, pacing — these are the prompting primitives. The models that respond best are the ones trained on the same visual grammar filmmakers already speak.
The Template
| Element | What It Controls | Example |
|---|---|---|
| Shot type | Framing | "Medium close-up" |
| Camera motion | Movement | "Slow dolly forward" |
| Subject action | What happens | "Woman turns to face camera" |
| Lighting | Mood and depth | "Rim light from behind, deep shadows" |
| Duration | Clip length | "4 seconds" |
| Style | Visual treatment | "35mm film grain, desaturated" |
| Transition | How clips connect | "Match cut on hand gesture" |
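One way to keep every prompt covering the same elements is to fill in the template as structured data and render it to text. The sketch below is illustrative only: the `ShotSpec` class, its field names, and the ordering of the sentence fragments are assumptions, not a format any video model requires. The Transition element is left out because it describes how two clips connect rather than a single clip.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """One clip described with the template's elements (hypothetical helper)."""
    shot_type: str
    subject_action: str
    camera_motion: str
    lighting: str
    style: str
    duration_seconds: int

    def to_prompt(self) -> str:
        # Framing first, then action, camera, look, and clip length.
        parts = [
            self.shot_type,
            self.subject_action,
            f"Camera: {self.camera_motion}",
            f"Lighting: {self.lighting}",
            f"Style: {self.style}",
            f"{self.duration_seconds} seconds",
        ]
        # Normalise trailing periods so fragments join cleanly.
        return ". ".join(p.strip().rstrip(".") for p in parts) + "."


# Values taken from the Example column of the table.
shot = ShotSpec(
    shot_type="Medium close-up",
    subject_action="Woman turns to face camera",
    camera_motion="slow dolly forward",
    lighting="rim light from behind, deep shadows",
    style="35mm film grain, desaturated",
    duration_seconds=4,
)
print(shot.to_prompt())
```

Keeping the elements as named fields makes it easy to spot a prompt that forgot its lighting or duration before it is sent to a generator.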
Techniques
Cinematographic Language
Use the vocabulary directors use: shot sizes, camera moves, and lighting terms. Models trained on footage captioned in those terms tend to respond to it.
"Extreme wide shot. Desert landscape, single figure walking away from camera. Heat haze distortion. Locked tripod. 6 seconds."
Temporal Direction
Describe what changes over time, not a frozen moment.
"Start tight on hands typing. Slowly zoom out to reveal the room is empty except for the desk. Natural light shifts from warm to blue as clouds pass. 8 seconds."
Consistency Across Clips
The hardest problem in AI video. Anchor with specifics.
"Same character throughout: East Asian woman, mid-30s, short black hair, navy coat. Maintain consistent lighting — overcast daylight, no direct sun. Same colour grade — muted teal and orange."
Shot Sequencing
Think in montages, not individual clips.
"Shot 1: Close-up of coffee being poured, steam rising. Shot 2: Wide shot of empty cafe, morning light. Shot 3: Medium shot of barista wiping counter, slow motion. All shots: warm analog palette, shallow depth of field."
The Prompt
A complete 5-shot sequence prompt for a brand film. Uses cinematographic language, temporal direction, and consistency anchors across clips.
You are directing a 25-second brand film for a premium coffee company.
Every shot must feel intentional — this is cinema, not stock footage.
CONSISTENCY ANCHORS (maintain across ALL shots):
- Subject: East Asian woman, mid-30s, short black hair, navy wool coat
- Color grade: Muted teal shadows, warm amber highlights. Desaturated.
- Lighting: Overcast natural daylight. No direct sun. Soft and diffused.
- Film stock: 35mm grain, slight vignette. NOT clean digital.
- Aspect ratio: 2.39:1 (cinematic widescreen)
SHOT 1 — THE DETAIL (4 seconds)
Extreme close-up. Hot coffee being poured into a ceramic cup. Steam
rises in slow motion. Shallow depth of field — only the stream of
liquid is sharp. Sound: quiet pour, ambient cafe murmur.
Camera: locked tripod, no movement.
SHOT 2 — THE SPACE (6 seconds)
Wide establishing shot. Empty cafe interior, morning light streaming
through large windows. Dust particles visible in light shafts. One
table set with the cup from Shot 1. Subject enters frame from right,
walks to the table. Camera: slow dolly forward, barely perceptible.
SHOT 3 — THE MOMENT (5 seconds)
Medium close-up. Subject wraps both hands around the cup. Eyes close
briefly. Micro-expression: contentment, not performance. The first sip.
Camera: handheld with gentle breathing motion. Rack focus from hands
to face.
SHOT 4 — THE WORLD (6 seconds)
Wide shot through cafe window from outside. Rain on glass. Subject
visible inside, soft and warm. Street reflections overlay her image.
Camera: tripod-locked, no pan or tilt; let the rain do the work. Slow
zoom from wide to medium over 6 seconds.
SHOT 5 — THE BRAND (4 seconds)
Close-up of the cup on the table. A hand enters frame and sets down
a saucer. The cup bears a minimal logo. Pull focus to reveal the cafe
name etched in the window behind. Hold for 2 seconds.
Camera: static. Let the composition speak.
TRANSITION NOTES:
- Shot 1→2: Match cut on circular shape (cup rim → window frame)
- Shot 2→3: Jump cut on her sitting motion
- Shot 3→4: Dissolve (interior warmth → exterior cold)
- Shot 4→5: Hard cut (statement ending)
Generate the sequence one shot at a time in Runway, Sora, or Kling. Repeat the consistency anchors word for word in every shot prompt so the subject, grade, and lighting descriptions never drift.
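Because each shot is generated separately, it can help to total the per-shot durations and work out where each clip starts on the timeline before spending any generations, then compare the total with the runtime stated in the brief. The sketch below uses the labels and durations from the shot headers in the prompt; the function names are hypothetical, and it assumes clips are laid end to end, so the overlap a dissolve would introduce is ignored.

```python
# (label, seconds) pairs copied from the shot headers in the prompt.
SHOTS = [
    ("THE DETAIL", 4),
    ("THE SPACE", 6),
    ("THE MOMENT", 5),
    ("THE WORLD", 6),
    ("THE BRAND", 4),
]


def total_runtime(shots):
    """Sum the clip lengths, assuming clips are placed end to end."""
    return sum(seconds for _, seconds in shots)


def start_times(shots):
    """Map each shot label to the second at which that shot begins."""
    starts, elapsed = {}, 0
    for label, seconds in shots:
        starts[label] = elapsed
        elapsed += seconds
    return starts


print(total_runtime(SHOTS))  # 25
for label, start in start_times(SHOTS).items():
    print(f"{start:>2}s  {label}")
```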
Tools
| Tool | Strength | Link |
|---|---|---|
| Runway Gen-3 | Motion quality, camera control | runwayml.com |
| Sora | Length, coherence, physical realism | openai.com/sora |
| Kling | Motion fidelity, fast generation | klingai.com |
| Pika | Quick iterations, style transfer | pika.art |
| Luma Dream Machine | 3D consistency, camera paths | lumalabs.ai |
| Minimax | Long-form generation | minimaxi.com |
| Veo | Google's video model, integration with Gemini | deepmind.google/veo |
Context
- Video — Video tool capabilities and comparison
- Visual Prompting — Image prompt principles that transfer to video
- Visual Art Prompts — Composition and style techniques
- Prompts — First principles across all modalities
Motion is meaning.
Questions
If you can describe any shot, does directing become writing?
- What separates a prompt that generates footage from one that generates cinema?
- When consistency across clips is solved, what's left that requires a human director?
- How does temporal direction (things changing over time) differ from spatial composition (things arranged in a frame)?