image-to-video
HF group: Computer Vision · Status: ❌ not built
What it is
Still image → animated video, with no text instruction. Useful for animating photos or artwork. Distinct from image-text-to-video, which also takes a text prompt.
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| Stable Video Diffusion (SVD-XT) | ~2 B | 2023 | Stability community | 25 frames at 576x1024 | Foundational; non-commercial license caveats. |
| CogVideoX-5B-I2V | 5 B | 2024 | CogVideoX License | Strong, controllable | Same model as in image-text-to-video; can run without a text prompt. The 5B weights use the custom CogVideoX License (only the 2B variant is Apache-2.0). |
| DynamiCrafter | ~1 B | 2024 | Apache-2.0 | Subtle natural motion | Hair, water, etc. |
| I2VGen-XL | ~2 B | 2024 | Apache-2.0 | Photo → motion | Alibaba DAMO. |
Infrastructure required
Inference
- ❌ Video diffusion runtime (shared with text-to-video).
Input
- ❌ Image input pipeline.
Output
- ❌ Video file save (10-100 MB).
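
For a rough reference point on output size, the raw frame buffer for the SVD-XT row in the table (25 frames at 576x1024) can be computed directly. This is a sketch that assumes uncompressed 8-bit RGB; the size of the saved file depends on the codec and bitrate, which this survey does not specify. `rawClipBytes` is a hypothetical helper name.

```typescript
// Raw (uncompressed) size of a clip.
// Assumption: 8-bit RGB, i.e. 3 bytes per pixel, unless overridden.
function rawClipBytes(
  frames: number,
  height: number,
  width: number,
  bytesPerPixel: number = 3,
): number {
  return frames * height * width * bytesPerPixel;
}

// SVD-XT figures from the model table: 25 frames at 576x1024.
const svdXtBytes = rawClipBytes(25, 576, 1024);
const svdXtMegabytes = svdXtBytes / 1e6;

console.log(svdXtBytes, svdXtMegabytes.toFixed(1)); // prints: 44236800 44.2
```

The raw buffer for a single SVD-XT clip therefore already sits inside the 10-100 MB range quoted for saved files, so the save path should stream to disk rather than assume small payloads.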
Storage
- ❌ Weights cache.
- Output: `fs.user-folder`.
Interaction (IPC + SDK)
- ❌ `video.animate({ image })` IPC with long-running task progress.
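
Since this IPC is not built yet, the following is only a sketch of how long-running task progress might be tracked on the caller side. Every name here (`AnimateEvent`, `TaskState`, `reduceTask`, and the event fields) is an assumption made for illustration; the survey specifies only the `video.animate({ image })` call itself.

```typescript
// Hypothetical events a `video.animate` task might emit over IPC.
type AnimateEvent =
  | { type: "progress"; step: number; totalSteps: number }
  | { type: "done"; outputPath: string }
  | { type: "error"; message: string };

// Hypothetical caller-side view of the task.
type TaskState =
  | { status: "running"; fraction: number }
  | { status: "done"; outputPath: string }
  | { status: "failed"; message: string };

const initialTaskState: TaskState = { status: "running", fraction: 0 };

// Pure reducer: folds one IPC event into the current task state.
function reduceTask(state: TaskState, event: AnimateEvent): TaskState {
  // Terminal states ignore any late events.
  if (state.status !== "running") return state;

  switch (event.type) {
    case "progress": {
      const raw = event.totalSteps > 0 ? event.step / event.totalSteps : 0;
      // Clamp to [0, 1] and never move backwards if events arrive out of order.
      const fraction = Math.min(1, Math.max(state.fraction, raw));
      return { status: "running", fraction };
    }
    case "done":
      return { status: "done", outputPath: event.outputPath };
    case "error":
      return { status: "failed", message: event.message };
  }
}
```

Keeping the state transition pure means the same reducer could drive a progress bar in the UI and be unit-tested without a running diffusion runtime.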
Capabilities (manifest)
- `capabilities.fs.user-selected` for the input image.
- `capabilities.fs.user-folder` for the output.
- `capabilities.models[]` for the model.
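
The three capabilities above could be declared and checked along these lines. This is a minimal sketch: the `Manifest` interface, the boolean flags, and the `missingCapabilities` helper are assumptions, since the survey names only the capability keys and not the manifest schema. The model identifier is a placeholder taken from the model table.

```typescript
// Hypothetical manifest shape; only the capability key names come from the survey.
interface Manifest {
  capabilities: {
    fs: { "user-selected": boolean; "user-folder": boolean };
    models: string[];
  };
}

const imageToVideoManifest: Manifest = {
  capabilities: {
    fs: {
      "user-selected": true, // read the input image the user picked
      "user-folder": true, // write the output video
    },
    models: ["CogVideoX-5B-I2V"], // placeholder model identifier
  },
};

// Lists the capabilities that image-to-video needs but the manifest lacks.
function missingCapabilities(m: Manifest): string[] {
  const missing: string[] = [];
  if (!m.capabilities.fs["user-selected"]) missing.push("capabilities.fs.user-selected");
  if (!m.capabilities.fs["user-folder"]) missing.push("capabilities.fs.user-folder");
  if (m.capabilities.models.length === 0) missing.push("capabilities.models[]");
  return missing;
}
```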
Gaps
Diffusion runtime, image input pipeline, video output IPC, long-running task progress IPC.
See also
- text-to-video, image-text-to-video, text-to-image (same runtime)
- Index: ../modalities-and-models-survey.md