video-to-video
HF group: Computer Vision · Status: ❌ not built
What it is
Video → transformed video (style transfer, frame interpolation, super-resolution, slow-mo).
Open-weight models
| Model | Params | Released | License | Task | Notes |
|---|---|---|---|---|---|
| RIFE / FILM | 5-50 M | 2022-23 | MIT (RIFE) / Apache-2.0 (FILM) | Frame interpolation | Real-time. |
| Real-ESRGAN-Anime / -Video | 17 M | 2021 | BSD-3 | Video upscale | Standard tool. |
| AnimateDiff | ~1 B | 2024 | Apache-2.0 | Style transfer over animation | Diffusion-based. |
Infrastructure required
Inference
- ❌ Frame-by-frame inference for non-diffusion variants (RIFE, ESRGAN-Video).
- ❌ Diffusion runtime for AnimateDiff.
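
As a rough illustration of the frame-by-frame path, the sketch below inserts synthesized frames between each pair of source frames and reports progress per pair. The `Frame` representation, the `Interpolator` signature, and the `linearBlend` placeholder are all assumptions made for this example; a real model such as RIFE estimates motion rather than blending pixels.

```typescript
// A frame is modeled as a flat array of pixel intensities (assumption for illustration).
type Frame = Float32Array;

// Hypothetical stand-in for an interpolation model: given two neighboring
// frames, return the frame at time t in (0, 1).
type Interpolator = (a: Frame, b: Frame, t: number) => Frame;

// Placeholder model: a plain linear blend, NOT what RIFE or FILM actually compute.
const linearBlend: Interpolator = (a, b, t) => {
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = (1 - t) * a[i] + t * b[i];
  }
  return out;
};

// Insert (factor - 1) synthesized frames between each pair of source frames.
// The output has (frames.length - 1) * factor + 1 frames.
function interpolateFrames(
  frames: Frame[],
  factor: number,
  model: Interpolator,
  onProgress?: (pairsDone: number, pairsTotal: number) => void,
): Frame[] {
  if (!Number.isInteger(factor) || factor < 1) {
    throw new RangeError("factor must be a positive integer");
  }
  if (frames.length === 0) return [];

  const out: Frame[] = [];
  const pairsTotal = frames.length - 1;
  for (let i = 0; i < pairsTotal; i++) {
    out.push(frames[i]);
    for (let k = 1; k < factor; k++) {
      out.push(model(frames[i], frames[i + 1], k / factor));
    }
    if (onProgress) onProgress(i + 1, pairsTotal);
  }
  out.push(frames[frames.length - 1]);
  return out;
}
```

Passing the model in as a function keeps the loop independent of the runtime, so the same driver could wrap an upscaler by swapping the pairwise call for a per-frame one.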
Input
- ❌ Video input pipeline (decode frames, demux audio).
Output
- ❌ Video file save (re-encoded).
- Long-running task with progress.
Storage
- ❌ Weights cache.
- Output: `fs.user-folder`.
Interaction (IPC + SDK)
- ❌ `video.transform({ path, op })` IPC with progress.
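
One possible shape for the progress side of this call is sketched below. Only `path` and `op` come from the signature above; the op names, the `TransformEvent` union, and the `applyEvent` reducer are hypothetical and show how a client might fold streamed events into task state.

```typescript
// Only `path` and `op` come from the spec; the op names are illustrative guesses.
type TransformOp = "interpolate" | "upscale" | "style-transfer";

interface TransformRequest {
  path: string;
  op: TransformOp;
}

// Hypothetical events the backend might stream back for one task.
type TransformEvent =
  | { kind: "progress"; framesDone: number; framesTotal: number }
  | { kind: "done"; outputPath: string }
  | { kind: "error"; message: string };

interface TaskState {
  status: "running" | "done" | "failed";
  percent: number;
  outputPath?: string;
  error?: string;
}

const initialState: TaskState = { status: "running", percent: 0 };

// Fold one event into the client-side view of the task.
function applyEvent(state: TaskState, event: TransformEvent): TaskState {
  switch (event.kind) {
    case "progress": {
      const raw =
        event.framesTotal > 0 ? (100 * event.framesDone) / event.framesTotal : 0;
      // Clamp to 0..100 and never let the displayed percentage move backwards.
      const percent = Math.max(state.percent, Math.min(100, Math.round(raw)));
      return { ...state, percent };
    }
    case "done":
      return { status: "done", percent: 100, outputPath: event.outputPath };
    case "error":
      return { ...state, status: "failed", error: event.message };
  }
}

// Example request a caller might send (values are placeholders).
const exampleRequest: TransformRequest = { path: "clip.mp4", op: "interpolate" };
```

Reporting frame counts rather than a precomputed percentage leaves the display decision to the client, which matters when the total is only known after the input has been probed.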
Capabilities (manifest)
`capabilities.fs.user-selected`, `capabilities.fs.user-folder`, `capabilities.models[]`.
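
For orientation, a manifest declaring these capabilities might look like the fragment below. The key names mirror the list above, but the nesting, the boolean values, and the model `id` strings are assumptions, since the manifest schema is not defined in this section.

```typescript
// Hypothetical manifest shape; only the capability key names come from the text.
interface ModelCapability {
  id: string; // placeholder model identifier
}

interface Manifest {
  capabilities: {
    fs: { "user-selected": boolean; "user-folder": boolean };
    models: ModelCapability[];
  };
}

const manifest: Manifest = {
  capabilities: {
    fs: {
      "user-selected": true, // read the source video the user picked
      "user-folder": true, // write the transformed output
    },
    // Placeholder entries for the models this modality would load.
    models: [{ id: "rife" }, { id: "real-esrgan-video" }],
  },
};
```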
Gaps
This modality needs the same shared rails as the other video modalities: a video I/O pipeline, long-running task IPC, and an optional diffusion runtime.
See also
- text-to-video, image-to-image: frame-by-frame variants share infra
- Index: ../modalities-and-models-survey.md