video-classification
HF group: Computer Vision · Status: ❌ not built
What it is
Video → action / activity label.
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| VideoMAE-V2 | 90 M – 1 B | 2023 | CC-BY-NC | Strong on Kinetics | Self-supervised. |
| TimeSformer | ~120 M | 2021 | CC-BY-NC 4.0 | Foundational | Older. |
| Apple Vision Action Classifier | n/a | macOS | Apple | Limited classes | Native. |
Infrastructure required
Inference
- ❌ Video encoder runtime.
Input
- ❌ Video file loading + frame sampling.
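Frame sampling can be sketched as follows. This is a minimal illustration, assuming uniform segment sampling: the clip is split into equal segments and the centre frame of each is taken. The function name `sampleFrameIndices` and its parameters are hypothetical, not part of any existing API, and decoding the selected frames is left to whichever video loader is eventually chosen.

```typescript
// Hypothetical helper: pick which frame indices to decode from a clip.
// Uniform segment sampling: split the clip into `numSamples` equal segments
// and take the frame at the centre of each segment.
function sampleFrameIndices(totalFrames: number, numSamples: number): number[] {
  if (!Number.isInteger(totalFrames) || totalFrames <= 0) {
    throw new RangeError("totalFrames must be a positive integer");
  }
  if (!Number.isInteger(numSamples) || numSamples <= 0) {
    throw new RangeError("numSamples must be a positive integer");
  }
  const segment = totalFrames / numSamples; // may be fractional
  const indices: number[] = [];
  for (let i = 0; i < numSamples; i++) {
    const centre = Math.floor((i + 0.5) * segment);
    // Clamp so very short clips never index past the last frame.
    indices.push(Math.min(centre, totalFrames - 1));
  }
  return indices;
}
```

When the clip has fewer frames than requested, indices repeat rather than fail, which keeps the encoder input at a fixed length.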
Output
- Label + confidence.
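One way to turn raw classifier scores into label and confidence pairs is a softmax followed by a top-k selection. The sketch below assumes the encoder returns one logit per class; the `Prediction` shape and the `topK` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical output shape: one class label with its softmax probability.
interface Prediction {
  label: string;
  confidence: number;
}

// Convert per-class logits into the k most likely labels.
function topK(logits: number[], labels: string[], k: number): Prediction[] {
  if (logits.length === 0 || logits.length !== labels.length) {
    throw new RangeError("logits and labels must be non-empty and equal in length");
  }
  // Subtract the maximum logit before exponentiating for numerical stability.
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps
    .map((e, i) => ({ label: labels[i], confidence: e / sum }))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, Math.max(0, k));
}
```

Returning a short ranked list rather than a single label lets the caller decide how to handle low-confidence results.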
Storage
- ❌ Weights cache.
Interaction (IPC + SDK)
- ❌ video.classify({ path }) IPC.
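The call above could be typed roughly as follows. Only the `{ path }` argument comes from this page; the type names, the result shape (taken from the label and confidence listed under Output), and the `isVideoClassifyRequest` guard are assumptions for illustration, since the IPC is not built yet.

```typescript
// Hypothetical request shape for video.classify({ path }).
interface VideoClassifyRequest {
  path: string; // path to a user-selected video file
}

// Hypothetical result shape, mirroring "Label + confidence".
interface VideoClassifyResult {
  label: string;
  confidence: number; // assumed to be a probability in [0, 1]
}

// Runtime guard for payloads that cross the IPC boundary untyped.
function isVideoClassifyRequest(value: unknown): value is VideoClassifyRequest {
  if (typeof value !== "object" || value === null) return false;
  const path = (value as { path?: unknown }).path;
  return typeof path === "string" && path.length > 0;
}
```

Validating at the boundary keeps malformed requests from reaching the video loader.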
Capabilities (manifest)
- capabilities.fs.user-selected
- capabilities.models[]
Gaps
Depends on the same shared rails as the other video modalities: a video input pipeline and encoder-only inference for non-text inputs.
See also
- video-text-to-text: superset (Q&A vs label)
- image-classification
- Index: ../modalities-and-models-survey.md