image-classification
HF group: Computer Vision · Status: ❌ not built
What it is
Image → label (or label distribution) from a fixed set. Distinct
from zero-shot-image-classification
which takes labels at runtime.
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| ConvNeXt-V2 (Large) | ~200 M | 2023 | MIT | Top of pure-vision classifiers | Fast inference. |
| ViT-Large | ~300 M | 2020 | Apache-2.0 | Foundational | Many fine-tunes. |
| DINOv2 (linear probe) | 86 M – 1.1 B | 2023 | Apache-2.0 | Best features for tiny classifier head | See image-feature-extraction. |
Apple Vision (VNClassifyImageRequest) | n/a | macOS | Apple | Built-in scene/object tags | Native API. |
Infrastructure required
Inference
- ❌ Encoder-only inference path for vision models.
- ✅ Apple Vision API would be cheapest hook (zero-RAM cost) — same Swift-sidecar pattern as
locara-vision-ocr.
Input
- ❌ Image input pipeline.
Output
- Label + confidence scores; small JSON.
Storage
- ❌ Weights cache (for ML models; native API has none).
- App-side: classification results in
locara-storage.
Interaction (IPC + SDK)
- ❌
vision.classify({ image })IPC.
Capabilities (manifest)
capabilities.fs.user-selectedordevice.camera.capabilities.models[]for the model (or none for Apple Vision).
Gaps
Apple Vision API would be cheapest hook (zero-RAM cost, no model
download) — same shape as the existing locara-vision-ocr crate.
See also
zero-shot-image-classificationimage-feature-extraction- Crates:
locara-vision-ocr(template for Swift-sidecar pattern) - Index:
../modalities-and-models-survey.md