Locara

image-classification

HF group: Computer Vision · Status: ❌ not built

What it is

Image → label (or a probability distribution over labels) drawn from a fixed label set chosen at training time. Distinct from zero-shot-image-classification, which takes candidate labels at runtime.

Open-weight models

Model | Params | Released | License | Quality | Notes
--- | --- | --- | --- | --- | ---
ConvNeXt-V2 (Large) | ~200 M | 2023 | MIT | Top of pure-vision classifiers | Fast inference.
ViT-Large | ~300 M | 2020 | Apache-2.0 | Foundational | Many fine-tunes.
DINOv2 (linear probe) | 86 M – 1.1 B | 2023 | Apache-2.0 | Best features for tiny classifier head | See image-feature-extraction.
Apple Vision (VNClassifyImageRequest) | n/a | macOS | Apple | Built-in scene/object tags | Native API.
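A DINOv2 linear probe keeps the backbone frozen and trains only a linear layer over its pooled embedding, so at inference time the "tiny classifier head" is one matrix-vector product plus a bias per class. A minimal sketch, assuming the embedding has already been extracted and the weights were trained offline; `LinearHead`, `linearProbe`, and the toy dimensions are hypothetical, not an existing Locara API.

```typescript
// Hypothetical linear classifier head over a frozen backbone embedding.
interface LinearHead {
  labels: string[];
  weights: number[][]; // [numClasses][embeddingDim], one row per class
  bias: number[];      // [numClasses]
}

// Computes raw class scores (logits): logits[c] = weights[c] . embedding + bias[c].
function linearProbe(head: LinearHead, embedding: number[]): number[] {
  const n = head.labels.length;
  if (head.weights.length !== n || head.bias.length !== n) {
    throw new Error("head shape mismatch");
  }
  return head.weights.map((row, c) => {
    if (row.length !== embedding.length) {
      throw new Error(`row ${c} has ${row.length} dims, embedding has ${embedding.length}`);
    }
    let sum = head.bias[c];
    for (let i = 0; i < row.length; i++) sum += row[i] * embedding[i];
    return sum;
  });
}

// Toy example: 3-dim embedding, 2 classes.
const head: LinearHead = {
  labels: ["cat", "dog"],
  weights: [
    [1, 0, 0],
    [0, 1, 0],
  ],
  bias: [0, 0.5],
};
console.log(linearProbe(head, [2, 1, 0])); // [ 2, 1.5 ]
```

The logits still need a softmax to become the confidence scores listed under Output.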

Infrastructure required

Inference

  • ❌ Encoder-only inference path for vision models.
  • ✅ Apple Vision API would be the cheapest hook (no model weights for Locara to download or hold in RAM); it follows the same Swift-sidecar pattern as locara-vision-ocr.

Input

  • ❌ Image input pipeline.

Output

  • Top label(s) with confidence scores, serialized as small JSON.
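For ML-backed models, one common way to get there is a softmax over the head's logits followed by a top-k cut, which keeps the JSON small even when the label set has hundreds of classes. A minimal sketch; the `ClassificationResult` shape and field names are assumptions, not a fixed Locara schema. Apple Vision already returns a confidence per label, so only the top-k and serialization half would apply on that path.

```typescript
// Hypothetical result shape for one classification call.
interface LabelScore {
  label: string;
  confidence: number; // probability in [0, 1]
}
interface ClassificationResult {
  top: LabelScore[]; // sorted by confidence, highest first
}

// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Keeps only the k most confident labels so the JSON payload stays small.
function topK(labels: string[], logits: number[], k: number): ClassificationResult {
  if (labels.length !== logits.length) throw new Error("labels/logits length mismatch");
  const probs = softmax(logits);
  const top = labels
    .map((label, i) => ({ label, confidence: probs[i] }))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, k);
  return { top };
}

const result = topK(["cat", "dog", "car"], [2.0, 1.0, -1.0], 2);
console.log(JSON.stringify(result));
```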

Storage

  • ❌ Weights cache (for ML models; the native API needs none).
  • App-side: classification results in locara-storage.

Interaction (IPC + SDK)

  • vision.classify({ image }) IPC.
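On the SDK side, `vision.classify({ image })` could be a thin typed wrapper that forwards the request over IPC and validates the reply before the app sees it. A sketch under stated assumptions: the `Transport` function, the `ImageInput` variants (file path or raw bytes), and the reply shape are all hypothetical, not an existing Locara API.

```typescript
// Hypothetical image input: a user-selected file path or raw encoded bytes.
type ImageInput = { path: string } | { bytes: Uint8Array };

interface LabelScore {
  label: string;
  confidence: number;
}

// Hypothetical IPC transport: sends a method name plus params, resolves with the raw reply.
type Transport = (method: string, params: unknown) => Promise<unknown>;

function isLabelScore(x: unknown): x is LabelScore {
  const o = x as LabelScore;
  return typeof o === "object" && o !== null &&
    typeof o.label === "string" &&
    typeof o.confidence === "number" && o.confidence >= 0 && o.confidence <= 1;
}

// SDK wrapper: forwards the request and rejects malformed replies.
function makeVision(invoke: Transport) {
  return {
    async classify(req: { image: ImageInput }): Promise<LabelScore[]> {
      const reply = await invoke("vision.classify", req);
      if (!Array.isArray(reply) || !reply.every(isLabelScore)) {
        throw new Error("vision.classify: malformed reply");
      }
      return reply;
    },
  };
}

// Usage with an in-memory mock standing in for the real IPC channel.
const mock: Transport = async (method) =>
  method === "vision.classify" ? [{ label: "cat", confidence: 0.9 }] : null;
const vision = makeVision(mock);
vision.classify({ image: { path: "/tmp/cat.jpg" } }).then((labels) => console.log(labels));
```

Validating at the SDK boundary keeps a misbehaving sidecar from handing the app untyped data.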

Capabilities (manifest)

  • capabilities.fs.user-selected or device.camera.
  • capabilities.models[] for the model (or none for Apple Vision).
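The two rules above depend on where the image comes from and which backend runs the classifier. A sketch of that mapping; only the capability keys come from this page, while the `Source`/`Backend` types, the `CapabilityRequest` shape, and the model id string are hypothetical.

```typescript
type Source = "file" | "camera";
type Backend = { kind: "apple-vision" } | { kind: "model"; modelId: string };

interface CapabilityRequest {
  permissions: string[]; // capability keys as named on this page
  models: string[];      // empty when no weights are needed (Apple Vision)
}

function requiredCapabilities(source: Source, backend: Backend): CapabilityRequest {
  const permissions = [source === "file" ? "fs.user-selected" : "device.camera"];
  const models = backend.kind === "model" ? [backend.modelId] : [];
  return { permissions, models };
}

console.log(requiredCapabilities("file", { kind: "apple-vision" }));
// "convnextv2-large" is a hypothetical model id for illustration.
console.log(requiredCapabilities("camera", { kind: "model", modelId: "convnextv2-large" }));
```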

Gaps

The Apple Vision API would be the cheapest way to close this gap (no model download, no model weights held in Locara's RAM), following the same shape as the existing locara-vision-ocr crate.

See also