Locara

object-detection

HF group: Computer Vision · Status: ❌ not built

Covers: HF’s object-detection AND zero-shot-object-detection. The two share infrastructure; the difference is whether labels come from a fixed taxonomy or a runtime text query.

What it is

Image → bounding boxes + labels. The zero-shot variant takes its labels at runtime as text prompts (e.g. “find every red car”).

Open-weight models

| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| DETR / RT-DETR | ~40-80 M | 2020-24 | Apache-2.0 | Strong, fast | Transformer-based; closed label set. |
| YOLOv10 / YOLOv11 | 3-150 M | 2024-25 | AGPL / commercial | Real-time | Multiple sizes. |
| Grounding DINO 1.5 | ~370 M | 2024 | Apache-2.0 | Best zero-shot detector | Open-vocabulary; takes text prompts. |
| OWLv2 | 100-300 M | 2023 | Apache-2.0 | Strong zero-shot | Google. |
| Apple Vision | n/a | macOS | Apple | Object + face + barcode | Native. |

Infrastructure required

Inference

  • ❌ Object-detection runtime (encoder + detection head).
  • ✅ Apple Vision integration would be the cheapest first move (no model download).

Input

  • ❌ Image input pipeline.
  • Optional text query (zero-shot variant).

Output

  • ❌ Bounding boxes + labels + confidences.
  • Box rendering / overlay UI in @locara/components.
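
As a rough illustration of the output shape and what an overlay component would need to do with it, here is a minimal sketch. The `Detection` and `BoundingBox` types, the pixel-based box convention, and the `toOverlay` helper are all assumptions made for this example; none of them is an existing Locara API.

```typescript
// Hypothetical result shape: field names and units are assumptions.
interface BoundingBox {
  x: number;      // left edge, in source-image pixels
  y: number;      // top edge, in source-image pixels
  width: number;  // box width, in source-image pixels
  height: number; // box height, in source-image pixels
}

interface Detection {
  box: BoundingBox;
  label: string;
  confidence: number; // 0..1
}

interface Size {
  width: number;
  height: number;
}

// Drop low-confidence detections, then scale the remaining boxes from
// source-image pixels to the size at which the image is rendered.
function toOverlay(
  detections: Detection[],
  imageSize: Size,
  viewSize: Size,
  minConfidence = 0.5,
): Detection[] {
  const sx = viewSize.width / imageSize.width;
  const sy = viewSize.height / imageSize.height;
  return detections
    .filter((d) => d.confidence >= minConfidence)
    .map((d) => ({
      ...d,
      box: {
        x: d.box.x * sx,
        y: d.box.y * sy,
        width: d.box.width * sx,
        height: d.box.height * sy,
      },
    }));
}
```

Keeping boxes in source-image coordinates until render time means the same result can be drawn at any zoom level; a normalized 0..1 convention would work equally well.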

Storage

  • ❌ Weights cache.

Interaction (IPC + SDK)

  • vision.detect({ image }) and vision.detect_zero_shot({ image, queries }) IPC.
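
The two calls differ only in whether text queries accompany the image, which a discriminated union captures compactly. The sketch below is illustrative only: the method names come from the bullet above, but `ImageRef` (passing images by path), `VisionCall`, and `validateCall` are hypothetical names introduced for this example.

```typescript
// Assumption: images cross the IPC boundary as a path reference.
type ImageRef = { path: string };

// The two calls named above, modelled as a discriminated union on `method`.
type VisionCall =
  | { method: "vision.detect"; params: { image: ImageRef } }
  | {
      method: "vision.detect_zero_shot";
      params: { image: ImageRef; queries: string[] };
    };

// Return a list of validation problems; an empty list means the call is well-formed.
function validateCall(call: VisionCall): string[] {
  const errors: string[] = [];
  if (call.params.image.path.trim() === "") {
    errors.push("image path is empty");
  }
  if (call.method === "vision.detect_zero_shot") {
    // Zero-shot detection has no fixed taxonomy, so it needs at least one query.
    const queries = call.params.queries.filter((q) => q.trim() !== "");
    if (queries.length === 0) {
      errors.push("zero-shot detection needs at least one text query");
    }
  }
  return errors;
}
```

For example, `validateCall({ method: "vision.detect_zero_shot", params: { image: { path: "street.jpg" }, queries: ["red car"] } })` returns an empty list, while the same call with `queries: []` reports the missing query.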

Capabilities (manifest)

  • capabilities.fs.user-selected or device.camera.
  • capabilities.models[] for the detector.
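
A manifest check for these requirements might look like the following sketch. Only the capability names are taken from the list above; the `Manifest` shape (including nesting `device.camera` under `capabilities`), the model identifier, and the `missingDetectionCapabilities` helper are assumptions for illustration.

```typescript
// Hypothetical manifest shape: only the capability names come from the list above.
interface Manifest {
  capabilities: {
    fs?: { "user-selected"?: boolean };
    device?: { camera?: boolean }; // assumption: device.* nests under capabilities
    models?: string[];
  };
}

// List the capabilities an object-detection app would still need to declare.
function missingDetectionCapabilities(m: Manifest): string[] {
  const missing: string[] = [];
  const c = m.capabilities;
  // An image source is required: either user-selected files or the camera.
  const hasImageSource =
    c.fs?.["user-selected"] === true || c.device?.camera === true;
  if (!hasImageSource) missing.push("fs.user-selected or device.camera");
  // At least one detector model must be declared.
  if (!c.models || c.models.length === 0) missing.push("models[]");
  return missing;
}

const exampleManifest: Manifest = {
  capabilities: {
    fs: { "user-selected": true },
    models: ["rt-detr"], // placeholder identifier, not a real registry name
  },
};
```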

Gaps

  • Image input pipeline.
  • Box overlay component.
  • Apple Vision integration would be the cheapest first move (no model download).

See also