`object-detection`

HF group: Computer Vision · Status: ❌ not built

Covers: HF’s object-detection AND zero-shot-object-detection. The two share infrastructure; the difference is whether labels come from a fixed taxonomy or a runtime text query.

What it is

Image → bounding boxes + labels. Zero-shot variant takes the labels at runtime via text prompts (e.g. “find every red car”).

Open-weight models

Model	Params	Released	License	Quality	Notes
DETR / RT-DETR	~40-80 M	2020-24	Apache-2.0	Strong, fast	Transformer-based; closed label set.
YOLOv10 / YOLOv11	3-150 M	2024-25	AGPL / commercial	Real-time	Multiple sizes.
Grounding DINO 1.5	~370 M	2024	Apache-2.0	Best zero-shot detector	Open-vocabulary; takes text prompts.
OWLv2	100-300 M	2023	Apache-2.0	Strong zero-shot	Google.
Apple Vision	n/a	macOS	Apple	Object + face + barcode	Native.

Infrastructure required

Inference

❌ Object-detection runtime (encoder + detection head).
✅ Apple Vision integration would be cheapest first move (no model download).

Input

❌ Image input pipeline.
Optional text query (zero-shot variant).

Output

❌ Bounding boxes + labels + confidences.
❌ Box rendering / overlay UI in @locara/components.

Storage

❌ Weights cache.

Interaction (IPC + SDK)

❌ vision.detect({ image }) and vision.detect_zero_shot({ image, queries }) IPC.

Capabilities (manifest)

capabilities.fs.user-selected or device.camera.
capabilities.models[] for the detector.

Gaps

Image input pipeline.
Box overlay component.
Apple Vision integration would be cheapest first move (no model download).

object-detection