Locara

depth-estimation

HF group: Computer Vision · Status: ❌ not built · Tier 2 (high leverage)

What it is

Image → per-pixel depth map. Useful for AR effects, photo relighting, as a stage in 3D reconstruction, and focus/blur effects.

Open-weight models

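| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| Depth-Anything-V2 (Small) | 25 M | 2024-06 | Apache-2.0 | Best lightweight (relative depth) | Trained on 595 K synthetic labeled + 62 M pseudo-labeled real images. |
| Depth-Anything-V2 (Large) | ~335 M | 2024-06 | CC-BY-NC-4.0 | SOTA monocular depth (relative) | Reported >10× faster than Stable Diffusion-based depth models. Non-commercial license. |
| ZoeDepth | ~340 M | 2023 | MIT | Metric depth | Older; now superseded. |
| MiDaS-v3 | 110-340 M | 2022 | MIT | Foundational | Older. |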

Infrastructure required

Inference

  • ❌ Encoder-only (single forward pass, non-autoregressive) inference for the depth model. Depth-Anything-V2-Small (25 M) exports cleanly to ONNX.

Input

  • ❌ Image input pipeline: decode, resize to the model's input size, normalize into a float tensor.
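A minimal preprocessing sketch, assuming the recipe of the reference Depth-Anything-V2 code: scale so the shorter side reaches a 518 px input size while keeping aspect ratio, snap both sides to a multiple of the 14 px ViT patch size, then normalize RGB with ImageNet mean/std. The function names are hypothetical, and the actual pixel resampling is left to whichever image library the pipeline ends up using.

```python
import math

# Assumed defaults, mirroring the reference Depth-Anything-V2 preprocessing.
INPUT_SIZE = 518               # minimum side length fed to the model
PATCH = 14                     # ViT patch size; both sides must be multiples of it
MEAN = (0.485, 0.456, 0.406)   # ImageNet RGB mean
STD = (0.229, 0.224, 0.225)    # ImageNet RGB std


def _snap(x: float, multiple: int, min_val: int) -> int:
    """Round x to the nearest multiple; round up instead if that falls below min_val."""
    y = round(x / multiple) * multiple
    if y < min_val:
        y = math.ceil(x / multiple) * multiple
    return int(y)


def target_size(height: int, width: int,
                input_size: int = INPUT_SIZE, patch: int = PATCH) -> tuple[int, int]:
    """Aspect-preserving (height, width) whose shorter side is at least input_size."""
    # Use the larger of the two scales so neither side ends up below input_size.
    scale = max(input_size / height, input_size / width)
    return (_snap(height * scale, patch, input_size),
            _snap(width * scale, patch, input_size))


def normalize_pixel(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Map an 8-bit RGB pixel to the normalized float values the model expects."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


def to_chw(pixels: list[list[tuple[int, int, int]]]) -> list[list[list[float]]]:
    """Turn an already-resized HxW grid of RGB pixels into a CxHxW float tensor."""
    norm = [[normalize_pixel(p) for p in row] for row in pixels]
    return [[[px[c] for px in row] for row in norm] for c in range(3)]
```

For a 640×480 landscape photo, `target_size(480, 640)` gives `(518, 686)`: the short side lands on 518 and the long side on the nearest multiple of 14, which roughly preserves the aspect ratio.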

Output

  • ❌ Depth map as an H×W float array (relative depth for Depth-Anything-V2, metric depth for ZoeDepth).
  • ❌ Visualization (colormap → PNG, or save raw EXR).
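For the PNG path, a dependency-free sketch (Python standard library only) that min-max normalizes a float depth map to 8 bits and writes it as a grayscale PNG. It assumes a relative-depth output where larger values mean nearer, as with Depth-Anything-V2, so near surfaces render bright; a colormap would replace the gray mapping with an RGB lookup. The function names are illustrative.

```python
import struct
import zlib


def depth_to_u8(depth: list[list[float]]) -> list[list[int]]:
    """Min-max normalize a float depth map to 0..255 (bright = larger value)."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # constant map: avoid division by zero, render mid-gray
        return [[128 for _ in row] for row in depth]
    return [[int(round((v - lo) / span * 255)) for v in row] for row in depth]


def _chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, payload, CRC-32 over type + payload."""
    return (struct.pack(">I", len(data)) + kind + data
            + struct.pack(">I", zlib.crc32(kind + data) & 0xFFFFFFFF))


def encode_gray_png(pixels: list[list[int]]) -> bytes:
    """Encode an HxW grid of 0..255 values as an 8-bit grayscale PNG."""
    height, width = len(pixels), len(pixels[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter type 0 (None).
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw))
            + _chunk(b"IEND", b""))
```

Writing a file is then `with open("depth.png", "wb") as f: f.write(encode_gray_png(depth_to_u8(depth)))`. The raw-EXR option keeps full float precision and would need a dedicated EXR library instead.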

Storage

  • ❌ Weights cache (small: 25 M params, roughly 100 MB at fp32).
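One possible shape for the weights cache, as a sketch: store each file under a per-model directory and verify a SHA-256 digest before loading, so a truncated or corrupted download is caught. The directory layout, `cache_path`, and `verify_weights` are hypothetical; the expected digest would come from the model manifest or the model host. The size helper is just the back-of-envelope arithmetic: 25 M params × 4 bytes ≈ 100 MB at fp32, ≈ 50 MB at fp16.

```python
import hashlib
from pathlib import Path


def cache_path(root: Path, model_id: str, filename: str) -> Path:
    """Hypothetical layout: <root>/<model_id with '/' -> '--'>/<filename>."""
    safe_id = model_id.replace("/", "--")  # keep "org/name" ids as one directory
    return root / safe_id / filename


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large weights never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_weights(path: Path, expected_sha256: str) -> bool:
    """True only if the cached file exists and matches the expected digest."""
    return path.is_file() and sha256_of(path) == expected_sha256.lower()


def approx_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Back-of-envelope weight size in decimal megabytes."""
    return num_params * bytes_per_param / 1e6
```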

Interaction (IPC + SDK)

  • vision.depth({ image }) IPC.
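A sketch of the host side of `vision.depth({ image })`, under loud assumptions: the source only names the call, so the request shape (encoded image bytes under `image`), the `DepthResult` response, and `make_handlers` / `dispatch` are all hypothetical. It shows input validation before inference and rejection of unknown methods.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types; only the method name and the `image` field come from the spec.
DepthMap = list[list[float]]


@dataclass
class DepthResult:
    width: int
    height: int
    depth: DepthMap  # row-major HxW floats (relative depth for Depth-Anything-V2)


Handler = Callable[[dict], DepthResult]


def make_handlers(run_model: Callable[[bytes], DepthMap]) -> dict[str, Handler]:
    """Build a method -> handler table; run_model is a hypothetical inference entry point."""

    def depth(params: dict) -> DepthResult:
        image = params.get("image")
        if not isinstance(image, (bytes, bytearray)) or not image:
            raise ValueError("vision.depth expects non-empty encoded image bytes in 'image'")
        rows = run_model(bytes(image))
        return DepthResult(width=len(rows[0]), height=len(rows), depth=rows)

    return {"vision.depth": depth}


def dispatch(handlers: dict[str, Handler], method: str, params: dict) -> DepthResult:
    """Route one IPC request; unknown methods are rejected rather than ignored."""
    if method not in handlers:
        raise KeyError(f"unknown IPC method: {method}")
    return handlers[method](params)
```

Injecting `run_model` keeps the IPC layer testable without loading weights, and lets the ONNX-backed runner be swapped in once it exists.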

Capabilities (manifest)

  • capabilities.fs.user-selected or device.camera.
  • capabilities.models[].
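The manifest rule above is an OR over image sources plus a required model entry. A sketch of that check, assuming a hypothetical manifest layout where `device.camera` also lives under `capabilities` and `models[]` holds plain model-id strings; neither detail is confirmed by the spec.

```python
def can_run_depth(manifest: dict, model_id: str) -> bool:
    """Image source (user-selected file OR camera) AND the depth model in models[].

    Hypothetical manifest shape; only the capability names come from the spec.
    """
    caps = manifest.get("capabilities", {})
    fs = caps.get("fs", {})
    device = caps.get("device", {})
    has_image_source = bool(fs.get("user-selected")) or bool(device.get("camera"))
    has_model = model_id in caps.get("models", [])
    return has_image_source and has_model
```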

Gaps

  • Image input pipeline.
  • Encoder-only inference for non-text models.

Depth-Anything-V2-Small is tiny (25 M params) and Apache-2.0 licensed, which makes it an ideal first deliverable in the CV space.

See also