depth-estimation
HF group: Computer Vision · Status: ❌ not built · Tier 2 (high leverage)
What it is
Image → per-pixel depth map. Useful for AR effects, photo relighting, focus / blur effects, and as a stage in 3D reconstruction pipelines.
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| Depth-Anything-V2 (Small) | 25 M | 2024-06 | Apache-2.0 | Best lightweight | Trained on 595 K synthetic labeled + 62 M real unlabeled images. |
| Depth-Anything-V2 (Large) | 335 M | 2024-06 | CC-BY-NC-4.0 | SOTA monocular depth | Over 10× faster than SD-based methods. Non-commercial license, unlike Small; the 1.3 B figure belongs to the Giant variant. |
| ZoeDepth | ~340 M | 2023 | MIT | Metric depth | Older; now superseded. |
| MiDaS-v3 | 110-340 M | 2022 | MIT | Foundational | Older. |
Infrastructure required
Inference
- ❌ Encoder-only inference for the depth model. Depth-Anything-V2-Small (25 M) exports to ONNX cleanly.
Input
- ❌ Image input pipeline.
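
A minimal sketch of the tensor-preparation half of the input pipeline, assuming the model expects planar CHW float input normalized with ImageNet statistics and side lengths that are multiples of a ViT patch size of 14 (typical for DINOv2-based models such as Depth-Anything-V2, but worth verifying against the exported model). The helper names `snapToPatch` and `rgbaToChw` are hypothetical; decoding and resizing the image are left to whatever canvas or image library the runtime provides.

```typescript
// ImageNet channel statistics (RGB). Assumed here; check the model card.
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

/** Round a side length to the nearest multiple of the patch size (at least one patch). */
function snapToPatch(side: number, patch: number = 14): number {
  return Math.max(patch, Math.round(side / patch) * patch);
}

/**
 * Convert interleaved RGBA bytes (already resized to width x height)
 * into a planar CHW float tensor, normalized per channel. Alpha is dropped.
 */
function rgbaToChw(rgba: Uint8ClampedArray, width: number, height: number): Float32Array {
  const plane = width * height;
  if (rgba.length !== plane * 4) {
    throw new Error("rgba length does not match width * height * 4");
  }
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      out[c * plane + i] = (rgba[i * 4 + c] / 255 - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```

The resulting `Float32Array` would be wrapped in a tensor of shape `[1, 3, height, width]` before being handed to the inference runtime.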
Output
- ❌ Depth map as float-array image.
- ❌ Visualization (colormap → PNG, or save raw EXR).
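
The visualization step could look roughly like the sketch below: min-max normalize the raw float depth map, then expand it to RGBA bytes that a canvas `ImageData` or any PNG encoder can consume. It uses a plain grayscale ramp rather than a perceptual colormap; swapping in a lookup table is a small change. Function names are hypothetical, and EXR export is not covered.

```typescript
/**
 * Min-max normalize a raw depth map to [0, 1].
 * Non-finite values are ignored when computing the range and map to 0.
 */
function normalizeDepth(depth: Float32Array): Float32Array {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < depth.length; i++) {
    const v = depth[i];
    if (!Number.isFinite(v)) continue;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min;
  const out = new Float32Array(depth.length);
  if (!(range > 0)) return out; // flat, empty, or all-invalid map: all zeros
  for (let i = 0; i < depth.length; i++) {
    const v = depth[i];
    out[i] = Number.isFinite(v) ? (v - min) / range : 0;
  }
  return out;
}

/** Map normalized depth to opaque grayscale RGBA bytes. */
function depthToRgba(normalized: Float32Array): Uint8ClampedArray {
  const rgba = new Uint8ClampedArray(normalized.length * 4);
  for (let i = 0; i < normalized.length; i++) {
    const g = Math.round(normalized[i] * 255);
    rgba[i * 4] = g;
    rgba[i * 4 + 1] = g;
    rgba[i * 4 + 2] = g;
    rgba[i * 4 + 3] = 255;
  }
  return rgba;
}
```

Keeping normalization separate from color mapping means the raw float array can still be saved untouched for consumers that need the original values.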
Storage
- ❌ Weights cache (tiny: 25 M params, roughly 100 MB at fp32).
Interaction (IPC + SDK)
- ❌ `vision.depth({ image })` IPC.
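
One possible shape for the call's types, offered as a sketch only: the source specifies just the `image` argument, so the `DepthResult` fields, the row-major layout, and the helper names below are all assumptions for illustration.

```typescript
// Hypothetical request/response shapes; only `image` comes from the spec above.
interface DepthRequest {
  image: Uint8Array; // encoded image bytes (e.g. PNG or JPEG)
}

interface DepthResult {
  width: number;
  height: number;
  depth: Float32Array; // assumed row-major, length === width * height
}

// Assumed signature of the SDK entry point.
type VisionDepth = (req: DepthRequest) => Promise<DepthResult>;

/** Check that a result's buffer matches its declared dimensions. */
function isValidDepthResult(r: DepthResult): boolean {
  return (
    Number.isInteger(r.width) &&
    Number.isInteger(r.height) &&
    r.width > 0 &&
    r.height > 0 &&
    r.depth.length === r.width * r.height
  );
}

/** Read the depth value at pixel (x, y) from a row-major result. */
function depthAt(r: DepthResult, x: number, y: number): number {
  if (x < 0 || y < 0 || x >= r.width || y >= r.height) {
    throw new RangeError("pixel out of bounds");
  }
  return r.depth[y * r.width + x];
}
```

Returning the dimensions alongside the flat buffer lets callers validate the payload after it crosses the IPC boundary without a second round trip.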
Capabilities (manifest)
- `capabilities.fs.user-selected` or `device.camera`.
- `capabilities.models[]`.
Gaps
- Image input pipeline.
- Encoder-only inference for non-text models.
Depth-Anything-V2 (Small) is tiny (25 M params) and Apache-2.0, which makes it an ideal first deliverable in the CV space.