image-to-3d
HF group: Computer Vision · Status: ❌ not built
What it is
Single image (+ optional text) → 3D mesh. Most “text-to-3D” pipelines today actually go text → image → 3D, so this is the heavy-lift stage.
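A minimal sketch of that composition, assuming hypothetical `TextToImage` and `ImageTo3D` stage interfaces (neither exists in this codebase yet). The text stage only supplies a reference image, and the image-to-3D stage does all of the geometry work.

```typescript
// Hypothetical types for illustration; not an existing API in this project.
type ImageBytes = Uint8Array;

interface Mesh {
  vertices: [number, number, number][];
  faces: [number, number, number][]; // 0-based vertex indices
}

interface TextToImage {
  generate(prompt: string): Promise<ImageBytes>;
}

interface ImageTo3D {
  reconstruct(image: ImageBytes, prompt?: string): Promise<Mesh>;
}

// Text-to-3D expressed as two chained stages: text -> image -> mesh.
async function textTo3D(
  prompt: string,
  t2i: TextToImage,
  i23d: ImageTo3D,
): Promise<Mesh> {
  const image = await t2i.generate(prompt);
  // Forwarding the prompt is optional; some image-to-3D models ignore text.
  return i23d.reconstruct(image, prompt);
}
```

Keeping the stages behind separate interfaces means the same image-to-3D backend can serve both direct image input and text prompts.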
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| Hunyuan3D-2.1 | ~5 B | 2025-06 | Apache-2.0 | PBR-ready meshes | Same model as text-to-3D. 6 GB VRAM. |
| TripoSR | ~1 B | 2024 | MIT | Half-second image-to-mesh | Bakes lighting into texture; static-asset only. |
| InstantMesh | ~1 B | 2024 | Apache-2.0 | 512x512 mesh | 10× faster than optimization-based methods. |
| CRM (Convolutional Reconstruction Model) | ~600 M | 2024 | Apache-2.0 | Strong on objects | Image → 6 views → mesh. |
Infrastructure required
Inference
- ❌ 3D-specific runtime.
Input
- ❌ Image input pipeline (shared with `image-text-to-text`).
- Optional text prompt.
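One early step an image input pipeline typically needs is identifying the encoded format before decoding. A minimal sketch, assuming input arrives as raw bytes; `sniffImageFormat` is a hypothetical helper that checks the standard PNG, JPEG, and WebP file signatures and does no decoding or resizing.

```typescript
type ImageFormat = "png" | "jpeg" | "webp";

// Identify an encoded image by its leading signature bytes (hypothetical helper).
function sniffImageFormat(bytes: Uint8Array): ImageFormat | null {
  // PNG: fixed 8-byte signature.
  const png = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length >= png.length && png.every((b, i) => bytes[i] === b)) {
    return "png";
  }
  // JPEG: SOI marker (FF D8) followed by the start of another marker (FF).
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return "jpeg";
  }
  // WebP: RIFF container with "RIFF" at offset 0 and "WEBP" at offset 8.
  const ascii = (start: number, end: number): string =>
    String.fromCharCode(...Array.from(bytes.subarray(start, end)));
  if (bytes.length >= 12 && ascii(0, 4) === "RIFF" && ascii(8, 12) === "WEBP") {
    return "webp";
  }
  return null;
}
```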
Output
- ❌ 3D file save + 3D viewer component.
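File save needs a concrete mesh format. As a sketch, assuming a hypothetical in-memory `Mesh` of triangle faces with 0-based indices, the following serializes geometry to Wavefront OBJ text. OBJ is picked here only because it is plain text; this writer emits geometry only (no materials or textures), so a format such as glTF/GLB would likely suit PBR output like Hunyuan3D-2.1's better.

```typescript
interface Mesh {
  vertices: [number, number, number][];
  faces: [number, number, number][]; // 0-based indices into vertices
}

// Serialize a triangle mesh to Wavefront OBJ text (geometry only).
function meshToObj(mesh: Mesh): string {
  const lines: string[] = [];
  for (const [x, y, z] of mesh.vertices) {
    lines.push(`v ${x} ${y} ${z}`);
  }
  for (const face of mesh.faces) {
    // Reject indices that do not point at an existing vertex.
    for (const i of face) {
      if (!Number.isInteger(i) || i < 0 || i >= mesh.vertices.length) {
        throw new RangeError(`face index ${i} out of range`);
      }
    }
    // OBJ face indices are 1-based, so shift each index by one.
    const [a, b, c] = face;
    lines.push(`f ${a + 1} ${b + 1} ${c + 1}`);
  }
  return lines.join("\n") + "\n";
}
```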
Storage
- ❌ Weights cache.
- Output: `fs.user-folder`.
Interaction (IPC + SDK)
- ❌ `mesh.from_image({ image, prompt? })` IPC.
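A sketch of request validation for that IPC call, assuming the payload arrives as an untyped object with encoded image bytes in a `Uint8Array`. `MeshFromImageRequest` and `validateRequest` are hypothetical names, not an existing SDK API.

```typescript
// Hypothetical payload shape for mesh.from_image.
interface MeshFromImageRequest {
  image: Uint8Array; // encoded image bytes (assumed transport format)
  prompt?: string; // optional text hint
}

// Check an untyped IPC payload and return a well-typed request, or throw.
function validateRequest(req: unknown): MeshFromImageRequest {
  if (typeof req !== "object" || req === null) {
    throw new TypeError("request must be an object");
  }
  const { image, prompt } = req as Record<string, unknown>;
  if (!(image instanceof Uint8Array)) {
    throw new TypeError("image must be a Uint8Array");
  }
  if (image.length === 0) {
    throw new TypeError("image must not be empty");
  }
  if (prompt !== undefined && typeof prompt !== "string") {
    throw new TypeError("prompt must be a string when provided");
  }
  // Omit the prompt key entirely when none was sent.
  return typeof prompt === "string" ? { image, prompt } : { image };
}
```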
Capabilities (manifest)
- `capabilities.fs.user-selected` for the input image.
- `capabilities.fs.user-folder` for saving the output.
- `capabilities.models[]` for the model.
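A sketch of a pre-flight check against those three capabilities. The `Manifest` shape below (booleans under `fs`, model IDs under `models`) is an assumption for illustration only; the real manifest schema may differ.

```typescript
// Hypothetical manifest shape mirroring the capability paths listed above.
interface Manifest {
  capabilities: {
    fs?: { "user-selected"?: boolean; "user-folder"?: boolean };
    models?: string[];
  };
}

// Return the capability paths an app is missing for a given model ID.
function missingCapabilities(m: Manifest, modelId: string): string[] {
  const missing: string[] = [];
  if (!m.capabilities.fs?.["user-selected"]) {
    missing.push("capabilities.fs.user-selected");
  }
  if (!m.capabilities.fs?.["user-folder"]) {
    missing.push("capabilities.fs.user-folder");
  }
  if (!(m.capabilities.models ?? []).includes(modelId)) {
    missing.push("capabilities.models[]");
  }
  return missing;
}
```

Reporting every missing path at once, rather than failing on the first, lets a permission prompt list everything the app needs in one step.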
Gaps
- Image input pipeline.
- 3D file output + viewer component.