text-to-3d
HF group: Computer Vision · Status: ❌ not built
What it is
Text → 3D mesh / Gaussian splat. Note that most “text-to-3D”
pipelines today actually go text → image → 3D, so the heavy lift
is in image-to-3d.
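The two-stage routing described above can be sketched as follows. This is only an illustration under stated assumptions: `RasterImage`, `Mesh`, `makeTextTo3D`, and both stub stages are hypothetical names, not part of any existing SDK, and the stubs return placeholder data instead of running a model.

```typescript
// Hypothetical data shapes, used only for this sketch.
interface RasterImage { width: number; height: number; pixels: Uint8Array; }
interface Mesh { vertices: number[][]; faces: number[][]; }

type TextToImage = (prompt: string) => Promise<RasterImage>;
type ImageToMesh = (image: RasterImage) => Promise<Mesh>;

// Compose the two stages into one text-to-3D function.
function makeTextTo3D(textToImage: TextToImage, imageToMesh: ImageToMesh) {
  return async (prompt: string): Promise<Mesh> => {
    if (prompt.trim() === "") throw new Error("prompt must not be empty");
    const image = await textToImage(prompt); // stage 1: text -> image
    return imageToMesh(image);               // stage 2: image -> 3D
  };
}

// Stub stages standing in for real models (assumption: no model is loaded here).
const stubTextToImage: TextToImage = async () => ({
  width: 1, height: 1, pixels: new Uint8Array([0, 0, 0, 255]),
});
const stubImageToMesh: ImageToMesh = async () => ({
  vertices: [[0, 0, 0], [1, 0, 0], [0, 1, 0]],
  faces: [[0, 1, 2]],
});

const textTo3D = makeTextTo3D(stubTextToImage, stubImageToMesh);
```

Because the stages are injected, the image-to-3d stage can be built and tested on its own and reused here once it exists.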
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| Hunyuan3D-2.1 | ~5 B | 2025-06 | Tencent Hunyuan Community License | First production-ready open 3D | PBR-ready meshes, 6 GB VRAM minimum. Both text-to-3D and image-to-3D. |
| MeshAnything | ~350 M | 2024 | CC-BY-NC | High quality but small assets | Auto-regressive mesh generation. |
| 3DGen-Arena open models (variants) | varies | 2025 | various | Active research area | Niche; image-to-3D dominates. |
Infrastructure required
Inference
- ❌ 3D-specific diffusion / mesh-generation runtime.
Input
- Plain text prompt; optionally chained through text-to-image first.
Output
- ❌ 3D file (.glb / .obj / .ply) saved to disk.
- ❌ 3D viewer component in `@locara/components` (three.js-based) for in-app preview.
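Of the three output formats listed, `.obj` is the simplest to emit by hand, so it makes a convenient first target for the save path. The sketch below serializes a triangle mesh to Wavefront OBJ text; `TriangleMesh` and `meshToObj` are hypothetical names, and writing the resulting string to the user folder is left out.

```typescript
// Hypothetical in-memory mesh: vertex positions plus zero-based triangle indices.
interface TriangleMesh {
  vertices: [number, number, number][];
  faces: [number, number, number][];
}

// Serialize to Wavefront OBJ: one "v x y z" line per vertex,
// then one "f a b c" line per triangle.
function meshToObj(mesh: TriangleMesh): string {
  const lines: string[] = [];
  for (const [x, y, z] of mesh.vertices) {
    lines.push(`v ${x} ${y} ${z}`);
  }
  for (const face of mesh.faces) {
    for (const index of face) {
      if (!Number.isInteger(index) || index < 0 || index >= mesh.vertices.length) {
        throw new RangeError(`face index ${index} is out of range`);
      }
    }
    // OBJ face indices are 1-based, so shift each zero-based index by one.
    lines.push(`f ${face[0] + 1} ${face[1] + 1} ${face[2] + 1}`);
  }
  return lines.join("\n") + "\n";
}

const triangle: TriangleMesh = {
  vertices: [[0, 0, 0], [1, 0, 0], [0, 1, 0]],
  faces: [[0, 1, 2]],
};
const objText = meshToObj(triangle);
```

Binary `.glb` output with PBR materials needs considerably more structure, so a real implementation would more likely pass through whatever file the model runtime produces.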
Storage
- ❌ Weights cache.
- Output: `fs.user-folder`.
Interaction (IPC + SDK)
- ❌ `mesh.generate({ prompt })` IPC.
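Since the IPC is not built yet, the following is only a guess at what its contract might look like. Only the `prompt` field comes from this page; `MeshGenerateRequest`, `MeshGenerateResult`, `MeshFormat`, and `validateMeshRequest` are hypothetical names introduced for illustration.

```typescript
// Hypothetical request/response shapes; only `prompt` is named in the survey.
interface MeshGenerateRequest { prompt: string; }
type MeshFormat = "glb" | "obj" | "ply";
interface MeshGenerateResult { path: string; format: MeshFormat; }

// Assumed handler signature for the host side of the IPC call.
type MeshGenerateHandler = (req: MeshGenerateRequest) => Promise<MeshGenerateResult>;

// Validate an untrusted IPC payload before it reaches the model runtime.
function validateMeshRequest(req: unknown): MeshGenerateRequest {
  if (typeof req !== "object" || req === null) {
    throw new TypeError("request must be an object");
  }
  const prompt = (req as { prompt?: unknown }).prompt;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new TypeError("prompt must be a non-empty string");
  }
  return { prompt: prompt.trim() };
}

// Stub handler: validates the request and returns a placeholder result
// without generating anything (the path is illustrative only).
const stubMeshGenerate: MeshGenerateHandler = async (req) => {
  validateMeshRequest(req);
  return { path: "mesh.glb", format: "glb" };
};
```

Returning a file path rather than the mesh bytes keeps large assets out of the IPC channel, which fits the requirement that output be saved to disk.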
Capabilities (manifest)
- `capabilities.fs.user-folder` for save.
- `capabilities.models[]` for the model.
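As a rough illustration of how those two entries might be declared and checked, the sketch below assumes the dotted paths map onto nested keys. The manifest shape, the boolean value for `user-folder`, the `missingCapabilities` helper, and the model id `"hunyuan3d-2.1"` are all assumptions, not a documented schema.

```typescript
// Hypothetical manifest shape inferred from the dotted capability paths.
interface AppCapabilities {
  fs?: { "user-folder"?: boolean };
  models?: string[];
}
interface AppManifest { capabilities?: AppCapabilities; }

// Return the capability paths text-to-3D would need but the manifest lacks.
function missingCapabilities(manifest: AppManifest, modelId: string): string[] {
  const missing: string[] = [];
  const caps: AppCapabilities = manifest.capabilities ?? {};
  if (caps.fs?.["user-folder"] !== true) {
    missing.push("capabilities.fs.user-folder");
  }
  if (!(caps.models ?? []).includes(modelId)) {
    missing.push(`capabilities.models[] (${modelId})`);
  }
  return missing;
}

// Example manifest declaring both capabilities (the model id is illustrative).
const exampleManifest: AppManifest = {
  capabilities: { fs: { "user-folder": true }, models: ["hunyuan3d-2.1"] },
};
```

Reporting every missing entry at once, rather than failing on the first, lets an app author fix the manifest in a single pass.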
Gaps
- 3D viewer component.
- 3D file output IPC.
- Most pipelines route through image-to-3d, so that should land first.
See also
- image-to-3d
- text-to-image: first stage of the typical pipeline
- Index: ../modalities-and-models-survey.md