text-to-image
HF group: Computer Vision · Status: ❌ not built
What it is
Generate an image from a text prompt (text → image). The classic diffusion task.
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| FLUX.1 [schnell] | 12 B | 2024-08 | Apache-2.0 | Best open-weight image quality | Distilled to 4 steps; fast. |
| FLUX.1 [dev] | 12 B | 2024-08 | FLUX-1-dev (non-comm.) | Top quality | Non-commercial only. |
| FLUX.1 Kontext [dev] | 12 B | 2025 | FLUX-1-dev | Image editing | See image-text-to-image. |
| Stable Diffusion 3.5 Medium | 2.5 B | 2024-10 | Stability community | Strong, much smaller than FLUX | Best fit for edge devices / 16 GB Macs. |
| Stable Diffusion 3.5 Large | 8 B | 2024-10 | Stability community | Competitive with FLUX | Heavier. |
| Stable Diffusion XL Lightning | 3.5 B | 2024 | OpenRAIL++-M | Fast (4-step) | Workhorse for quick generations. |
Infrastructure required
Inference
- ❌ Diffusion runtime (typically Diffusers / mlx-diffusion / Candle diffusers). Cleanest path: `mlx-diffusion` → SD 3.5 Medium for the default. New `locara-diffusion` crate.
- ❌ Quantization path to fit 12 B FLUX on a 24 GB Mac.
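As a rough check on the quantization requirement above, weight memory scales as parameter count times bits per weight. The sketch below is back-of-the-envelope arithmetic only: it counts the diffusion model's weights and ignores text encoders, the VAE, and activations. The function names and the 70% usable-memory fraction are illustrative assumptions, not measured figures or part of any existing API.

```typescript
// Back-of-the-envelope weight footprint: params × bits per weight / 8 bytes.
// Counts the diffusion model's weights only (no text encoders, VAE, activations).
function weightGiB(paramsBillions: number, bitsPerWeight: number): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 2 ** 30;
}

// Hypothetical fit check: assume only a fraction of unified memory is usable
// for weights. The 0.7 default is an illustrative assumption.
function fitsInMemory(
  paramsBillions: number,
  bitsPerWeight: number,
  ramGiB: number,
  usableFraction = 0.7,
): boolean {
  return weightGiB(paramsBillions, bitsPerWeight) <= ramGiB * usableFraction;
}

// Compare 16-, 8-, and 4-bit weights for a 12 B model on a 24 GiB machine.
for (const bits of [16, 8, 4]) {
  const gib = weightGiB(12, bits).toFixed(1);
  console.log(`12 B @ ${bits}-bit: ${gib} GiB, fits: ${fitsInMemory(12, bits, 24)}`);
}
```

Under these assumptions, 16-bit weights (about 22.4 GiB) do not fit, while 8-bit (about 11.2 GiB) and 4-bit (about 5.6 GiB) do, which is why a quantization path is listed as a prerequisite for 12 B FLUX.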
Input
- Plain text prompt (optionally with negative prompt).
Output
- ❌ Image bytes streamed back during sampling (progressive preview: each diffusion step's latent is decoded for a live update).
- Final image saved to disk.
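One way to picture the two outputs above is as a stream of events that the UI folds into its state: a preview per sampling step, then a final event with the saved path. The sketch below is a minimal illustration; the event shapes, field names, and the `out.png` path are hypothetical and not a defined API.

```typescript
// Hypothetical event shapes; the field names are illustrative, not a defined API.
type OutputEvent =
  | { kind: "preview"; step: number; totalSteps: number; image: Uint8Array }
  | { kind: "final"; path: string };

interface PreviewState {
  progress: number; // fraction of sampling completed, 0..1
  latestImage: Uint8Array | null; // most recent decoded preview
  savedPath: string | null; // set once the final image is on disk
}

const initialState: PreviewState = { progress: 0, latestImage: null, savedPath: null };

// Fold one event into the UI state without mutating the previous state.
function applyEvent(state: PreviewState, event: OutputEvent): PreviewState {
  switch (event.kind) {
    case "preview":
      return {
        ...state,
        progress: event.step / event.totalSteps,
        latestImage: event.image,
      };
    case "final":
      return { ...state, progress: 1, savedPath: event.path };
  }
}

// Usage: two previews of a 4-step run, followed by the final save.
const events: OutputEvent[] = [
  { kind: "preview", step: 1, totalSteps: 4, image: new Uint8Array([1]) },
  { kind: "preview", step: 2, totalSteps: 4, image: new Uint8Array([2]) },
  { kind: "final", path: "out.png" },
];
const finalState = events.reduce(applyEvent, initialState);
```

Keeping the fold pure makes it easy to drop or throttle previews later without touching the rendering code.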
Storage
- ❌ Weights via `locara-models::Cache` (large: 12 B FLUX is several GB even quantized).
- Output to `fs.user-folder` for save.
Interaction (IPC + SDK)
- ❌ `image.generate({ prompt, options })` IPC with progress events (Tauri `Channel<DiffusionStep>`).
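The call shape above could look roughly like the following from the SDK side. This is a self-contained mock, not the real command: `ProgressChannel` is a local stand-in that models only the `onmessage` hook of Tauri's `Channel<T>`, and the `DiffusionStep` fields, the `steps` and `negativePrompt` options, the second `onProgress` parameter, and the returned path are all assumptions made for illustration.

```typescript
// Local stand-in for Tauri's Channel<T>; only the onmessage hook is modeled.
class ProgressChannel<T> {
  onmessage: (message: T) => void = () => {};
}

// Hypothetical payloads; everything beyond `prompt` and `options` is assumed.
interface DiffusionStep { step: number; totalSteps: number }
interface GenerateOptions { negativePrompt?: string; steps?: number }
interface GenerateRequest { prompt: string; options?: GenerateOptions }

// Mock backend: emits one DiffusionStep per sampling step, then resolves
// with a placeholder path for the saved image.
async function generate(
  request: GenerateRequest,
  onProgress: ProgressChannel<DiffusionStep>,
): Promise<string> {
  if (request.prompt.trim() === "") throw new Error("prompt must not be empty");
  const totalSteps = request.options?.steps ?? 4;
  for (let step = 1; step <= totalSteps; step++) {
    await Promise.resolve(); // stands in for one real sampling step
    onProgress.onmessage({ step, totalSteps });
  }
  return "generated.png"; // placeholder result
}

const image = { generate };

// Usage: collect step numbers as they arrive, then await the final path.
const channel = new ProgressChannel<DiffusionStep>();
const seenSteps: number[] = [];
channel.onmessage = ({ step }) => seenSteps.push(step);
const done = image.generate(
  { prompt: "a lighthouse at dusk", options: { steps: 4 } },
  channel,
);
```

Separating the progress channel from the returned promise lets the caller treat progress as optional while still awaiting a single final result.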
Capabilities (manifest)
- `capabilities.fs.user-folder` write for save location.
- `capabilities.models[]` for the diffusion model.
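To make the two manifest entries concrete, here is a hedged sketch of how they might be typed and checked. Only `capabilities.fs.user-folder` and `capabilities.models[]` come from the text above; the `Manifest` interface, the `"read" | "write"` values, the model id string, and `missingCapabilities` are hypothetical.

```typescript
// Hypothetical manifest shape; only `capabilities.fs.user-folder` and
// `capabilities.models[]` are taken from the capability list above.
interface Manifest {
  capabilities: {
    fs?: { "user-folder"?: "read" | "write" };
    models?: string[];
  };
}

const manifest: Manifest = {
  capabilities: {
    fs: { "user-folder": "write" },
    models: ["stable-diffusion-3.5-medium"], // illustrative model id
  },
};

// Returns the capabilities a text-to-image app would still be missing.
function missingCapabilities(m: Manifest, model: string): string[] {
  const missing: string[] = [];
  if (m.capabilities.fs?.["user-folder"] !== "write") {
    missing.push("fs.user-folder (write)");
  }
  if (!(m.capabilities.models ?? []).includes(model)) {
    missing.push(`models[]: ${model}`);
  }
  return missing;
}

// Usage: an empty list means both capabilities are declared.
const gaps = missingCapabilities(manifest, "stable-diffusion-3.5-medium");
```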
Gaps
Whole stack. Cleanest path: mlx-diffusion (Apple’s MLX port of
Diffusers) → small Stable Diffusion 3.5 Medium for the default.
New crate locara-diffusion, new IPC commands, picker UI.
This unlocks at least 6 other modalities — see “Cross-cutting
infrastructure” in ../modalities-and-models-survey.md.
See also
- image-text-to-image, image-to-image, text-to-video (same runtime)
- Index: ../modalities-and-models-survey.md