`text-to-image`

HF group: Computer Vision · Status: ❌ not built

What it is

Text → image. The classic diffusion task.

Open-weight models

Model	Params	Released	License	Quality	Notes
FLUX.1 [schnell]	12 B	2024-08	Apache-2.0	Best permissively licensed quality	Distilled for 1–4 steps; fast.
FLUX.1 [dev]	12 B	2024-08	FLUX-1-dev (non-comm.)	Top quality	Non-commercial only.
FLUX.1 Kontext [dev]	12 B	2025	FLUX-1-dev	Image editing	See `image-text-to-image`.
Stable Diffusion 3.5 Medium	2.5 B	2024-10	Stability AI Community	Strong, much smaller than FLUX	Best fit for edge devices / 16 GB Macs.
Stable Diffusion 3.5 Large	8 B	2024-10	Stability AI Community	Competitive with FLUX	Heavier.
SDXL-Lightning (ByteDance)	3.5 B	2024	CreativeML OpenRAIL++-M	Fast (4-step)	Workhorse for quick generations.

Infrastructure required

Inference

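❌ Diffusion runtime (typically Diffusers, mlx-diffusion, or Candle diffusers). Cleanest path: mlx-diffusion running SD 3.5 Medium as the default model, wrapped in a new locara-diffusion crate.
❌ Quantization path to fit 12 B FLUX on a 24 GB Mac. At 16-bit, 12 B parameters are about 24 GB of weights alone, so FLUX needs 8-bit or 4-bit weights to fit.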
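The arithmetic behind the quantization requirement is parameters × bits per weight / 8. The sketch below counts weights only; activations, text encoders and the VAE need extra memory. The function names, the candidate bit widths (16, 8, 4) and the usable-memory budget in the usage note are hypothetical. They do not describe existing locara code.

```rust
/// Approximate weight memory in decimal GB: params (billions) × bits / 8.
/// Covers the weights only; activations, text encoders, and the VAE are extra.
fn weight_gb(params_billions: f64, bits_per_weight: u32) -> f64 {
    params_billions * f64::from(bits_per_weight) / 8.0
}

/// Return the widest bit width (16, 8, or 4) whose weights fit in `budget_gb`,
/// or None if even 4-bit weights are too large. The candidate widths are an
/// assumption, not a list of formats the runtime is known to support.
fn widest_fitting_bits(params_billions: f64, budget_gb: f64) -> Option<u32> {
    [16u32, 8, 4]
        .iter()
        .copied()
        .find(|&bits| weight_gb(params_billions, bits) <= budget_gb)
}
```

Suppose a 24 GB Mac leaves a hypothetical 16 GB usable for weights. Then `widest_fitting_bits(12.0, 16.0)` returns `Some(8)` for FLUX, while SD 3.5 Medium (2.5 B) fits unquantized.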

Input

Plain-text prompt, optionally with a negative prompt.
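A possible shape for this input is sketched below. Both prompts are trimmed, an empty prompt is rejected, and a blank negative prompt is treated as absent. `GenerateRequest` and its fields are hypothetical names. Whether a negative prompt has any effect depends on the model's sampling setup.

```rust
/// Hypothetical request shape for a text-to-image call; field names are
/// illustrative, not an existing locara type.
#[derive(Debug, Clone, PartialEq)]
struct GenerateRequest {
    prompt: String,
    negative_prompt: Option<String>,
}

impl GenerateRequest {
    /// Trim both prompts, reject an empty positive prompt, and drop a
    /// negative prompt that is empty after trimming.
    fn new(prompt: &str, negative_prompt: Option<&str>) -> Result<Self, String> {
        let prompt = prompt.trim();
        if prompt.is_empty() {
            return Err("prompt must not be empty".to_string());
        }
        let negative_prompt = negative_prompt
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(str::to_string);
        Ok(Self { prompt: prompt.to_string(), negative_prompt })
    }
}
```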

Output

❌ Image bytes streamed back during sampling as a progressive preview: each diffusion step's latent is decoded so the UI can update live.
Final image saved to disk.

Storage

❌ Weights via locara-models::Cache (large: 12 B FLUX is about 6 GB of weights even at 4-bit, about 12 GB at 8-bit).
Output saved to fs.user-folder.
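When saving into the user folder, a new image must never overwrite an earlier one. The sketch below shows one way to guarantee that using only the standard library. It opens with `OpenOptions::create_new`, which fails if the file exists, and moves to the next number on collision. The `{stem}-NNNN.{ext}` naming scheme and the 9999-file limit are hypothetical choices.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Create the first free file named `{stem}-NNNN.{ext}` in `dir` and return
/// its path plus an open handle for writing the image bytes.
/// `create_new(true)` makes the open fail if the file already exists, so a
/// concurrent save cannot silently overwrite an earlier image.
fn create_output_file(dir: &Path, stem: &str, ext: &str) -> io::Result<(PathBuf, File)> {
    for n in 1..=9999u32 {
        let path = dir.join(format!("{stem}-{n:04}.{ext}"));
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(file) => return Ok((path, file)),
            // Name taken: try the next number.
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => continue,
            // Any other error (permissions, missing dir) is reported as-is.
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(io::ErrorKind::Other, "no free output file name"))
}
```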

Interaction (IPC + SDK)

❌ image.generate({ prompt, options }) IPC with progress events (Tauri Channel<DiffusionStep>).
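The progress-event flow can be sketched without Tauri. Here `std::sync::mpsc` stands in for the Tauri `Channel<DiffusionStep>`, and a fake sampler emits one event per step followed by a final event. The variants and fields of `DiffusionStep`, the placeholder preview bytes and the output path are hypothetical. A real runtime would decode each step's latent into the preview.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical progress event, standing in for the DiffusionStep payload that
/// would be sent over a Tauri Channel. Field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
enum DiffusionStep {
    /// One sampling step finished; `preview` holds encoded preview bytes.
    Step { index: u32, total: u32, preview: Vec<u8> },
    /// Sampling finished and the final image was written to `path`.
    Done { path: String },
}

/// Fake sampler: emits one Step per iteration, then Done.
fn run_sampler(total: u32, tx: mpsc::Sender<DiffusionStep>) {
    for index in 1..=total {
        let preview = vec![index as u8]; // placeholder for a decoded preview image
        if tx.send(DiffusionStep::Step { index, total, preview }).is_err() {
            return; // receiver gone (e.g. UI closed): stop sampling early
        }
    }
    let _ = tx.send(DiffusionStep::Done { path: "image-0001.png".to_string() });
}

/// Consumer side: record progress fractions until the sender is dropped.
fn collect_progress(total: u32) -> (Vec<f32>, Option<String>) {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || run_sampler(total, tx));
    let mut fractions = Vec::new();
    let mut final_path = None;
    // The loop ends once run_sampler returns and drops its Sender.
    for event in rx {
        match event {
            DiffusionStep::Step { index, total: n, .. } => {
                fractions.push(index as f32 / n as f32)
            }
            DiffusionStep::Done { path } => final_path = Some(path),
        }
    }
    worker.join().expect("sampler thread panicked");
    (fractions, final_path)
}
```

A failed `send` stops the sampler when the receiver has gone away. This mirrors a UI closing mid-generation, so no GPU time is wasted on previews nobody will see.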

Capabilities (manifest)

capabilities.fs.user-folder (write access) for the save location.
capabilities.models[] for the diffusion model.

Gaps

The whole stack is missing. Cleanest path: mlx-diffusion (a port of Diffusers to Apple's MLX) running the small Stable Diffusion 3.5 Medium as the default. This needs a new locara-diffusion crate, new IPC commands, and a model-picker UI.

This diffusion runtime unlocks at least 6 other modalities; see “Cross-cutting infrastructure” in ../modalities-and-models-survey.md.
