text-to-music
HF group: Audio · Status: ❌ not built
What it is
Text → music. Adjacent to text-to-audio, but with a different model class.
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| MusicGen-medium | 1.5 B | 2023 | CC-BY-NC | Good for short clips | Melody conditioning via the melody-guide variant. |
| MusicGen-large | 3.3 B | 2023 | CC-BY-NC | Best of MusicGen family | Heavier. |
| Stable Audio Open 1.0 | ~1 B | 2024-07 | Stability community | Comparable to MusicGen on instrumental | Same model as sound effects. |
| ACE-Step v1.5 | ~3 B | 2025 | Apache-2.0 | Studio-grade, runs on Mac/AMD/Intel/CUDA | Best truly-permissive pick. |
| YuE | ~7 B | 2025 | Apache-2.0 | Long-form | Multi-minute coherent songs. |
| Magenta-RT | ~800 M | 2025 | Apache-2.0 | Real-time, live | Google’s open-weights live-music model. |
Infrastructure required
Inference
- ❌ Same audio-generation runtime as text-to-audio.
- ❌ Real-time mode for Magenta-RT (streaming output during generation).
Input
- Plain text prompt; optionally a melody seed (MusicGen melody-guide variant).
Output
- ❌ Streaming audio Channel for real-time variants.
- ❌ Final audio file save.
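As an illustration of the "final audio file save" step, the sketch below assembles audio chunks (such as those a streaming channel might deliver) into a WAV byte buffer. This page does not specify an output format, so the 16-bit PCM WAV container, the mono channel layout, the sample rate, and the `encodeWav` helper name are all assumptions made for the example.

```typescript
// Hypothetical helper (not part of any existing API): concatenates mono float
// chunks in the range [-1, 1] and wraps them in a 16-bit PCM WAV container.
function encodeWav(chunks: Float32Array[], sampleRate: number): Uint8Array {
  const totalSamples = chunks.reduce((n, c) => n + c.length, 0);
  const dataBytes = totalSamples * 2; // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataBytes, true);  // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // channels: mono (assumption)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true);              // block align = channels * 2
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataBytes, true);

  // Clamp each sample and convert it to a signed 16-bit integer.
  let offset = 44;
  for (const chunk of chunks) {
    for (let i = 0; i < chunk.length; i++) {
      const s = Math.max(-1, Math.min(1, chunk[i]));
      view.setInt16(offset, Math.round(s * 32767), true);
      offset += 2;
    }
  }
  return new Uint8Array(buffer);
}

// Two chunks, as if received one after the other from a stream.
const wav = encodeWav([new Float32Array([0, 0.5]), new Float32Array([-1])], 48000);
console.log(wav.length); // 50 bytes: 44-byte header + 3 samples × 2 bytes
```

A real-time variant would likely play chunks as they arrive and only run a step like this once generation finishes. Where the bytes are written (for example under fs.user-folder) is left to the storage layer.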
Storage
- ❌ Weights cache.
- Output: fs.user-folder.
Interaction (IPC + SDK)
- ❌ audio.generate({ prompt, type: 'music' }) IPC. Same shape as text-to-audio, with a type flag.
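The request shape could be typed roughly as in the TypeScript sketch below. Only `audio.generate`, `prompt`, and `type: 'music'` come from this page. The `AudioGenerateRequest` type, the `'audio'` value for the text-to-audio case, and the `buildMusicRequest` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical request shape; only `prompt` and `type: 'music'` appear on this page.
// 'audio' is an assumed name for the plain text-to-audio case.
type AudioGenerateType = 'music' | 'audio';

interface AudioGenerateRequest {
  prompt: string;
  type: AudioGenerateType;
}

// Hypothetical helper: trims the prompt and rejects empty input before the
// request would be handed to the IPC layer.
function buildMusicRequest(prompt: string): AudioGenerateRequest {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error('prompt must not be empty');
  }
  return { prompt: trimmed, type: 'music' };
}

const request = buildMusicRequest('  warm lo-fi piano with vinyl crackle  ');
console.log(JSON.stringify(request));
```

Keeping a single request type with a discriminating `type` field means the music path can reuse whatever validation and transport the text-to-audio call ends up with.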
Capabilities (manifest)
- capabilities.fs.user-folder.
- capabilities.models[].
Gaps
Same stack as text-to-audio. ACE-Step v1.5 is the cleanest first pick (Apache-2.0, runs on Mac).
See also
- text-to-audio
- audio-to-embedding (for music search)
- Index: ../modalities-and-models-survey.md