Locara

text-to-music

HF group: Audio · Status: ❌ not built

What it is

Generates music from a text prompt. Adjacent to text-to-audio, but served by a different model class.

Open-weight models

| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| MusicGen-medium | 1.5 B | 2023 | CC-BY-NC | Good for short clips | Stem control via melody-guide variant. |
| MusicGen-large | 3.3 B | 2023 | CC-BY-NC | Best of MusicGen family | Heavier. |
| Stable Audio Open 1.0 | ~1 B | 2024-07 | Stability community | Comparable to MusicGen on instrumental | Same model as sound effects. |
| ACE-Step v1.5 | ~3 B | 2025 | Apache-2.0 | Studio-grade, runs on Mac/AMD/Intel/CUDA | Best truly-permissive pick. |
| YuE | ~7 B | 2025 | Apache-2.0 | Long-form | Multi-minute coherent songs. |
| Magenta-RT | ~800 M | 2025 | Apache-2.0 | Real-time, live | Google’s open-weights live-music model. |

Infrastructure required

Inference

  • ❌ Same audio-generation runtime as text-to-audio.
  • ❌ Real-time mode for Magenta-RT (streaming output during generation).

Input

  • Plain text prompt; optionally a melody seed (MusicGen melody-guide variant).

Output

  • ❌ Streaming audio Channel for real-time variants.
  • ❌ Final audio file save.
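
A minimal sketch of how the two output paths above could fit together. It assumes, since this page does not say, that the streaming Channel delivers mono PCM chunks as Float32Array values in [-1, 1]. The helper names concatChunks and encodeWav, and the choice of 16-bit WAV as the saved format, are illustrative and not part of any documented API.

```typescript
// Assumption: each streamed chunk is mono PCM in [-1, 1] as a Float32Array.
function concatChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Encode mono float samples as a 16-bit PCM WAV file (44-byte header + data).
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeTag = (pos: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) view.setUint8(pos + i, tag.charCodeAt(i));
  };

  writeTag(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                 // remaining file size
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  view.setUint32(16, 16, true);                           // fmt chunk size
  view.setUint16(20, 1, true);                            // format 1 = PCM
  view.setUint16(22, 1, true);                            // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true);  // byte rate
  view.setUint16(32, bytesPerSample, true);               // block align
  view.setUint16(34, 16, true);                           // bits per sample
  writeTag(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

A real-time consumer would play each chunk as it arrives and keep a copy, then call concatChunks and encodeWav once generation ends to produce the saved file.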

Storage

  • ❌ Weights cache.
  • Output: fs.user-folder.

Interaction (IPC + SDK)

  • audio.generate({ prompt, type: 'music' }) IPC. Same shape as text-to-audio with a type flag.
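
As a rough illustration of the call above, the sketch below types the request and routes it through an injected transport. Only the channel name audio.generate and the fields prompt and type: 'music' come from this page; the Invoke signature, the helper names, and the empty-prompt check are assumptions made for the example.

```typescript
// Request shape from this page: a prompt plus a type flag.
interface MusicGenerateRequest {
  prompt: string;
  type: 'music';
}

// Hypothetical transport; the real IPC bridge is not specified here.
type Invoke = (channel: string, payload: unknown) => Promise<unknown>;

// Build a request, rejecting blank prompts (an assumed validation rule).
function buildMusicRequest(prompt: string): MusicGenerateRequest {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error('prompt must not be empty');
  }
  return { prompt: trimmed, type: 'music' };
}

// Send the request over whatever transport the host provides.
async function generateMusic(invoke: Invoke, prompt: string): Promise<unknown> {
  return invoke('audio.generate', buildMusicRequest(prompt));
}
```

Passing the transport in as a parameter keeps the sketch independent of any particular SDK and makes it easy to substitute a fake in tests.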

Capabilities (manifest)

  • capabilities.fs.user-folder.
  • capabilities.models[].
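
One way to picture the two manifest entries is as a small object with a check for missing capabilities. The manifest shape, the boolean value for user-folder, the id field, and the model id string are all hypothetical; only the key paths capabilities.fs.user-folder and capabilities.models[] appear on this page.

```typescript
// Hypothetical manifest shape: only the two capability key paths come from the doc.
interface AppManifest {
  capabilities: {
    fs?: { 'user-folder'?: boolean };
    models?: { id: string }[];
  };
}

const manifest: AppManifest = {
  capabilities: {
    fs: { 'user-folder': true },        // where generated audio would be saved
    models: [{ id: 'ace-step-v1.5' }],  // illustrative model id, not a real registry key
  },
};

// Return the capabilities listed on this page that the manifest does not declare.
function missingCapabilities(m: AppManifest): string[] {
  const missing: string[] = [];
  if (!m.capabilities.fs?.['user-folder']) {
    missing.push('capabilities.fs.user-folder');
  }
  if (!m.capabilities.models || m.capabilities.models.length === 0) {
    missing.push('capabilities.models[]');
  }
  return missing;
}
```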

Gaps

This capability uses the same stack as text-to-audio, so the same gaps apply. ACE-Step v1.5 is the cleanest first pick (Apache-2.0, runs on Mac).
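
The license half of that pick can be sketched as a filter over the rows of the Open-weight models table. Treating only Apache-2.0 as permissive is a simplification made for this example, and platform support is not modeled because the table states it for ACE-Step only.

```typescript
interface ModelRow {
  name: string;
  license: string;
}

// Names and licenses copied from the "Open-weight models" table.
const MODELS: ModelRow[] = [
  { name: 'MusicGen-medium', license: 'CC-BY-NC' },
  { name: 'MusicGen-large', license: 'CC-BY-NC' },
  { name: 'Stable Audio Open 1.0', license: 'Stability community' },
  { name: 'ACE-Step v1.5', license: 'Apache-2.0' },
  { name: 'YuE', license: 'Apache-2.0' },
  { name: 'Magenta-RT', license: 'Apache-2.0' },
];

// Keep only models under the license treated as permissive in this sketch.
function permissiveModels(rows: ModelRow[]): string[] {
  return rows.filter((r) => r.license === 'Apache-2.0').map((r) => r.name);
}

// permissiveModels(MODELS) returns ['ACE-Step v1.5', 'YuE', 'Magenta-RT']
```

Three models pass the license filter, which is why the choice among them rests on the other columns, such as hardware support and clip length.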

See also