Locara

audio-to-embedding

HF group: Audio · Status: ❌ not built

What it is

Converts an audio clip into a fixed-size float vector, used for audio search, classification, and similarity. The audio analog of text-to-embedding.
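
To make the "search" and "similarity" uses concrete, here is a minimal sketch of ranking clips by cosine similarity over their embeddings. The function names (`cosine_similarity`, `top_k`) and the `(String, Vec<f32>)` clip representation are illustrative assumptions, not part of any Locara API; in practice the ranking would likely be delegated to the vector store.

```rust
/// Cosine similarity between two embeddings of equal length.
/// Returns None if the lengths differ, the input is empty, or a vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Rank stored clips by similarity to a query embedding, best match first.
/// `clips` pairs a hypothetical clip id with its embedding; clips whose
/// dimension does not match the query are skipped.
fn top_k<'a>(query: &[f32], clips: &'a [(String, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = clips
        .iter()
        .filter_map(|(id, v)| cosine_similarity(query, v).map(|s| (id.as_str(), s)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|l, r| r.1.total_cmp(&l.1));
    scored.truncate(k);
    scored
}
```

Calling `top_k(&query, &clips, 5)` returns up to five `(id, score)` pairs. A brute-force scan like this is only reasonable for small collections.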

Open-weight models

Model | Params | Released | License | Quality | Notes
CLAP (LAION) | ~636 M | 2023 | CC0 | Standard text-audio joint embedding | LAION CLAP variants are most-used.
LAION-CLAP-music | ~636 M | 2023 | CC0 | Music-tuned | Better for music search.
MERT-v1-330M | 330 M | 2024 | Apache-2.0 | Music-focused | Self-supervised on music.
Wav2Vec2-Large | 317 M | 2020 | Apache-2.0 | Speech features | Foundational, but more for downstream tasks.

Infrastructure required

Inference

  • ❌ Audio encoder runtime (encoder-only-for-non-text rail).

Input

Output

  • Vec<f32> per audio clip.
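
CLAP-class models return one vector per clip, but encoders such as Wav2Vec2 and MERT emit a sequence of frame-level vectors, so getting a single Vec<f32> per clip needs a pooling step. The sketch below shows one common choice, mean pooling followed by L2 normalization. Whether this would run in Rust or inside the encoder runtime is not specified here, and `pool_frames` is a hypothetical name.

```rust
/// Mean-pool frame-level vectors into one clip-level vector, then L2-normalize.
/// Returns None if there are no frames, the dimension is zero,
/// or the frames disagree on dimension.
fn pool_frames(frames: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dim = frames.first()?.len();
    if dim == 0 || frames.iter().any(|f| f.len() != dim) {
        return None;
    }

    // Sum every frame into the accumulator, element by element.
    let mut clip = vec![0.0_f32; dim];
    for frame in frames {
        for (acc, x) in clip.iter_mut().zip(frame) {
            *acc += *x;
        }
    }

    // Divide by the frame count to get the mean.
    let n = frames.len() as f32;
    for v in clip.iter_mut() {
        *v /= n;
    }

    // L2-normalize so a dot product between two clips equals their cosine similarity.
    let norm = clip.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for v in clip.iter_mut() {
            *v /= norm;
        }
    }
    Some(clip)
}
```

Normalizing at write time is a design choice: it lets the store use a plain dot product or L2 distance and still rank by cosine similarity.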

Storage

  • ❌ Weights cache.
  • ✅ Vector store via sqlite-vec (shared with text-to-embedding).
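
sqlite-vec accepts vectors either as JSON text or as a compact BLOB of raw 32-bit floats. The sketch below shows the BLOB encoding for a Vec<f32> using only the standard library. It assumes little-endian byte order (the native order on common desktop and mobile targets); the helper names are hypothetical, and the table schema and SQLite binding Locara uses are not shown here.

```rust
/// Encode an embedding as a little-endian f32 BLOB (4 bytes per element).
fn embedding_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Decode a BLOB produced by `embedding_to_blob`.
/// Returns None if the byte length is not a multiple of 4.
fn blob_to_embedding(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

The resulting bytes would be bound as a BLOB parameter on insert. Because the store is shared with text-to-embedding, audio and text vectors of different dimensions would presumably need separate tables or columns.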

Interaction (IPC + SDK)

  • embed.audio IPC stub reserved in spec, not implemented.

Capabilities (manifest)

  • capabilities.device.microphone or fs.user-selected.
  • capabilities.models[].

Gaps

Audio encoder runtime: either a new locara-audio-embed crate or an extension to locara-llama that adds CLAP-class encoder support.

See also