Locara

audio-classification

HF group: Audio · Status: ❌ not built

What it is

Sound → label. Covers genre detection, emotion classification, acoustic scene classification, bird species ID, alarm detection, etc.

Open-weight models

ModelParamsReleasedLicenseQualityNotes
BEATs~90 MMITMITSelf-supervised; solid baselineMicrosoft.
AST (Audio Spectrogram Transformer)~88 M2021BSD-3Fast on CPUFoundational.
YAMNet (TF Hub port)~5 M2020Apache-2.0521 AudioSet classesVery lightweight.
Apple Vision (SoundAnalysis)n/amacOSAppleBuilt-in scene/sound tagsNative API.

Infrastructure required

Inference

  • ❌ Audio encoder runtime (encoder-only-for-non-text rail).
  • ✅ Apple SoundAnalysis would be cheapest first hook — same Swift-sidecar pattern as locara-vision-ocr.

Input

  • ✅ Audio capture / file load.

Output

  • Label + confidence; small JSON.

Storage

  • ❌ Weights cache (or none for native API).
  • App-side: classification results in locara-storage.

Interaction (IPC + SDK)

  • audio.classify({ audio }) IPC.

Capabilities (manifest)

  • capabilities.device.microphone or fs.user-selected.
  • capabilities.models[] (or none for Apple).

Gaps

Audio encoder runtime. Apple SoundAnalysis would be cheapest first hook — same pattern as locara-vision-ocr.

See also