Locara

audio-to-audio

HF group: Audio · Status: ❌ not built

What it is

Audio → audio. Covers:

  • Denoising: recover clean speech from a noisy recording.
  • Voice conversion: make speaker A sound like speaker B.
  • Source separation: extract vocals or isolate individual instruments from a mix.
  • Super-resolution: upsample low-sample-rate audio and restore the missing high frequencies.
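To make "upsample" concrete, here is a non-learned baseline: linear interpolation from one sample rate to a higher one. This is a swapped-in technique, not what a super-resolution model does. It raises the sample rate, but it cannot recreate high-frequency content the low-rate recording never captured, which is the part a learned model is meant to synthesize. The function name `upsampleLinear` is hypothetical.

```typescript
// Naive baseline, NOT a learned super-resolution model: linear interpolation
// changes the sample rate but adds no new high-frequency content.
function upsampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new Error("sample rates must be positive");
  }
  if (input.length === 0) {
    return new Float32Array(0);
  }
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  const last = input.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), last);
    const right = Math.min(left + 1, last); // clamp at the final input sample
    const frac = pos - left;
    const a = input[left];
    const b = input[right];
    out[i] = a + (b - a) * frac;
  }
  return out;
}
```

Going from 8 kHz to 16 kHz doubles the sample count, but the output still contains nothing above the original 4 kHz Nyquist limit.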

Open-weight models

Model | Params | Released | License | Quality | Notes
--- | --- | --- | --- | --- | ---
DEMUCS v4 | ~80 M | 2023 | MIT | Best music source separation | Stems from a mix.
Open-Unmix | 8 M | 2019 | MIT | Lightweight music separation | Older.
DeepFilterNet 3 | 2 M | 2023 | MIT | Real-time speech denoising | Edge-friendly.
RVC (Retrieval-based VC) | 100-200 M | 2023 | MIT | Voice conversion | Many community variants.
kNN-VC | ~100 M | 2023 | MIT | Voice conversion | High quality.

Infrastructure required

Inference

  • ❌ Audio encoder-decoder runtime (varies per model: ONNX / Candle).
  • Some models are tiny (DeepFilterNet 3 has about 2 M parameters) and fit on every target, including edge devices.
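The "tiny" claim can be checked with simple arithmetic: weight size is roughly parameter count times bytes per parameter. The sketch below assumes dense weights stored at a single precision and ignores file-format overhead; `approxWeightMB` and the format names are hypothetical.

```typescript
// Bytes per parameter for common weight precisions (assumption: one precision
// per checkpoint, no metadata or quantization scales counted).
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, int8: 1 } as const;
type WeightFormat = keyof typeof BYTES_PER_PARAM;

// Approximate on-disk / in-memory weight size in megabytes (10^6 bytes).
function approxWeightMB(params: number, format: WeightFormat): number {
  return (params * BYTES_PER_PARAM[format]) / 1_000_000;
}
```

With the parameter counts from the table, approxWeightMB(2_000_000, "fp32") gives 8 MB for DeepFilterNet 3 and approxWeightMB(80_000_000, "fp32") gives 320 MB for DEMUCS v4, a 40x difference.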

Input

  • Live microphone stream (real-time denoising) or a user-selected audio file.

Output

  • ❌ Audio file save OR streaming back through the audio playback queue (for real-time denoising).

Storage

  • ❌ Weights cache.
  • Output: fs.user-folder (file ops) or in-memory (real-time pre-processor).

Interaction (IPC + SDK)

  • audio.transform({ input, op }), where op selects denoising, source separation, or voice conversion.
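A minimal sketch of the types behind audio.transform({ input, op }), assuming a discriminated union for op and one default model per op taken from the table. Only the call name and the denoise / separation / VC split come from this page; `PcmAudio`, `TransformOp`, `ModelRunner`, `modelFor`, and the model ids are hypothetical.

```typescript
// Hypothetical decoded-audio shape: one Float32Array per channel.
interface PcmAudio {
  sampleRate: number;
  channels: Float32Array[];
}

// Hypothetical op union; "separate" returns one PcmAudio per stem.
type TransformOp =
  | { kind: "denoise" }
  | { kind: "separate" }
  | { kind: "voice-convert"; targetSpeaker: string }; // reference voice id (hypothetical)

interface TransformRequest {
  input: PcmAudio;
  op: TransformOp;
}

// Whatever actually runs the model (ONNX / Candle) is injected as a callback.
type ModelRunner = (input: PcmAudio, op: TransformOp) => Promise<PcmAudio[]>;

// Picks a default model per op (ids are hypothetical, chosen from the table).
function modelFor(op: TransformOp): string {
  switch (op.kind) {
    case "denoise":
      return "deepfilternet3";
    case "separate":
      return "demucs-v4";
    case "voice-convert":
      return "knn-vc";
    default: {
      // Compile-time exhaustiveness check: adding an op without a case fails here.
      const unreachable: never = op;
      throw new Error(`unknown op: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Dispatches a request to the runner registered for the op's model.
async function transform(
  req: TransformRequest,
  runners: Map<string, ModelRunner>,
): Promise<PcmAudio[]> {
  const id = modelFor(req.op);
  const run = runners.get(id);
  if (run === undefined) {
    throw new Error(`model not installed: ${id}`);
  }
  return run(req.input, req.op);
}
```

Returning an array keeps one signature for all ops: denoising and voice conversion yield a single item, separation yields one item per stem.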

Capabilities (manifest)

  • capabilities.device.microphone (live) or fs.user-selected (file).
  • capabilities.fs.user-folder for save.
  • capabilities.models[].
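A minimal sketch of how the SDK might check a manifest before running a transform. It assumes the capability names listed above are stored as plain strings and that `capabilities.models[]` lists installed model ids. The `Manifest` shape, `InputSource`, and both functions are hypothetical.

```typescript
// Hypothetical manifest shape; only the capability names come from this page.
interface Manifest {
  capabilities: string[]; // e.g. "device.microphone", "fs.user-folder"
  models: string[];       // capabilities.models[]
}

type InputSource = "live" | "file";

// Live input needs the microphone; file input needs a user-selected file;
// saving the result needs the user folder.
function requiredCapabilities(source: InputSource, saveToFolder: boolean): string[] {
  const caps = [source === "live" ? "device.microphone" : "fs.user-selected"];
  if (saveToFolder) {
    caps.push("fs.user-folder");
  }
  return caps;
}

// Returns every missing grant, in a stable order, so the UI can list them all.
function missingCapabilities(
  manifest: Manifest,
  source: InputSource,
  saveToFolder: boolean,
  modelId: string,
): string[] {
  const granted = new Set(manifest.capabilities);
  const missing = requiredCapabilities(source, saveToFolder).filter((c) => !granted.has(c));
  if (!manifest.models.includes(modelId)) {
    missing.push(`models[${modelId}]`);
  }
  return missing;
}
```

Real-time denoising of a live stream needs only the microphone grant, since its output stays in memory.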

Gaps

The whole stack is missing. The most useful first deliverable is DeepFilterNet 3 (only about 2 M parameters) for real-time speech denoising; it could plug into the existing voice pipeline as a pre-processor.
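A real-time pre-processor has to turn arbitrarily sized microphone chunks into the fixed-size frames a streaming denoiser consumes. The class below is a minimal sketch of that buffering step. `FramedPreProcessor` and the `FrameProcessor` callback (where an ONNX or Candle session would run) are hypothetical, and the 480-sample frame (10 ms at 48 kHz) in the usage note is an assumption to verify against the model's own configuration.

```typescript
// Hypothetical per-frame model call; a real runtime session would go here.
type FrameProcessor = (frame: Float32Array) => Float32Array;

class FramedPreProcessor {
  private readonly frameSize: number;
  private readonly process: FrameProcessor;
  private readonly pending: Float32Array; // samples waiting for a full frame
  private filled = 0;

  constructor(frameSize: number, process: FrameProcessor) {
    if (!Number.isInteger(frameSize) || frameSize <= 0) {
      throw new Error("frameSize must be a positive integer");
    }
    this.frameSize = frameSize;
    this.process = process;
    this.pending = new Float32Array(frameSize);
  }

  // Accepts a capture chunk of any length and returns the processed output of
  // every frame it completes; leftover samples wait for the next chunk.
  push(chunk: Float32Array): Float32Array[] {
    const out: Float32Array[] = [];
    let offset = 0;
    while (offset < chunk.length) {
      const take = Math.min(this.frameSize - this.filled, chunk.length - offset);
      this.pending.set(chunk.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.frameSize) {
        // Hand the model a copy so later pushes cannot overwrite its input.
        out.push(this.process(this.pending.slice()));
        this.filled = 0;
      }
    }
    return out;
  }
}
```

For example, new FramedPreProcessor(480, denoiseFrame), with a hypothetical denoiseFrame callback, would emit one 10 ms frame per 480 buffered samples at 48 kHz, so the buffering itself adds at most one frame of latency before whatever latency the model adds.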

See also