Locara

text-to-speech

HF group: Audio · Status: 🟡 partial (system say only)

What it is

Converts text into speech audio.

Open-weight models

Model | Params | Released | License | Quality | Notes
Kokoro-82M | 82 M | 2025-01 | Apache-2.0 | #1 on TTS Arena | Sub-300 ms per sentence; no native voice cloning. Best small-footprint pick.
F5-TTS | ~330 M | 2024 | CC-BY-NC | High fidelity, voice clones from seconds | NON-COMMERCIAL license.
XTTS-v2 | 467 M | 2024 | Coqui CPML | 17-language voice clones | License requires Coqui contact for commercial.
Bark | ~1 B | 2023 | MIT | Expressive, slow | Useful for prototyping.
Piper | ~30 M | 2023 | MIT | Fast, robotic | Edge-device friendly. Per-voice models.
Apple AVSpeech / say | n/a | macOS | Apple | Excellent on Apple voices | System-provided; what Locara’s voice-pipeline uses today.
Apple Speech Synthesizer (Neural Voices) | n/a | macOS 14+ | Apple | Studio-quality | Bigger neural voices on macOS.

Infrastructure required

Inference

  • ✅ macOS say / AVSpeech via crates/locara-voice-pipeline/src/say.rs (no model weights to load; uses system voices).
  • ❌ Kokoro / Piper integration — would need an MLX or ONNX path.
  • ❌ Voice-cloning TTS (F5-TTS / XTTS) — license complexity (non-commercial).
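
As an illustration of the system-voice path, here is a minimal sketch of driving the macOS say command from Rust. It is not the contents of say.rs: the say_args and speak names are hypothetical, and the sketch assumes only the documented say behaviour (-v selects a voice, -r sets the rate in words per minute, and text is read from standard input when no message argument is given). say has no locale flag, so a locale would have to be mapped to a voice name by the caller.

```rust
use std::io::Write;
use std::process::{Child, Command, Stdio};

/// Builds the option list for macOS `say` (hypothetical helper).
/// `voice` maps to `-v`, `rate_wpm` to `-r` (words per minute).
fn say_args(voice: Option<&str>, rate_wpm: Option<u32>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(v) = voice {
        args.push("-v".to_string());
        args.push(v.to_string());
    }
    if let Some(r) = rate_wpm {
        args.push("-r".to_string());
        args.push(r.to_string());
    }
    args
}

/// Spawns `say` and pipes `text` to its stdin, so text that starts with '-'
/// is never mistaken for a flag. The returned `Child` can be killed by the
/// caller to interrupt playback. Only works where `say` exists (macOS).
fn speak(text: &str, voice: Option<&str>, rate_wpm: Option<u32>) -> std::io::Result<Child> {
    let mut child = Command::new("say")
        .args(say_args(voice, rate_wpm))
        .stdin(Stdio::piped())
        .spawn()?;
    // Write the text, then drop stdin so `say` sees end of input.
    if let Some(mut stdin) = child.stdin.take() {
        stdin.write_all(text.as_bytes())?;
    }
    Ok(child)
}
```

Returning the Child handle rather than waiting on it keeps the call non-blocking, which is one way a caller could stop speech early.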

Input

  • Plain UTF-8 text strings, optional voice + locale.

Output

  • ✅ Streaming audio playback. apps/voice uses sentence chunking (pumpSentenceTts) so the first sentence plays while the rest stream.
  • ✅ Cancellation + interrupt (barge-in) supported by the pipeline backend.
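
The sentence-chunking idea can be sketched as follows. This is an assumption-laden illustration, not the actual pumpSentenceTts implementation: split_sentences and pump_sentences are hypothetical names, the boundary rule (a '.', '!' or '?' followed by whitespace or end of input) is a simplification, and cancellation is modelled with a shared AtomicBool that is checked between sentences.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Splits text into sentence-sized chunks at '.', '!' or '?' when the
/// next character is whitespace or the input ends (hypothetical helper).
fn split_sentences(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        let at_boundary = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |n| n.is_whitespace());
        if at_boundary {
            let s = current.trim();
            if !s.is_empty() {
                out.push(s.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that lacks closing punctuation.
    let tail = current.trim();
    if !tail.is_empty() {
        out.push(tail.to_string());
    }
    out
}

/// Hands sentences to `speak_one` in order and stops as soon as `cancel`
/// is set. Returns how many sentences were handed over.
fn pump_sentences<F: FnMut(&str)>(text: &str, cancel: &AtomicBool, mut speak_one: F) -> usize {
    let mut spoken = 0;
    for sentence in split_sentences(text) {
        if cancel.load(Ordering::Relaxed) {
            break;
        }
        speak_one(&sentence);
        spoken += 1;
    }
    spoken
}
```

Because the flag is only checked between sentences, a real barge-in would also need to stop the sentence that is currently playing, for example by killing the playback process.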

Storage

  • ❌ ML model weights: none cached for TTS today (system say doesn’t need them).
  • ✅ User’s voice-preference stored in app data via locara-storage.

Interaction (IPC + SDK)

  • ✅ Used internally as part of the voice-to-voice pipeline backend.
  • ❌ Standalone tts.speak IPC for non-voice apps that just want speech output.

Capabilities (manifest)

  • capabilities.device.speaker cool-down semantics — pending.
  • For ML TTS: capabilities.models[] would list the TTS model.

Gaps

  • Kokoro / Piper integration, for cross-platform support and for apps that want more control over the voice.
  • Voice-cloning TTS (F5-TTS / XTTS) would need a separate non-commercial branch: F5-TTS is CC-BY-NC, and XTTS-v2 requires contacting Coqui for commercial use.
  • Apple Neural Voices integration (via a Swift sidecar) is on the BACKLOG.
  • Standalone tts.speak IPC for non-voice-pipeline apps.

See also