`text-to-speech`

HF group: Audio · Status: 🟡 partial (system say only)

What it is

Converts text into spoken audio.

Open-weight models

Model	Params	Released	License	Quality	Notes
Kokoro-82M	82 M	2025-01	Apache-2.0	#1 on TTS Arena	Sub-300 ms per sentence; no native voice cloning. Best small-footprint pick.
F5-TTS	~330 M	2024	CC-BY-NC	High fidelity; clones a voice from a few seconds of reference audio	Non-commercial license.
XTTS-v2	467 M	2023	Coqui CPML	Voice cloning across 17 languages	Commercial use requires a separate license from Coqui.
Bark	~1 B	2023	MIT	Expressive, slow	Useful for prototyping.
Piper	~30 M	2023	MIT	Fast, robotic	Edge-device friendly. Per-voice models.
Apple AVSpeech / `say`	n/a	macOS	Apple	Excellent on Apple voices	System-provided; what Locara’s voice-pipeline uses today.
Apple Speech Synthesizer (Neural Voices)	n/a	macOS 14+	Apple	Studio-quality	Bigger neural voices on macOS.

Infrastructure required

Inference

✅ macOS say / AVSpeech via crates/locara-voice-pipeline/src/say.rs (system voices; no model weights to load into RAM).
❌ Kokoro / Piper integration — would need an MLX or ONNX path.
❌ Voice-cloning TTS (F5-TTS / XTTS) — license complexity (non-commercial).
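
As a rough illustration of the system-voice path, the Rust sketch below builds and runs a `say` invocation using only the standard library. It is not the contents of say.rs: `say_args` and `speak_blocking` are hypothetical names, and the only things assumed about the macOS `say` command are its `-v` (voice) and `-r` (rate in words per minute) flags.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Builds the argument list for the macOS `say` command.
/// Hypothetical helper; the real say.rs may be structured differently.
pub fn say_args(text: &str, voice: Option<&str>, rate_wpm: Option<u32>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(v) = voice {
        // -v selects a system voice by name.
        args.push("-v".to_string());
        args.push(v.to_string());
    }
    if let Some(r) = rate_wpm {
        // -r sets the speech rate in words per minute.
        args.push("-r".to_string());
        args.push(r.to_string());
    }
    // The text goes last. Text that begins with '-' could be parsed as an
    // option; this sketch does not guard against that.
    args.push(text.to_string());
    args
}

/// Runs `say` and blocks until playback finishes (macOS only).
pub fn speak_blocking(
    text: &str,
    voice: Option<&str>,
    rate_wpm: Option<u32>,
) -> io::Result<ExitStatus> {
    Command::new("say")
        .args(say_args(text, voice, rate_wpm))
        .status()
}
```

Keeping argument construction separate from process spawning lets the former be unit-tested on any platform. A pipeline that needs barge-in would more likely use `Command::spawn` and keep the child handle so that playback can be killed.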

Input

Plain UTF-8 text strings, optional voice + locale.
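
One way to model this input in Rust is sketched below. The type name `TtsInput`, its builder methods, and the empty-text check are all assumptions made for illustration; the pipeline's actual request type is not shown in this document.

```rust
/// Hypothetical TTS input: required text plus optional voice and locale.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TtsInput {
    pub text: String,
    pub voice: Option<String>,
    pub locale: Option<String>,
}

impl TtsInput {
    /// Creates an input with no voice or locale preference.
    /// Rejects text that is empty or whitespace-only (an assumed rule).
    pub fn new(text: impl Into<String>) -> Result<Self, String> {
        let text = text.into();
        if text.trim().is_empty() {
            return Err("text must not be empty".to_string());
        }
        Ok(Self { text, voice: None, locale: None })
    }

    /// Sets the preferred voice name.
    pub fn with_voice(mut self, voice: impl Into<String>) -> Self {
        self.voice = Some(voice.into());
        self
    }

    /// Sets the preferred locale tag, for example "en-US".
    pub fn with_locale(mut self, locale: impl Into<String>) -> Self {
        self.locale = Some(locale.into());
        self
    }
}
```

Because a Rust `String` is always valid UTF-8, the "plain UTF-8" requirement is enforced by the type itself; only the optional fields need explicit handling.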

Output

✅ Streaming audio playback. apps/voice uses sentence chunking (pumpSentenceTts) so the first sentence plays while the rest stream.
✅ Cancellation + interrupt (barge-in) supported by the pipeline backend.
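
Sentence chunking with barge-in can be sketched as follows. This is a minimal Rust illustration of the idea, not the pumpSentenceTts implementation in apps/voice: `split_sentences` and `pump_sentences` are hypothetical names, the boundary rule (a '.', '!' or '?' followed by whitespace or end of input) is a deliberate simplification, and the `speak` callback stands in for whatever backend actually plays audio.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Splits `text` into sentence-sized chunks, cutting after '.', '!' or '?'
/// when the next character is whitespace or the input ends.
pub fn split_sentences(text: &str) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut chars = text.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if matches!(c, '.' | '!' | '?') {
            let at_boundary = match chars.peek() {
                None => true,
                Some(&(_, next)) => next.is_whitespace(),
            };
            if at_boundary {
                let end = i + c.len_utf8();
                let chunk = text[start..end].trim();
                if !chunk.is_empty() {
                    chunks.push(chunk);
                }
                start = end;
            }
        }
    }
    // Keep any trailing text that lacks closing punctuation.
    let tail = text[start..].trim();
    if !tail.is_empty() {
        chunks.push(tail);
    }
    chunks
}

/// Speaks sentences one at a time, checking the cancel flag before each.
/// Returns how many sentences were handed to `speak`.
pub fn pump_sentences<F: FnMut(&str)>(
    text: &str,
    cancelled: &AtomicBool,
    mut speak: F,
) -> usize {
    let mut spoken = 0;
    for sentence in split_sentences(text) {
        if cancelled.load(Ordering::SeqCst) {
            break; // barge-in: drop the remaining sentences
        }
        speak(sentence);
        spoken += 1;
    }
    spoken
}
```

Checking the flag only between sentences bounds interrupt latency by the length of one sentence; stopping mid-sentence would additionally require the playback backend to abort audio that is already playing.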

Storage

❌ ML model weights: none cached for TTS today (system say doesn’t need them).
✅ User’s voice-preference stored in app data via locara-storage.

Interaction (IPC + SDK)

✅ Used internally as part of the voice-to-voice pipeline backend.
❌ Standalone tts.speak IPC for non-voice apps that just want speech output.

Capabilities (manifest)

❌ capabilities.device.speaker cool-down semantics — pending.
For ML TTS: capabilities.models[] would list the TTS model.

Gaps

Kokoro / Piper integration, for cross-platform support and for apps that want more control over the voice.
Voice-cloning TTS (F5-TTS / XTTS) would need a separate non-commercial-only branch, since both licenses restrict commercial use.
Apple Neural Voices integration (Swift sidecar) on BACKLOG.
Standalone tts.speak IPC for non-voice-pipeline apps.

See also

voice-to-voice — composes TTS with STT + LLM
speech-to-text
Crates: locara-voice-pipeline
Index: ../modalities-and-models-survey.md