`translation`

HF group: NLP · Status: 🟡 partial (works through chat LLM)

What it is

Text in language A → text in language B. Specialized translation models meaningfully outperform chat LLMs on low-resource languages, while chat LLMs remain competitive on major language pairs.

Open-weight models

Model	Params	Released	License	Coverage / quality	Notes
MADLAD-400 (3B / 7B / 10B)	3–10 B	2023	Apache-2.0	419 languages	Permissive license; best option for commercial use.
NLLB-200	600 M – 54 B	2022	CC-BY-NC-4.0	200 languages	Non-commercial license.
OPUS-MT (per-pair)	~80 M	2020+	Apache-2.0	Solid for major pairs	Many small per-pair models.
Meta Omnilingual MT (OMT)	varies	2026-03	TBD	1,600 languages	Newest; license details still being clarified.
Any chat LLM	varies	varies	varies	Strong on major pairs, weak on rare languages	Convenient, but lossy on low-resource languages.

Infrastructure required

Inference

🟡 Works today via chat LLM in locara-llama.
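❌ Encoder-decoder runtime for the specialist models (MADLAD-400 is T5-based, NLLB-200 is M2M100-based, OPUS-MT is Marian-based). llama.cpp's encoder-decoder support is limited to the T5 family, so at least NLLB-200 and OPUS-MT would need a Candle or ONNX Runtime path.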
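The runtime requirement comes from the model shape: an encoder-decoder model runs the encoder once over the source sentence, then the decoder generates target tokens one at a time while attending to that encoder output. A minimal greedy-decoding sketch, assuming a hypothetical `Seq2SeqModel` interface (`encode`, `decodeStep`, `decoderStartId`, `eosId`) that a Candle or ONNX Runtime binding would need to expose; none of these names come from an existing API.

```typescript
// Hypothetical interface: the two halves a runtime must expose for encoder-decoder models.
interface Seq2SeqModel<EncOut> {
  encode(inputIds: number[]): EncOut;                  // run once over the source tokens
  decodeStep(enc: EncOut, prefix: number[]): number[]; // logits for the next target token
  decoderStartId: number;                               // token that seeds the decoder
  eosId: number;                                        // end-of-sequence token
}

// Index of the largest logit (greedy choice).
function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) if (xs[i] > xs[best]) best = i;
  return best;
}

// Greedy decoding: encode once, then extend the target prefix until EOS or maxLen tokens.
function greedyTranslate<E>(model: Seq2SeqModel<E>, inputIds: number[], maxLen = 256): number[] {
  const enc = model.encode(inputIds);
  const out: number[] = [model.decoderStartId];
  while (out.length <= maxLen) {
    const next = argmax(model.decodeStep(enc, out));
    if (next === model.eosId) break;
    out.push(next);
  }
  return out.slice(1); // drop the decoder start token
}
```

This sketch re-runs the decoder over the whole prefix at every step for clarity; a real runtime would cache decoder key/value states and usually use beam search rather than greedy decoding.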

Input

Plain text + source/target language codes.
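Each specialist model family takes the language codes differently, so the app-level `from`/`to` codes need mapping per model. A minimal sketch, assuming the app uses ISO 639-1 codes; `prepareInput`, `PreparedInput`, and the `FLORES` table are hypothetical names, and the table lists only a few languages. The conventions follow the public model cards: MADLAD-400 MT prepends a `<2xx>` target-language token to the input, and NLLB-200 uses FLORES-200 codes such as `eng_Latn`, with the source code set on the tokenizer and the target code forced as the first generated token.

```typescript
type ModelFamily = "madlad" | "nllb" | "opus-mt";

// Hypothetical partial mapping from ISO 639-1 codes to NLLB's FLORES-200 codes.
const FLORES: Record<string, string> = {
  en: "eng_Latn",
  de: "deu_Latn",
  fr: "fra_Latn",
  ja: "jpn_Jpan",
};

interface PreparedInput {
  text: string;           // what goes into the tokenizer
  srcLang?: string;       // NLLB: source-language code set on the tokenizer
  forcedBosLang?: string; // NLLB: target-language token forced as the first decoder token
}

function prepareInput(family: ModelFamily, text: string, from: string, to: string): PreparedInput {
  switch (family) {
    case "madlad":
      // MADLAD-400 MT: only the target language is given, as a "<2xx>" prefix.
      return { text: `<2${to}> ${text}` };
    case "nllb": {
      const src = FLORES[from];
      const tgt = FLORES[to];
      if (!src || !tgt) throw new Error(`no FLORES-200 code for ${from} or ${to}`);
      return { text, srcLang: src, forcedBosLang: tgt };
    }
    case "opus-mt":
      // Per-pair models: the pair is fixed by which model is loaded.
      return { text };
    default:
      throw new Error(`unsupported model family: ${String(family)}`);
  }
}
```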

Output

Plain text.

Storage

✅ Weights cache.
App-side: optionally cache translations.
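For the optional app-side translation cache, a minimal sketch, assuming an in-memory LRU is enough; `TranslationCache` and its methods are hypothetical names, not part of any existing SDK. The key includes the model id as well as the language pair and exact source text, so switching models never returns another model's output.

```typescript
// Hypothetical in-memory LRU cache for translations.
class TranslationCache {
  private entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
  }

  // JSON-encoding the tuple avoids collisions between e.g. ("a|b", "c") and ("a", "b|c").
  private key(model: string, from: string, to: string, text: string): string {
    return JSON.stringify([model, from, to, text]);
  }

  get(model: string, from: string, to: string, text: string): string | undefined {
    const k = this.key(model, from, to, text);
    const hit = this.entries.get(k);
    if (hit !== undefined) {
      // Re-insert to mark as most recently used (Map keeps insertion order).
      this.entries.delete(k);
      this.entries.set(k, hit);
    }
    return hit;
  }

  set(model: string, from: string, to: string, text: string, translation: string): void {
    const k = this.key(model, from, to, text);
    this.entries.delete(k);
    this.entries.set(k, translation);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry: the first key in insertion order.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```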

Interaction (IPC + SDK)

🟡 Today: llm.chat with a “translate to X” prompt.
❌ Specialized: translate.text({ text, from, to }) IPC.
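The chat-LLM fallback amounts to wrapping the text in an instruction. A minimal sketch, assuming `llm.chat` accepts a list of role/content messages (the SDK's actual signature isn't given here); `TranslateRequest`, `ChatMessage`, `buildTranslationPrompt`, and the small `LANGUAGE_NAMES` table are hypothetical, with `TranslateRequest` mirroring the planned `translate.text({ text, from, to })` shape.

```typescript
// Hypothetical request shape mirroring the planned translate.text({ text, from, to }) IPC.
interface TranslateRequest {
  text: string;
  from: string;
  to: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical display names; unknown codes fall back to the raw code.
const LANGUAGE_NAMES: Record<string, string> = { en: "English", de: "German", fr: "French", ja: "Japanese" };
const nameOf = (code: string): string => LANGUAGE_NAMES[code] ?? code;

function buildTranslationPrompt(req: TranslateRequest): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        `You are a translator. Translate the user's text from ${nameOf(req.from)} to ${nameOf(req.to)}. ` +
        "Reply with the translation only, without notes or quotation marks.",
    },
    // The source text goes in its own message so it isn't mixed into the instructions.
    { role: "user", content: req.text },
  ];
}
```

Keeping the same request shape for both paths means callers would not change when a specialized translator replaces the chat fallback.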

Capabilities (manifest)

Declare the translator model (or the chat LLM used as fallback) in capabilities.models[].
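Choosing between the specialist and the chat fallback could be driven by the manifest. A minimal sketch, assuming a hypothetical `capabilities.models[]` entry shape with `id`, `task`, and an optional `languages` list (the real manifest schema isn't specified here); `pickTranslator` and the example ids are likewise hypothetical.

```typescript
// Hypothetical manifest entry shape.
interface ModelCapability {
  id: string;                  // e.g. "madlad400-3b-mt" (illustrative id)
  task: "translation" | "chat";
  languages?: string[];        // languages a translator covers; absent for chat models
}

interface Manifest {
  capabilities: { models: ModelCapability[] };
}

// Prefer a specialized translator covering both languages; otherwise fall back to a chat model.
function pickTranslator(manifest: Manifest, from: string, to: string): ModelCapability | undefined {
  const models = manifest.capabilities.models;
  const specialist = models.find(
    (m) => m.task === "translation" && m.languages?.includes(from) && m.languages?.includes(to),
  );
  return specialist ?? models.find((m) => m.task === "chat");
}
```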

Gaps

Apache-2.0 specialized model (MADLAD-400) for quality on low-resource languages.
Encoder-decoder runtime — shared need with summarization specialists and Donut for DocVQA.

See also

text-to-text — fallback today
summarization — same encoder-decoder need
Index: ../modalities-and-models-survey.md