text-to-embedding
HF group: NLP · Status: ✅ shipped
HF aliases: feature-extraction, sentence-similarity.
What it is
Text → fixed-size float vector, used for retrieval, clustering, classification and similarity scoring. This is the default storage primitive for any Locara app that needs RAG-style retrieval.
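Similarity between two embeddings is most often scored with cosine similarity. Below is a minimal TypeScript sketch. `cosineSimilarity` is an illustrative helper, not a function exported by the Locara SDK.

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Illustrative helper only; not part of the Locara SDK.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no direction, so it matches nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [2, 0])); // 1 (same direction)
console.log(cosineSimilarity([1, 0], [0, 3])); // 0 (orthogonal)
```

If vectors are L2-normalized when they are written, the score reduces to a plain dot product, which is cheaper at query time.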
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| BGE-M3 | 568 M | 2024 | MIT | MTEB 72 % retrieval | Multilingual, dense + sparse + multi-vector in one model. Best self-host pick. |
| Nomic Embed v2 | 475 M total / 305 M active (MoE) | 2025 | Apache-2.0 | Strong multilingual | First general-purpose MoE embedder. CPU-friendly. |
| E5-Large-v2 | 335 M | 2023 | MIT | Strong English | Solid baseline. |
| E5-Small | 33 M | 2023 | MIT | Punches above its weight | 384 dims, 512-token context; good for laptops. |
| GTE-multilingual-base | 305 M | 2024 | Apache-2.0 | Strong, fast | 10× faster than decoder-only embedders. |
| Snowflake-Arctic-Embed-L | 335 M | 2024 | Apache-2.0 | Strong English | Among the strongest open-weight English-only embedders. |
| mxbai-embed-large-v1 | 335 M | 2024 | Apache-2.0 | Strong | Drop-in BGE alternative. |
Infrastructure required
Inference
- ✅ Encoder-only mode in `locara-llama` (llama.cpp’s embedding mode).
- ❌ Some models (Nomic Embed v2 MoE, BGE-M3 multi-vector) need a separate ONNX or Candle path.
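The split above amounts to a per-model backend choice. Below is a hypothetical sketch of that routing rule. It assumes a model spec that records whether the weights are MoE and whether multi-vector output is requested. The `Backend` labels, the spec fields and the model IDs are illustrative, not Locara's actual registry.

```typescript
// Illustrative backend labels; not actual Locara identifiers.
type Backend = "llama.cpp" | "onnx-or-candle";

// Hypothetical per-model spec; field names are assumptions.
interface EmbeddingModelSpec {
  id: string;
  moe: boolean;         // mixture-of-experts weights (e.g. Nomic Embed v2)
  multiVector: boolean; // caller wants per-token vectors (e.g. BGE-M3 multi-vector mode)
}

// Plain encoders go through llama.cpp's embedding mode; MoE weights and
// multi-vector output take the separate ONNX/Candle path.
function pickBackend(spec: EmbeddingModelSpec): Backend {
  return spec.moe || spec.multiVector ? "onnx-or-candle" : "llama.cpp";
}

console.log(pickBackend({ id: "e5-small", moe: false, multiVector: false })); // llama.cpp
console.log(pickBackend({ id: "bge-m3", moe: false, multiVector: true }));    // onnx-or-candle
```

The rule keys on the requested mode rather than the model name. This matches the note above that only BGE-M3's multi-vector mode, not the model as a whole, needs the separate path.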
Input
- Plain UTF-8 text strings (single or batch). No special capture infrastructure.
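Accepting either a single string or a batch usually means normalizing to a batch internally and then mapping the results back. The sketch below makes an assumption: a single input returns a single vector, and a batch returns vectors in input order. `toBatch` and `fromBatch` are illustrative names, not SDK functions.

```typescript
type EmbedInput = string | string[];

// Wrap a single string so the embedding path only ever sees batches.
function toBatch(input: EmbedInput): string[] {
  return typeof input === "string" ? [input] : input;
}

// Map batch results back to the caller's shape (assumed convention):
// one vector for a single string, same-order vectors for a batch.
function fromBatch(input: EmbedInput, vectors: Float32Array[]): Float32Array | Float32Array[] {
  if (vectors.length !== toBatch(input).length) {
    throw new Error("expected exactly one vector per input string");
  }
  return typeof input === "string" ? vectors[0] : vectors;
}

const single = fromBatch("hello", [new Float32Array([0.1, 0.2])]);
console.log(single instanceof Float32Array); // true
```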
Output
- ✅ One `Vec<f32>` per input string, returned synchronously.
- Output dimensionality is model-dependent and declared in the manifest, so storage can allocate the right schema.
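Because the manifest declares the output dimension, a cheap guard is to check every returned vector against it before writing to storage. The sketch below uses a hypothetical manifest shape. `dimensions` is an assumed field name, not one confirmed by this page.

```typescript
// Hypothetical manifest entry for an embedding model.
interface EmbeddingModelManifest {
  id: string;
  dimensions: number;
}

// Throw before a wrongly sized vector reaches a fixed-width vector column.
function assertDimensions(manifest: EmbeddingModelManifest, vectors: Float32Array[]): void {
  for (let i = 0; i < vectors.length; i++) {
    if (vectors[i].length !== manifest.dimensions) {
      const msg = `vector ${i} has ${vectors[i].length} dims; ${manifest.id} declares ${manifest.dimensions}`;
      throw new Error(msg);
    }
  }
}

// Usage: E5-Small produces 384-dim vectors (see the model table).
assertDimensions({ id: "e5-small", dimensions: 384 }, [new Float32Array(384)]);
```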
Storage
- ✅ Weights via `locara-models::Cache`.
- ✅ Vector store: `sqlite-vec` extension via `locara-storage` (per-app SQLite + `vec0` virtual tables for k-nearest-neighbour search).
- ❌ Multi-vector storage (for ColPali-style late interaction): `sqlite-vec` supports it via custom SQL, but there is no clean SDK helper yet.
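Below is a sketch of the SQL an app might issue against `sqlite-vec`, plus the vector encoding that SQL expects. The `vec0(... float[N])` column syntax, the `MATCH` operator with a `k` constraint, and the packed-float32 BLOB format are `sqlite-vec` features. The helper names and the `doc_chunks` table are hypothetical and are not `locara-storage` APIs.

```typescript
// Allow only plain SQL identifiers, since they are interpolated into SQL text.
function checkIdent(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) throw new Error(`unsafe identifier: ${name}`);
  return name;
}

// DDL for a vec0 virtual table sized to the model's declared dimension.
function vec0TableSql(table: string, dimensions: number): string {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error(`invalid dimension: ${dimensions}`);
  }
  return `CREATE VIRTUAL TABLE ${checkIdent(table)} USING vec0(embedding float[${dimensions}]);`;
}

// KNN query: bind the query vector to the first ? and k to the second.
function knnQuerySql(table: string): string {
  return `SELECT rowid, distance FROM ${checkIdent(table)} WHERE embedding MATCH ? AND k = ? ORDER BY distance;`;
}

// sqlite-vec accepts vectors as JSON text or as a BLOB of packed float32;
// a Float32Array's bytes are that packed layout, in host byte order.
function toFloat32Blob(vector: ArrayLike<number>): Uint8Array {
  const f32 = Float32Array.from(vector);
  return new Uint8Array(f32.buffer, f32.byteOffset, f32.byteLength);
}

console.log(vec0TableSql("doc_chunks", 384));
// CREATE VIRTUAL TABLE doc_chunks USING vec0(embedding float[384]);
console.log(toFloat32Blob([1, 2, 3]).byteLength); // 12
```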
Interaction (IPC + SDK)
- ✅ IPC: `embed.embed` accepting `string | string[]`.
- ✅ SDK: `embed.embed(text)` / `embed.embed(texts)` in `packages/sdk/src/embed.ts`.
- ❌ Streaming embedding for long-doc ingestion (apps currently do best-effort batching).
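Until streaming lands, apps batch on their own. Below is a sketch of that best-effort loop. `embedBatch` stands in for the SDK's batch call (`embed.embed(texts)`). Its `Promise<Float32Array[]>` return type is an assumption, as is the default batch size of 32.

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new Error("batch size must be a positive integer");
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Assumed shape of the SDK batch call; stands in for embed.embed(texts).
type EmbedBatchFn = (texts: string[]) => Promise<Float32Array[]>;

// Embed a large corpus one bounded batch at a time, preserving input order.
async function embedInBatches(texts: string[], embedBatch: EmbedBatchFn, size = 32): Promise<Float32Array[]> {
  const out: Float32Array[] = [];
  for (const batch of chunk(texts, size)) {
    const vectors = await embedBatch(batch);
    if (vectors.length !== batch.length) throw new Error("embedder returned the wrong number of vectors");
    for (const v of vectors) out.push(v);
  }
  return out;
}

console.log(chunk(["a", "b", "c", "d", "e"], 2).length); // 3
```

Awaiting batches sequentially keeps memory use and IPC payload size bounded, at the cost of throughput.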
Capabilities (manifest)
- ✅ `capabilities.models[]` lists the embedding model.
- The app-side schema is declared in `storage.schema` (a SQL file) so that storage allocates the right vector dimension.
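The vector width lives in two places: the model's declared output and the app's `storage.schema` file. A small consistency check can therefore catch a mismatch before any rows are written. The hypothetical sketch below scans the schema text for `float[N]` column declarations, which is the syntax `sqlite-vec`'s `vec0` tables use, and compares each N with the model's dimension. The function names are illustrative.

```typescript
// Collect N from every `float[N]` column declaration in the schema text.
function schemaDimensions(schemaSql: string): number[] {
  const dims: number[] = [];
  const re = /float\[(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(schemaSql)) !== null) dims.push(Number(m[1]));
  return dims;
}

// Fail fast if the schema's vector width disagrees with the model's output.
function checkSchema(schemaSql: string, modelDimensions: number): void {
  const found = schemaDimensions(schemaSql);
  if (found.length === 0) throw new Error("no float[N] vector column in schema");
  for (const d of found) {
    if (d !== modelDimensions) {
      throw new Error(`schema declares float[${d}] but the model outputs ${modelDimensions} dims`);
    }
  }
}

const demoSchema = "CREATE VIRTUAL TABLE doc_chunks USING vec0(embedding float[384]);";
checkSchema(demoSchema, 384); // passes for a 384-dim model such as E5-Small
```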
Gaps
- Bigger curated model list (we ship one default).
- Async batch embedding for large corpora (currently best-effort in apps).
- Streaming embedding for long-doc ingestion.
- Multi-vector mode for ColPali-class retrieval.
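ColPali-class retrievers score with late interaction (ColBERT's MaxSim). They keep one vector per query token and one per document token or patch. Each query vector takes its best dot product against the document's vectors, and those maxima are summed. The minimal sketch below assumes that all vectors share one dimension and are already L2-normalized. `maxSimScore` is an illustrative name.

```typescript
// Late-interaction (MaxSim) score between a multi-vector query and document.
// Assumes every vector has the same dimension and is L2-normalized.
function maxSimScore(query: number[][], doc: number[][]): number {
  if (doc.length === 0) throw new Error("document has no vectors");
  let total = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) {
      let dot = 0;
      for (let i = 0; i < q.length; i++) dot += q[i] * d[i];
      if (dot > best) best = dot;
    }
    total += best; // each query vector contributes its best match
  }
  return total;
}

const demoQuery = [[1, 0], [0, 1]];
const demoDoc = [[1, 0], [0.6, 0.8]];
console.log(maxSimScore(demoQuery, demoDoc).toFixed(2)); // 1.80
```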
See also
- `text-ranking`: second-stage cross-encoder (use the two together for RAG)
- `audio-to-embedding`
- `image-feature-extraction`
- `visual-document-retrieval`: multi-vector variant
- Crates: `locara-llama`, `locara-storage`
- Index: `../modalities-and-models-survey.md`