Locara

text-ranking (cross-encoder reranker)

HF group: NLP · Status: ❌ not built · Tier 1 (high leverage)

What it is

Given a query and a list of candidate documents, a cross-encoder scores each (query, document) pair jointly instead of embedding the two sides separately. Cross-encoders generally beat bi-encoders on top-K re-ranking quality and are the standard “second stage” in RAG pipelines:

  1. First stage — bi-encoder retrieves top-N (cheap, see text-to-embedding)
  2. Second stage — cross-encoder re-scores top-N to top-K (expensive but accurate).
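
The two stages above can be sketched in a few lines. This is only an illustration of the control flow: `cheapScore` and `jointScore` are hypothetical toy stand-ins for a bi-encoder and a cross-encoder, and the names `topBy` and `twoStage` are assumptions, not part of any existing Locara API.

```typescript
type Scored = { doc: string; score: number };

// Lowercase word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Stage-1 stand-in (hypothetical): fraction of query tokens found in the document.
function cheapScore(query: string, doc: string): number {
  const q = tokenize(query);
  const d = tokenize(doc);
  if (q.length === 0) return 0;
  const hits = q.filter((t) => d.indexOf(t) !== -1).length;
  return hits / q.length;
}

// Stage-2 stand-in (hypothetical): looks at query and document together, so it
// can add a bonus when the normalized query text appears verbatim in the document.
function jointScore(query: string, doc: string): number {
  const phrase = tokenize(query).join(" ");
  const text = tokenize(doc).join(" ");
  const phraseBonus = phrase.length > 0 && text.indexOf(phrase) !== -1 ? 1 : 0;
  return cheapScore(query, doc) + phraseBonus;
}

// Score every document, sort by score descending, keep the first `limit`.
function topBy(score: (doc: string) => number, docs: string[], limit: number): Scored[] {
  return docs
    .map((doc) => ({ doc, score: score(doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// First stage narrows the corpus to top-N; second stage re-scores those N to top-K.
function twoStage(query: string, corpus: string[], n: number, k: number): Scored[] {
  const firstStage = topBy((d) => cheapScore(query, d), corpus, n);
  return topBy((d) => jointScore(query, d), firstStage.map((s) => s.doc), k);
}

const corpus = [
  "password policy for reset tokens",
  "how to reset password in settings",
  "billing and invoices",
  "reset your router",
];
console.log(twoStage("reset password", corpus, 3, 2));
```

The expensive scorer only ever sees N documents, which is the reason for splitting the pipeline in the first place.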

Open-weight models

Model                  Params   Released  License     Quality                    Notes
BGE-Reranker-Base      280 M    2024      MIT         Solid baseline             Self-hostable.
BGE-Reranker-V2-M3     568 M    2024      MIT         Lightweight, multilingual  Good practical baseline.
BGE-Reranker-V2-Gemma  2 B      2024      Gemma       Strong                     LLM-distilled reranker.
ColBERT-v2             110 M    2022      Apache-2.0  Late interaction           Higher infra complexity than cross-encoders.
ZeroEntropy zerank     various  2025      MIT         Top open in their bench    Newer entrant.
mxbai-rerank-large-v1  335 M    2024      Apache-2.0  Strong                     Competitive with Cohere Rerank.

Infrastructure required

Inference

  • ❌ Cross-encoder mode in locara-llama (llama.cpp supports reranker models). Same shape as embedding inference, but the output is a single relevance score per (query, document) pair rather than a vector.

Input

  • Query string + list of candidate documents.

Output

  • Sorted list with scores.
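
A minimal sketch of the input and output described above. The type and field names (`RerankInput`, `RerankHit`, `index`, `document`, `score`) are assumptions for illustration; the actual wire format is not specified here. Each hit keeps the candidate's original index so the caller can map results back to its own records.

```typescript
// Hypothetical shapes; field names are assumptions.
interface RerankInput {
  query: string;
  candidates: string[];
}

interface RerankHit {
  index: number;    // position of the candidate in the input list
  document: string; // candidate text
  score: number;    // relevance score, higher is better
}

// Pair each candidate with its score and sort by score descending.
// Ties fall back to the original candidate order.
function toSortedHits(input: RerankInput, scores: number[]): RerankHit[] {
  if (scores.length !== input.candidates.length) {
    throw new Error("expected one score per candidate");
  }
  return input.candidates
    .map((document, index) => ({ index, document, score: scores[index] }))
    .sort((a, b) => b.score - a.score || a.index - b.index);
}

const hits = toSortedHits(
  { query: "reset password", candidates: ["billing", "password reset", "router"] },
  [0.1, 0.9, 0.1],
);
console.log(hits.map((h) => h.index));
```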

Storage

  • ❌ Weights cache.
  • App-side: the reranker is typically invoked at search time, so no per-call persistence is needed.

Interaction (IPC + SDK)

  • rerank.score({ query, candidates }) IPC.
  • App pattern: vector search retrieves top-N, reranker re-orders to top-K.
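
One way the call and the app pattern might fit together on the SDK side. Only the `rerank.score({ query, candidates })` call shape comes from the list above; the `Invoke` transport, the `"rerank.score"` channel string, the `{ index, score }` response entries, and `stubInvoke` are all hypothetical placeholders.

```typescript
type ScoreRequest = { query: string; candidates: string[] };
type ScoreEntry = { index: number; score: number }; // assumed response element

// Hypothetical IPC transport: sends a payload on a named channel.
type Invoke = (channel: string, payload: ScoreRequest) => Promise<ScoreEntry[]>;

interface RerankClient {
  score(req: ScoreRequest): Promise<ScoreEntry[]>;
}

// Wrap the transport so app code can call rerank.score({ query, candidates }).
function makeRerank(invoke: Invoke): RerankClient {
  return { score: (req) => invoke("rerank.score", req) };
}

// App pattern: `retrieved` is the top-N from vector search; return the top-K
// documents after re-ordering by reranker score.
async function searchTopK(
  rerank: RerankClient,
  query: string,
  retrieved: string[],
  k: number,
): Promise<string[]> {
  const entries = await rerank.score({ query, candidates: retrieved });
  return entries
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => retrieved[e.index]);
}

// Stand-in transport for local experiments: scores by candidate length.
// A real transport would forward the request to the runtime instead.
const stubInvoke: Invoke = (_channel, payload) =>
  Promise.resolve(payload.candidates.map((c, index) => ({ index, score: c.length })));

const rerank = makeRerank(stubInvoke);
searchTopK(rerank, "example query", ["aa", "a", "aaaa"], 2).then((top) => console.log(top));
```

Keeping the transport behind a function type makes the app pattern testable without a running model.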

Capabilities (manifest)

  • capabilities.models[] for the reranker.

Gaps

A reranker is the retrieval-quality multiplier for any RAG-style app. DocVault would benefit directly. It probably fits in locara-llama (llama.cpp supports reranker models) or a small new crate. Tier 1 backlog.

See also