`text-ranking` (cross-encoder reranker)

HF group: NLP · Status: ❌ not built · Tier 1 (high leverage)

What it is

Given a query and a list of candidate documents, score each (query, document) pair jointly. Cross-encoders typically beat bi-encoders on top-K re-ranking quality and are the standard “second stage” in RAG pipelines:

First stage: bi-encoder retrieves top-N (cheap, see text-to-embedding).
Second stage: cross-encoder re-scores the top-N and keeps the top-K (expensive but accurate).
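
The two stages above can be sketched as follows. This is a minimal illustration, not an existing API: the `Doc` and `PairScorer` types and the `retrieveTopN` / `rerankTopK` helpers are hypothetical names, document embeddings are assumed to be precomputed, and the cross-encoder is represented by an injected scoring function.

```typescript
// Hypothetical types for illustration; not part of any existing SDK.
type Doc = { id: string; text: string; embedding: number[] };
type PairScorer = (query: string, docText: string) => number;

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: cheap bi-encoder retrieval over precomputed embeddings.
function retrieveTopN(queryEmbedding: number[], docs: Doc[], n: number): Doc[] {
  return docs
    .map((doc) => ({ doc, sim: cosine(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, n)
    .map((x) => x.doc);
}

// Stage 2: expensive joint scoring of each (query, document) pair.
function rerankTopK(
  query: string,
  candidates: Doc[],
  scorePair: PairScorer,
  k: number,
): { doc: Doc; score: number }[] {
  return candidates
    .map((doc) => ({ doc, score: scorePair(query, doc.text) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The cost asymmetry is the reason for the split: stage 1 compares one query vector against stored vectors, while stage 2 runs a full model forward pass per candidate, so N should stay small (tens to low hundreds).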

Open-weight models

Model	Params	Released	License	Quality	Notes
BGE-Reranker-Base	280 M	2023	MIT	Solid baseline	Self-hostable.
BGE-Reranker-V2-M3	568 M	2024	MIT	Lightweight, multilingual	Good practical baseline.
BGE-Reranker-V2-Gemma	2 B	2024	Gemma	Strong	LLM-distilled reranker.
ColBERT-v2	110 M	2022	Apache-2.0	Late interaction	Higher infra complexity than cross-encoders.
ZeroEntropy zerank	various	2025	MIT	Top open in their bench	Newer entrant.
mxbai-rerank-large-v1	435 M	2024	Apache-2.0	Strong	Competitive with Cohere Rerank.

Infrastructure required

Inference

❌ Cross-encoder mode in locara-llama (llama.cpp supports reranker models). Same input plumbing as embedding inference, but the output is a single relevance score per (query, document) pair rather than a vector.
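
The difference in output shape can be sketched as a tagged union. The `InferenceOutput` type and `describe` helper are hypothetical, and whether locara-llama would expose raw or normalized scores is an open assumption. Rerankers such as the BGE family emit an unbounded relevance logit, which a logistic sigmoid maps into (0, 1) without changing the ranking.

```typescript
// Hypothetical result shapes for the two inference modes.
type InferenceOutput =
  | { kind: "embedding"; vector: number[] } // one vector per input text
  | { kind: "rerank"; score: number }; // one scalar per (query, document) pair

// Map a raw relevance logit to (0, 1). Sigmoid is monotonic,
// so normalizing never changes the order of candidates.
function sigmoid(logit: number): number {
  return 1 / (1 + Math.exp(-logit));
}

// Short human-readable summary of either output shape.
function describe(out: InferenceOutput): string {
  return out.kind === "embedding"
    ? `vector[${out.vector.length}]`
    : `score=${out.score.toFixed(3)}`;
}
```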

Input

Query string + list of candidate documents.

Output

Sorted list with scores.

Storage

❌ Weights cache.
App-side: the reranker is typically invoked at search time, so nothing is persisted per call.

Interaction (IPC + SDK)

❌ rerank.score({ query, candidates }) IPC.
App pattern: vector search retrieves top-N, reranker re-orders to top-K.
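
One possible shape for the SDK side of this call is sketched below. Only `rerank.score({ query, candidates })` comes from the note above; the `topK` field, the `RerankHit` result type, the `ScoreBackend` transport, and the `toHits` / `createRerank` helpers are assumptions made for illustration.

```typescript
// Hypothetical request/response types for the proposed rerank.score call.
interface RerankRequest {
  query: string;
  candidates: string[];
  topK?: number; // assumed optional cut-off; not in the original note
}
interface RerankHit {
  index: number; // position of the candidate in the request
  text: string;
  score: number;
}

// Stand-in for the IPC transport: one score per candidate, in input order.
type ScoreBackend = (query: string, candidates: string[]) => Promise<number[]>;

// Pair candidates with their scores, sort descending, keep the top K.
function toHits(candidates: string[], scores: number[], topK?: number): RerankHit[] {
  if (scores.length !== candidates.length) {
    throw new Error("score count does not match candidate count");
  }
  const hits = candidates.map((text, index) => ({ index, text, score: scores[index] }));
  hits.sort((a, b) => b.score - a.score);
  return hits.slice(0, topK ?? hits.length);
}

// Build a rerank object exposing score(), backed by the given transport.
function createRerank(backend: ScoreBackend) {
  return {
    async score({ query, candidates, topK }: RerankRequest): Promise<RerankHit[]> {
      if (candidates.length === 0) return [];
      const scores = await backend(query, candidates);
      return toHits(candidates, scores, topK);
    },
  };
}
```

Returning the original `index` alongside each score lets the app map results back to its own records (for example vector-search row IDs) without relying on text equality.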

Capabilities (manifest)

capabilities.models[] for the reranker.

Gaps

A reranker is the retrieval-quality multiplier for any RAG-style app. DocVault would benefit directly. It probably fits in locara-llama (llama.cpp supports reranker models) or a small new crate. Tier 1 BACKLOG.
