Locara

text-to-text-thinking

HF group: Multimodal · Status: 🟡 partial

What it is

Same input and output shape as text-to-text, but the model emits explicit reasoning tokens before the final answer. Apps can display the reasoning trace separately or hide it. This is distinct from plain chat because the reasoning trace is structurally different content that the UI may want to render with different styling.

Open-weight models

| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| Qwen3-Thinking-1.7B | 1.7 B | 2026-04 | Apache-2.0 | Decent on math | Thinking trace ~3-5x output length. |
| QwQ-32B | 32 B | 2025-11 | Apache-2.0 | Matches DeepSeek-R1 on math/coding | First open reasoning model that’s actually competitive. |
| DeepSeek-R1-Distill-Qwen3-8B | 8 B | 2025-05 | MIT | Beats Gemini 2.5 Flash on AIME | Distilled from R1-671B. ~5 GB Q4. |
| Phi-4-Reasoning-Plus | 14 B | 2026 | MIT | 75 % AIME at 8 GB | Best small reasoning model under 16 GB. |

Infrastructure required

Inference

  • ✅ Same locara-llama path as text-to-text — reasoning models are just LLMs with different fine-tunes.

Input

  • Plain text strings (same as text-to-text).

Output

  • Stream-splitter — reasoning models emit <think> / </think> style delimiters; the SDK must split the stream so apps can show “thinking…” UI vs. final answer separately.
  • 🟡 Currently apps see raw output including delimiters.
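
Until the SDK ships a splitter, the idea can be sketched as a small stateful parser. This is a minimal sketch, not the planned implementation: the `ThinkSplitter` class, the `SplitEvent` shape, and the default `<think>` / `</think>` delimiters are assumptions for illustration. The main subtlety is that a delimiter can be split across two chunks, so the parser holds back any buffer tail that could still turn into a delimiter.

```typescript
// Hypothetical event shape; not part of the current SDK.
type Channel = "thinking" | "answer";
interface SplitEvent { channel: Channel; text: string; }

// Length of the longest suffix of `text` that is a proper prefix of `tag`.
function partialSuffix(text: string, tag: string): number {
  const max = Math.min(text.length, tag.length - 1);
  for (let n = max; n > 0; n--) {
    if (text.endsWith(tag.slice(0, n))) return n;
  }
  return 0;
}

class ThinkSplitter {
  private buffer = "";
  private channel: Channel = "answer";
  private readonly open: string;
  private readonly close: string;

  // Delimiters are assumed defaults; models may use different markers.
  constructor(open: string = "<think>", close: string = "</think>") {
    this.open = open;
    this.close = close;
  }

  // Feed one raw chunk; returns the events that are safe to emit so far.
  push(chunk: string): SplitEvent[] {
    this.buffer += chunk;
    const events: SplitEvent[] = [];
    for (;;) {
      const tag = this.channel === "answer" ? this.open : this.close;
      const idx = this.buffer.indexOf(tag);
      if (idx !== -1) {
        // Full delimiter found: emit text before it, then switch channel.
        this.emit(events, this.buffer.slice(0, idx));
        this.buffer = this.buffer.slice(idx + tag.length);
        this.channel = this.channel === "answer" ? "thinking" : "answer";
        continue;
      }
      // No full delimiter: hold back a tail that may be a partial delimiter.
      const keep = partialSuffix(this.buffer, tag);
      const cut = this.buffer.length - keep;
      this.emit(events, this.buffer.slice(0, cut));
      this.buffer = this.buffer.slice(cut);
      return events;
    }
  }

  // Call once at end of stream to release any held-back tail.
  flush(): SplitEvent[] {
    const events: SplitEvent[] = [];
    this.emit(events, this.buffer);
    this.buffer = "";
    return events;
  }

  private emit(out: SplitEvent[], text: string): void {
    if (text.length > 0) out.push({ channel: this.channel, text });
  }
}
```

An app would call `push` for every chunk received from llm.chat_stream and `flush` when the stream ends, then route `thinking` events to the “thinking…” UI and `answer` events to the reply. The sketch assumes the stream starts in the answer channel; a model whose output begins inside a reasoning block would need a different initial state.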

Storage

  • ✅ Weights via locara-models::Cache.
  • App-side: optionally persist reasoning traces alongside final answers so users can audit / replay.
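
One way an app might persist a trace next to its answer is a single record per turn, with the trace as an optional field. This is a rough sketch only: the `StoredTurn` shape and the `makeStoredTurn` helper are hypothetical and not part of Locara, and the storage backend is left to the app.

```typescript
// Hypothetical record shape for app-side storage; not a Locara API.
interface StoredTurn {
  model: string;
  prompt: string;
  answer: string;
  thinking?: string; // omitted when traces are not kept
  createdAt: string; // ISO 8601 timestamp
}

// Builds a JSON-serializable record; the trace is stored only on request.
function makeStoredTurn(
  model: string,
  prompt: string,
  answer: string,
  thinking: string,
  keepTrace: boolean,
  now: Date = new Date(),
): StoredTurn {
  const turn: StoredTurn = {
    model,
    prompt,
    answer,
    createdAt: now.toISOString(),
  };
  if (keepTrace && thinking.length > 0) {
    turn.thinking = thinking;
  }
  return turn;
}
```

Keeping the trace optional lets the same record type serve users who want to audit or replay reasoning and users who prefer not to store it, since traces can be several times longer than the answer.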

Interaction (IPC + SDK)

  • 🟡 Today: llm.chat_stream (same as text-to-text). Apps parse delimiters themselves.
  • ❌ Future: a dedicated llm.chat_thinking or an option flag ({ thinking: true }) on the existing call that returns events tagged thinking vs answer.
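
To make the tagged-event idea concrete, here is a sketch of how an app could consume such events once they exist. The `ChatEvent` union, its `kind` field, and the `reduceChat` helper are assumptions for illustration; the actual event type is not yet specified.

```typescript
// Hypothetical tagged events; the eventual SDK type may differ.
type ChatEvent =
  | { kind: "thinking"; text: string }
  | { kind: "answer"; text: string }
  | { kind: "done" };

// UI-facing state: the trace and the answer accumulate separately.
interface ChatView {
  thinking: string;
  answer: string;
  done: boolean;
}

const emptyChatView: ChatView = { thinking: "", answer: "", done: false };

// Pure reducer: folds one event into the view without mutating the input.
function reduceChat(view: ChatView, ev: ChatEvent): ChatView {
  switch (ev.kind) {
    case "thinking":
      return { ...view, thinking: view.thinking + ev.text };
    case "answer":
      return { ...view, answer: view.answer + ev.text };
    case "done":
      return { ...view, done: true };
  }
}
```

With a discriminated union like this, the UI can style or collapse `view.thinking` independently of `view.answer`, and the compiler flags any event kind the reducer forgets to handle.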

Capabilities (manifest)

  • capabilities.models[] includes a reasoning-tuned model.
  • ❌ A picker UI hint: “this model thinks before answering, expect ~5 s pause on first reply.” Manifest field for this not yet defined.

Gaps

  • Stream-splitting logic in packages/sdk/src/llm.ts.
  • The spec entry in 04-modalities.md exists, but its expansion is identical to plain text-to-text; it needs a real expansion that wires in the splitter.
  • Picker UI hint for thinking models.

See also