text-to-text-thinking
HF group: Multimodal · Status: 🟡 partial
What it is
Same shape as text-to-text but the model
emits explicit reasoning tokens before the final answer. Apps can
display the reasoning trace separately or hide it. Distinct from
plain chat because the reasoning trace is structurally different
content that the UI may want to render with different styling.
Open-weight models
| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| Qwen3-Thinking-1.7B | 1.7 B | 2026-04 | Apache-2.0 | Decent on math | Thinking trace ~3-5x output length. |
| QwQ-32B | 32 B | 2025-11 | Apache-2.0 | Matches DeepSeek-R1 on math/coding | First open reasoning model that’s actually competitive. |
| DeepSeek-R1-Distill-Qwen3-8B | 8 B | 2025-05 | MIT | Beats Gemini 2.5 Flash on AIME | Distilled from R1-671B. ~5 GB Q4. |
| Phi-4-Reasoning-Plus | 14 B | 2026 | MIT | 75 % AIME at 8 GB | Best small reasoning model under 16 GB. |
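
The size figures in the Notes and Quality columns can be sanity-checked with simple arithmetic. The helper below is a rough sketch, not a measurement: `estimateWeightsGB` is a hypothetical name, and the 4.5 bits-per-weight average is an assumption (4-bit formats typically store per-block scales next to the 4-bit values). It covers weights only, not KV cache or runtime overhead.

```typescript
// Rough size of a quantized weight file in GB (decimal).
// `bitsPerWeight` is an assumed average, not a value taken from any model card.
function estimateWeightsGB(params: number, bitsPerWeight: number): number {
  const bytes = (params * bitsPerWeight) / 8; // bits -> bytes
  return bytes / 1e9; // bytes -> GB
}

// 8 B parameters at an assumed 4.5 bits/weight
const distill8B = estimateWeightsGB(8e9, 4.5); // 4.5
// 14 B parameters at the same assumed rate
const phi14B = estimateWeightsGB(14e9, 4.5); // 7.875
```

Under that assumption the 8 B row lands near the quoted ~5 GB once file metadata is included, and the 14 B row lands near 8 GB, if that figure refers to a 4-bit build (the table does not say).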
Infrastructure required
Inference
- ✅ Same `locara-llama` path as `text-to-text`: reasoning models are just LLMs with different fine-tunes.
Input
- Plain text strings (same as `text-to-text`).
Output
- ❌ Stream-splitter: reasoning models emit `<think>` / `</think>` style delimiters; the SDK must split the stream so apps can show “thinking…” UI vs. final answer separately.
- 🟡 Currently apps see raw output including delimiters.
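
A minimal sketch of what such a stream-splitter could look like. This is not the SDK's implementation (none exists yet); `ThinkSplitter`, `SplitEvent`, and `partialTagLength` are hypothetical names. The one subtle point it illustrates is that a delimiter can be cut across two chunks, so the splitter holds back any trailing text that might be the start of a tag.

```typescript
// Hypothetical event shape; the real SDK type is not defined yet.
type SplitEvent = { kind: "thinking" | "answer"; text: string };

const OPEN_TAG = "<think>";
const CLOSE_TAG = "</think>";

// Length of the longest suffix of `text` that is a proper prefix of `tag`.
function partialTagLength(text: string, tag: string): number {
  const max = Math.min(text.length, tag.length - 1);
  for (let n = max; n > 0; n--) {
    if (text.endsWith(tag.slice(0, n))) return n;
  }
  return 0;
}

class ThinkSplitter {
  private buffer = "";
  private inThinking: boolean;

  // Assumption: some chat templates pre-fill the opening tag, in which case
  // the stream starts inside the reasoning trace.
  constructor(startInThinking = false) {
    this.inThinking = startInThinking;
  }

  // Feed one raw chunk; returns the events that are safe to emit so far.
  push(chunk: string): SplitEvent[] {
    this.buffer += chunk;
    const events: SplitEvent[] = [];
    for (;;) {
      const tag = this.inThinking ? CLOSE_TAG : OPEN_TAG;
      const idx = this.buffer.indexOf(tag);
      if (idx >= 0) {
        // Full delimiter found: emit what precedes it, drop the tag, switch mode.
        this.emit(events, this.buffer.slice(0, idx));
        this.buffer = this.buffer.slice(idx + tag.length);
        this.inThinking = !this.inThinking;
        continue;
      }
      // No full delimiter: keep back a possible partial tag at the end.
      const keep = partialTagLength(this.buffer, tag);
      const cut = this.buffer.length - keep;
      this.emit(events, this.buffer.slice(0, cut));
      this.buffer = this.buffer.slice(cut);
      return events;
    }
  }

  // Call at end of stream to release any held-back text.
  flush(): SplitEvent[] {
    const events: SplitEvent[] = [];
    this.emit(events, this.buffer);
    this.buffer = "";
    return events;
  }

  private emit(events: SplitEvent[], text: string): void {
    if (text.length > 0) {
      events.push({ kind: this.inThinking ? "thinking" : "answer", text });
    }
  }
}
```

Feeding the chunks `"<thi"`, `"nk>2+2"`, `" is 4</th"`, `"ink>The answer"`, `" is 4."` yields thinking text `2+2 is 4` and answer text `The answer is 4.`, with no delimiter fragments leaking into either.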
Storage
- ✅ Weights via `locara-models::Cache`.
- App-side: optionally persist reasoning traces alongside final answers so users can audit / replay.
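
One way an app might persist traces next to answers, sketched as JSON Lines. Everything here is an assumption for illustration: the `TraceRecord` fields, the function names, and the choice of JSON Lines. The doc does not prescribe a storage format.

```typescript
// Hypothetical record shape: one line per completed reply.
type TraceRecord = {
  id: string;
  model: string;
  prompt: string;
  thinking: string | null; // null when the app chose not to keep the trace
  answer: string;
  createdAt: string; // ISO 8601 timestamp
};

// Serialize records as JSON Lines so new replies can be appended cheaply.
function serializeTraces(records: TraceRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}

// Parse JSON Lines back into records for audit or replay; blank lines are skipped.
function deserializeTraces(text: string): TraceRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TraceRecord);
}
```

Keeping `thinking` nullable lets the same log hold replies where the trace was deliberately dropped, which matches the "optionally" in the bullet above.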
Interaction (IPC + SDK)
- 🟡 Today: `llm.chat_stream` (same as `text-to-text`). Apps parse delimiters themselves.
- ❌ Future: a dedicated `llm.chat_thinking` or an option flag (`{ thinking: true }`) on the existing call that returns events tagged `thinking` vs `answer`.
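
To illustrate what tagged events would buy an app, here is a sketch of the consumer side. The `TaggedEvent` shape, `ViewState`, and `applyEvent` are assumptions for illustration only; the future call's actual event type is not specified in this doc.

```typescript
// Assumed shape of an event from the future thinking-aware call.
type TaggedEvent = { kind: "thinking" | "answer"; text: string };

// What a chat UI might track for one reply.
type ViewState = { thinking: string; answer: string; showSpinner: boolean };

const initialView: ViewState = { thinking: "", answer: "", showSpinner: true };

// Pure reducer: route each event's text to the matching pane.
function applyEvent(state: ViewState, ev: TaggedEvent): ViewState {
  if (ev.kind === "thinking") {
    return { ...state, thinking: state.thinking + ev.text };
  }
  // The first answer text ends the "thinking…" phase.
  return { ...state, answer: state.answer + ev.text, showSpinner: false };
}
```

With events tagged at the source, the app needs no delimiter parsing at all: it folds events into state and styles the two strings differently.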
Capabilities (manifest)
- ✅ `capabilities.models[]` includes a reasoning-tuned model.
- ❌ A picker UI hint: “this model thinks before answering, expect ~5 s pause on first reply.” Manifest field for this not yet defined.
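
Since the manifest field is not defined yet, the following is purely a strawman to make the gap concrete. The `thinking` and `firstReplyDelaySec` fields and the `pickerHint` function are hypothetical and do not exist in the manifest schema today.

```typescript
// Strawman manifest entry; `thinking` and `firstReplyDelaySec` are NOT real fields.
type ModelEntry = {
  id: string;
  thinking?: boolean;
  firstReplyDelaySec?: number;
};

// Build the picker hint for a model, or null when no hint applies.
function pickerHint(model: ModelEntry): string | null {
  if (!model.thinking) return null;
  const delay = model.firstReplyDelaySec ?? 5; // default mirrors the ~5 s figure above
  return `This model thinks before answering, expect ~${delay} s pause on first reply.`;
}
```

A boolean plus an optional delay keeps the picker logic trivial; whether the delay belongs in the manifest or should be measured at runtime is an open design question.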
Gaps
- Stream-splitting logic in `packages/sdk/src/llm.ts`.
- Spec entry in `04-modalities.md` exists but expansion is identical to plain `text-to-text`; needs a real expansion that wires the splitter.
- Picker UI hint for thinking models.
See also
- `text-to-text`
- Crates: `locara-llama`, `locara-core::InferenceBackend`.
- Index: `../modalities-and-models-survey.md`