`text-to-text-thinking`

HF group: Multimodal · Status: 🟡 partial

What it is

Same shape as text-to-text, but the model emits explicit reasoning tokens before the final answer. Apps can display the reasoning trace separately or hide it. It is listed separately from plain chat because the reasoning trace is structurally different content that the UI may want to render with different styling.

Open-weight models

Model	Params	Released	License	Quality	Notes
Qwen3-Thinking-1.7B	1.7 B	2025-04	Apache-2.0	Decent on math	Thinking trace is roughly 3-5× the length of the final answer.
QwQ-32B	32 B	2025-03	Apache-2.0	Matches DeepSeek-R1 on math/coding	First open reasoning model that’s actually competitive.
DeepSeek-R1-0528-Qwen3-8B	8 B	2025-05	MIT	Beats Gemini 2.5 Flash on AIME	Distilled from DeepSeek-R1-0528 (671 B). ~5 GB at Q4.
Phi-4-Reasoning-Plus	14 B	2025-04	MIT	75 % AIME at 8 GB	Best small reasoning model under 16 GB.

Infrastructure required

Inference

✅ Same locara-llama path as text-to-text — reasoning models are just LLMs with different fine-tunes.

Input

Plain text strings (same as text-to-text).

Output

❌ Stream-splitter — reasoning models emit <think> / </think> style delimiters; the SDK must split the stream so apps can show “thinking…” UI vs. final answer separately.
🟡 Currently apps see raw output including delimiters.
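
The splitter described above could look roughly like the sketch below. It assumes the delimiters are the literal strings `<think>` and `</think>` and that a chunk boundary may fall in the middle of a delimiter. `ThinkSplitter`, `ThinkingEvent`, and the `kind` field are hypothetical names chosen for illustration; none of them exist in the SDK today.

```typescript
// Hypothetical event shape; not part of the current SDK.
type ThinkingEvent = { kind: "thinking" | "answer"; text: string };

const OPEN_TAG = "<think>";
const CLOSE_TAG = "</think>";

// Length of the longest suffix of `s` that is a proper prefix of `tag`.
// Used to hold back text that might be the start of a delimiter.
function partialTagSuffix(s: string, tag: string): number {
  const max = Math.min(s.length, tag.length - 1);
  for (let n = max; n > 0; n--) {
    if (s.slice(s.length - n) === tag.slice(0, n)) return n;
  }
  return 0;
}

class ThinkSplitter {
  private buffer = "";
  private inThinking = false;

  // Feed one raw chunk from the model; returns zero or more tagged events.
  push(chunk: string): ThinkingEvent[] {
    this.buffer += chunk;
    const events: ThinkingEvent[] = [];
    for (;;) {
      const tag = this.inThinking ? CLOSE_TAG : OPEN_TAG;
      const idx = this.buffer.indexOf(tag);
      if (idx === -1) break;
      this.emit(events, this.buffer.slice(0, idx)); // text before the delimiter
      this.buffer = this.buffer.slice(idx + tag.length); // drop the delimiter
      this.inThinking = !this.inThinking;
    }
    // Emit everything except a trailing fragment that could still become a tag.
    const pending = this.inThinking ? CLOSE_TAG : OPEN_TAG;
    const keep = partialTagSuffix(this.buffer, pending);
    this.emit(events, this.buffer.slice(0, this.buffer.length - keep));
    this.buffer = this.buffer.slice(this.buffer.length - keep);
    return events;
  }

  // Call once at end of stream to release any held-back text.
  flush(): ThinkingEvent[] {
    const events: ThinkingEvent[] = [];
    this.emit(events, this.buffer);
    this.buffer = "";
    return events;
  }

  private emit(events: ThinkingEvent[], text: string): void {
    if (text.length === 0) return;
    events.push({ kind: this.inThinking ? "thinking" : "answer", text });
  }
}
```

Feeding the chunks `"<thi"`, `"nk>2+2"`, `" is 4</th"`, `"ink>The answer"`, `" is 4."` through `push` and then calling `flush` yields thinking text `2+2 is 4` and answer text `The answer is 4.`, with no delimiter fragments leaking into either. A real implementation would also need to decide how to handle models that use other delimiter conventions.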

Storage

✅ Weights via locara-models::Cache.
App-side: optionally persist reasoning traces alongside final answers so users can audit / replay.

Interaction (IPC + SDK)

🟡 Today: llm.chat_stream (same as text-to-text). Apps parse delimiters themselves.
❌ Future: a dedicated llm.chat_thinking or an option flag ({ thinking: true }) on the existing call that returns events tagged thinking vs answer.
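
To make the tagged-event idea concrete, here is one possible shape for the events and a small reducer an app might use to keep the reasoning trace and the final answer in separate buffers. Everything here is an assumption for illustration: `ChatEvent`, `ChatView`, `applyEvent`, and the `type` / `delta` fields are hypothetical, and the eventual IPC contract may look different.

```typescript
// Hypothetical events a future `{ thinking: true }` call might yield.
type ChatEvent =
  | { type: "thinking"; delta: string }
  | { type: "answer"; delta: string }
  | { type: "done" };

// What a UI would hold per message: two separately styled buffers.
interface ChatView {
  thinking: string;
  answer: string;
  done: boolean;
}

const emptyView: ChatView = { thinking: "", answer: "", done: false };

// Pure reducer: returns a new view with the event applied.
function applyEvent(view: ChatView, ev: ChatEvent): ChatView {
  switch (ev.type) {
    case "thinking":
      return { ...view, thinking: view.thinking + ev.delta };
    case "answer":
      return { ...view, answer: view.answer + ev.delta };
    case "done":
      return { ...view, done: true };
  }
}
```

With events tagged at the source, the app no longer parses delimiters; it only routes each delta to the right buffer and can show a “thinking…” indicator while `answer` is still empty.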

Capabilities (manifest)

✅ capabilities.models[] includes a reasoning-tuned model.
❌ A picker UI hint: “this model thinks before answering, expect ~5 s pause on first reply.” Manifest field for this not yet defined.
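
Since the manifest field is not yet defined, the following is only a sketch of one way it could be modelled. Only `capabilities.models[]` comes from the spec; the object form of each model entry, the `thinking` block, its `expectedFirstReplyDelaySeconds` field, and the `pickerHint` / `hintsFor` helpers are all hypothetical.

```typescript
// Hypothetical model entry; the real manifest may list models differently.
interface ModelEntry {
  id: string;
  // Assumed optional block marking a reasoning-tuned model.
  thinking?: { expectedFirstReplyDelaySeconds: number };
}

interface Manifest {
  capabilities: { models: ModelEntry[] };
}

// Returns the picker hint for a thinking model, or null for a plain model.
function pickerHint(model: ModelEntry): string | null {
  if (!model.thinking) return null;
  const secs = model.thinking.expectedFirstReplyDelaySeconds;
  return `This model thinks before answering, expect ~${secs} s pause on first reply.`;
}

// Collects hints for every thinking model declared in a manifest.
function hintsFor(manifest: Manifest): string[] {
  const hints: string[] = [];
  for (const model of manifest.capabilities.models) {
    const hint = pickerHint(model);
    if (hint !== null) hints.push(hint);
  }
  return hints;
}
```

Keeping the delay as a number rather than a pre-written string would let the picker localize the message and adjust the estimate per device.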

Gaps

Stream-splitting logic in packages/sdk/src/llm.ts is not yet implemented.
A spec entry exists in 04-modalities.md, but its expansion is identical to plain text-to-text; it needs a real expansion that wires in the splitter.
Picker UI hint for thinking models.
