Will Locara run on your Mac?
Pick a chip. See what fits.
Locara is Apple-Silicon-only. Every model and every app has a
memory floor. Whether your Mac clears it depends on the RAM
tier you bought; how fast it runs depends on the chip and its
bandwidth. The tables below are computed at
build time from each app's locara.json manifest
and a curated registry of every M-series Mac since 2020:
no guessing, no marketing numbers.
One formula. Three inputs. Reproducible from the manifest.
For every app, we add up the resident memory cost of all
its pinned models, plus a 1 GB activation + KV-cache
overhead, and compare against 70% of the Mac's
RAM — leaving 30% for macOS and the user's other
apps. The author-declared profiles.mid.min_ram_gb
becomes a hard floor. Mac variants and bandwidth come from
the curated lineup table.
- Model size: params × bpw / 8, where Q4_K_M ≈ 4.89 bpw. Derivation →
- Fit rule: working_set ≤ 0.7 × ram_gb.
- Bandwidth sets tok/s, not fit. See per-SKU table →
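The formula and fit rule above can be sketched in a few lines of Python. This is a minimal illustration rather than Locara's build-time code: the function names and the `(params_billion, bpw)` tuple shape are assumptions made for this example, while the constants (4.89 bpw, 1 GB overhead, 70% of RAM) come from the text.

```python
Q4_K_M_BPW = 4.89        # bits per weight for Q4_K_M, as quoted above
OVERHEAD_GB = 1.0        # activation + KV-cache overhead
USABLE_FRACTION = 0.7    # share of RAM the fit rule allows


def weight_gb(params_billion: float, bpw: float = Q4_K_M_BPW) -> float:
    """Resident weight cost in GB: params × bpw / 8."""
    return params_billion * bpw / 8


def working_set_gb(pinned: list[tuple[float, float]]) -> float:
    """Sum the weights of every pinned (params_billion, bpw) model, plus overhead."""
    return sum(weight_gb(p, b) for p, b in pinned) + OVERHEAD_GB


def fits(working_set: float, ram_gb: float) -> bool:
    """Fit rule: working_set ≤ 0.7 × ram_gb."""
    return working_set <= USABLE_FRACTION * ram_gb


# A single pinned 7B Q4_K_M model (hypothetical app).
ws = working_set_gb([(7, Q4_K_M_BPW)])
print(round(ws, 2), fits(ws, 8), fits(ws, 16))  # 5.28 True True
```

With these numbers a 7B Q4 model just clears an 8 GB Mac (5.28 GB against a 5.6 GB budget), which matches the "8 GB: 7B Q4" entries in the per-SKU table.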
Every M-series Mac — RAM, bandwidth, fit.
Bandwidth determines decode speed (tok/s); RAM determines what fits. The "largest fit" column shows the biggest standard dense Q4 model that fits at 8K context — bigger models become possible at lower context or with KV-cache quantization.
| Family | Chip | Year | RAM | Bandwidth | Bus | Cooling | Largest fit (Q4 dense, 8K ctx) |
|---|---|---|---|---|---|---|---|
| MacBook Air | M1 | 2020 | 8 GB / 16 GB | 68 GB/s | 128-bit | passive | 8 GB: 7B Q4; 16 GB: 14B Q4 |
| MacBook Pro 13" | M1 | 2020 | 8 GB / 16 GB | 68 GB/s | 128-bit | active | 8 GB: 7B Q4; 16 GB: 14B Q4 |
| Mac mini | M1 | 2020 | 8 GB / 16 GB | 68 GB/s | 128-bit | active | 8 GB: 7B Q4; 16 GB: 14B Q4 |
| MacBook Pro 14"/16" | M1 Pro | 2021 | 16 GB / 32 GB | 200 GB/s | 256-bit | active | 16 GB: 14B Q4; 32 GB: 32B Q4 |
| MacBook Pro 14"/16" | M1 Max | 2021 | 32 GB / 64 GB | 400 GB/s | 512-bit | active | 32 GB: 32B Q4; 64 GB: 70B Q4 |
| Mac Studio | M1 Ultra | 2022 | 64 GB / 128 GB | 800 GB/s | 1024-bit | active | 64 GB: 70B Q4; 128 GB: Mixtral 8x22B Q4 |
| MacBook Air 13"/15" | M2 | 2022 | 8 GB / 16 GB / 24 GB | 100 GB/s | 128-bit | passive | 8 GB: 7B Q4; 16 GB: 14B Q4; 24 GB: 14B Q4 |
| Mac mini | M2 | 2023 | 8 GB / 16 GB / 24 GB | 100 GB/s | 128-bit | active | 8 GB: 7B Q4; 16 GB: 14B Q4; 24 GB: 14B Q4 |
| MacBook Pro 14"/16" | M2 Pro | 2023 | 16 GB / 32 GB | 200 GB/s | 256-bit | active | 16 GB: 14B Q4; 32 GB: 32B Q4 |
| MacBook Pro 14"/16" | M2 Max | 2023 | 32 GB / 64 GB / 96 GB | 400 GB/s | 512-bit | active | 32 GB: 32B Q4; 64 GB: 70B Q4; 96 GB: 70B Q4 |
| Mac Studio | M2 Max | 2023 | 32 GB / 64 GB / 96 GB | 400 GB/s | 512-bit | active | 32 GB: 32B Q4; 64 GB: 70B Q4; 96 GB: 70B Q4 |
| Mac Studio | M2 Ultra | 2023 | 64 GB / 128 GB / 192 GB | 800 GB/s | 1024-bit | active | 64 GB: 70B Q4; 128 GB: Mixtral 8x22B Q4; 192 GB: Mixtral 8x22B Q4 |
| MacBook Air 13"/15" | M3 | 2024 | 8 GB / 16 GB / 24 GB | 100 GB/s | 128-bit | passive | 8 GB: 7B Q4; 16 GB: 14B Q4; 24 GB: 14B Q4 |
| MacBook Pro 14" | M3 | 2023 | 8 GB / 16 GB / 24 GB | 100 GB/s | 128-bit | active | 8 GB: 7B Q4; 16 GB: 14B Q4; 24 GB: 14B Q4 |
| MacBook Pro 14"/16" | M3 Pro (narrower bus than M2 Pro; slower decode for bandwidth-bound models) | 2023 | 18 GB / 36 GB | 150 GB/s | 192-bit | active | 18 GB: 14B Q4; 36 GB: 32B Q4 |
| MacBook Pro 14"/16" | M3 Max, 14-core (binned die, 384-bit bus) | 2023 | 36 GB / 96 GB | 300 GB/s | 384-bit | active | 36 GB: 32B Q4; 96 GB: 70B Q4 |
| MacBook Pro 14"/16" | M3 Max, 16-core (full die, 512-bit bus) | 2023 | 48 GB / 64 GB / 128 GB | 400 GB/s | 512-bit | active | 48 GB: 32B Q4; 64 GB: 70B Q4; 128 GB: Mixtral 8x22B Q4 |
| Mac Studio | M3 Ultra (512 GB option is the consumer-hardware capacity ceiling) | 2025 | 96 GB / 256 GB / 512 GB | 800 GB/s | 1024-bit | active | 96 GB: 70B Q4; 256 GB: Mixtral 8x22B Q4; 512 GB: DeepSeek-V3 Q4 |
| MacBook Air 13"/15" | M4 | 2025 | 16 GB / 24 GB / 32 GB | 120 GB/s | 128-bit | passive | 16 GB: 14B Q4; 24 GB: 14B Q4; 32 GB: 32B Q4 |
| Mac mini | M4 | 2024 | 16 GB / 24 GB / 32 GB | 120 GB/s | 128-bit | active | 16 GB: 14B Q4; 24 GB: 14B Q4; 32 GB: 32B Q4 |
| MacBook Pro 14" | M4 | 2024 | 16 GB / 24 GB / 32 GB | 120 GB/s | 128-bit | active | 16 GB: 14B Q4; 24 GB: 14B Q4; 32 GB: 32B Q4 |
| MacBook Pro 14"/16" | M4 Pro | 2024 | 24 GB / 48 GB / 64 GB | 273 GB/s | 256-bit | active | 24 GB: 14B Q4; 48 GB: 32B Q4; 64 GB: 70B Q4 |
| MacBook Pro 14"/16" | M4 Max, 14-core (binned die, 36 GB only) | 2024 | 36 GB | 410 GB/s | 384-bit | active | 36 GB: 32B Q4 |
| MacBook Pro 14"/16" | M4 Max, 16-core (biggest generational bandwidth jump in M-series) | 2024 | 48 GB / 64 GB / 128 GB | 546 GB/s | 512-bit | active | 48 GB: 32B Q4; 64 GB: 70B Q4; 128 GB: Mixtral 8x22B Q4 |
| Mac Studio | M4 Max | 2025 | 36 GB / 48 GB / 64 GB / 128 GB | 546 GB/s | 512-bit | active | 36 GB: 32B Q4; 48 GB: 32B Q4; 64 GB: 70B Q4; 128 GB: Mixtral 8x22B Q4 |
| MacBook Pro 14" | M5 (first Apple Silicon with neural accelerators inside each GPU core) | 2025 | 16 GB / 24 GB / 32 GB | 150 GB/s | 128-bit | active | 16 GB: 14B Q4; 24 GB: 14B Q4; 32 GB: 32B Q4 |
Late-Intel Macs (pre-2020) are not supported — no unified memory, no MLX, no Metal-shared-storage path. See the full lineage and the M3 Pro bandwidth regression notes in the Mac hardware lineup →.
Locara apps, by Mac RAM tier.
Each row is an app from the catalogue, sorted by RAM floor. "Working set" is the sum of all pinned models plus the 1 GB activation overhead. The declared column is what the developer asserts in the manifest; the computed column is what the math says — we use the larger of the two as the effective floor.
| App | Modalities | Working set | Declared min | Computed min | Runs on |
|---|---|---|---|---|---|
| Data Analyser (kingtongchoo) | text-to-text, text-to-code | 1.92 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| demo (your-publisher-id) | text-to-text | 1.31 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| DocVault (kingtongchoo) | text-to-text, text-to-embedding, image-to-text | 1.44 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| Listen (kingtongchoo) | text-to-text, speech-to-text | 1.46 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| Reader (kingtongchoo) | text-to-text, text-to-embedding | 4.67 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| Scribe (kingtongchoo) | text-to-text, text-to-code | 2.83 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| Studio (kingtongchoo) | text-to-text, text-to-code | 1.92 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| Transcribe (kingtongchoo) | text-to-text, text-to-embedding, speech-to-text | 1.46 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| Video Generator (kingtongchoo) | text-to-text, text-to-code | 1.92 GB | 16 GB | 8 GB | 16, 18, 24, 32, 36, 48, 64, 96, 128+ |
| Voice (kingtongchoo) | text-to-text, speech-to-text, voice-to-voice | 14.91 GB | 16 GB | 24 GB | 24, 32, 36, 48, 64, 96, 128+ |
Today every shipping app except Voice fits on a 16 GB Mac because it pins small (≤4B Q4) models. Voice is the exception: its 14.91 GB working set exceeds 70% of 16 GB, so its effective floor is 24 GB. As the model registry grows to 7B and 13B classes for chat, and 7B+ for voice-omni, this matrix becomes the differentiator between "runs on any Air" and "needs a Pro." App authors should treat the declared min as a contract with the user, not an estimate.
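The declared, computed, and "Runs on" columns can be reproduced with a short sketch. It assumes the RAM tiers are the ones listed in the "Runs on" column (8 GB added as the lowest tier) and that the effective floor is the larger of the declared and computed minimums, as described above; the function names are hypothetical and not part of the Locara manifest tooling.

```python
# RAM tiers assumed from the tables on this page (GB).
RAM_TIERS_GB = [8, 16, 18, 24, 32, 36, 48, 64, 96, 128]
USABLE_FRACTION = 0.7


def computed_min_gb(working_set_gb, tiers=RAM_TIERS_GB):
    """Smallest tier whose usable 70% covers the working set, or None."""
    for tier in sorted(tiers):
        if working_set_gb <= USABLE_FRACTION * tier:
            return tier
    return None


def effective_floor_gb(working_set_gb, declared_min_gb):
    """Larger of the author-declared minimum and the computed minimum."""
    computed = computed_min_gb(working_set_gb)
    if computed is None:
        return None
    return max(declared_min_gb, computed)


def runs_on(working_set_gb, declared_min_gb, tiers=RAM_TIERS_GB):
    """Every tier at or above the effective floor."""
    floor = effective_floor_gb(working_set_gb, declared_min_gb)
    return [] if floor is None else [t for t in sorted(tiers) if t >= floor]


# Voice: 14.91 GB working set, 16 GB declared.
print(computed_min_gb(14.91), effective_floor_gb(14.91, 16))  # 24 24
# Data Analyser: 1.92 GB working set, 16 GB declared.
print(computed_min_gb(1.92), effective_floor_gb(1.92, 16))    # 8 16
```

For Data Analyser the declared 16 GB wins over the computed 8 GB; for Voice the computed 24 GB wins over the declared 16 GB.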
What's pinned, by whom, at what cost.
Models in Locara are content-addressed and shared across apps. Every model below is referenced by at least one shipping app's manifest. Weight cost is approximate; see llm-memory-math → for the underlying formula.
| Model | Modality | Params | Quant (bpw) | Weight | Used by |
|---|---|---|---|---|---|
| BGE Small en v1.5 (bge-small-en-v1.5) | embed | 0.033B | 16.00 | 0.13 GB | |
| Whisper base.en (whisper-base.en) | stt | 0.074B | 16.00 | 0.15 GB | |
| Qwen 2.5 0.5B (qwen2.5-0.5b-instruct-q4) | chat | 0.5B | 4.89 | 0.31 GB | |
| Qwen 2.5 1.5B (qwen2.5-1.5b-instruct-q4) | chat | 1.5B | 4.89 | 0.92 GB | |
| Qwen 2.5 1.5B (qwen2.5-1.5b-instruct-q4_k_m) | chat | 1.5B | 4.89 | 0.92 GB | |
| Qwen 2.5 Coder 3B (qwen2.5-coder-3b-instruct-q4) | code | 3B | 4.89 | 1.83 GB | |
| Qwen 3.5 4B (qwen3.5-4b-instruct-q4_k_m) | chat | 4B | 4.89 | 2.44 GB | |
| Kyutai Moshi 7B (moshi-7b) | voice | 7B | 4.89 | 4.28 GB | |
| Personaplex 7B (personaplex-7b) | voice | 7B | 4.89 | 4.28 GB | |
| Qwen Omni 7B (qwen-omni-7b) | voice | 7B | 4.89 | 4.28 GB | |
Three things the table doesn't show.
Long contexts blow up KV cache
The "largest fit" column assumes 8K context. At 128K context a 70B model's KV cache alone is ~40 GB — equal to the weights. KV-quantization halves or quarters it but doesn't eliminate it.
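The ~40 GB figure can be checked with back-of-envelope arithmetic. The sketch below assumes a Llama-style 70B shape with grouped-query attention (80 layers, 8 KV heads, head dimension 128) and an fp16 cache; those shape numbers are assumptions for illustration and are not taken from any Locara manifest.

```python
def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2.0):
    """KV-cache size in GB: keys and values for every layer, per token."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 1e9


CONTEXT_128K = 128 * 1024

# Assumed 70B-class shape: 80 layers, 8 KV heads, head_dim 128.
fp16 = kv_cache_gb(80, 8, 128, CONTEXT_128K)        # 2 bytes per value
int8 = kv_cache_gb(80, 8, 128, CONTEXT_128K, 1.0)   # 8-bit KV quantization
int4 = kv_cache_gb(80, 8, 128, CONTEXT_128K, 0.5)   # 4-bit KV quantization

print(round(fp16, 1), round(int8, 1), round(int4, 1))  # 42.9 21.5 10.7
```

Under these assumptions the fp16 cache is about 43 GB, on par with the 70B Q4 weights (70 × 4.89 / 8 ≈ 42.8 GB), and 8-bit or 4-bit KV quantization cuts it to roughly a half or a quarter.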
Memory math →
Bandwidth sets speed, not fit
An M3 Pro (150 GB/s) and an M2 Pro (200 GB/s) can both load the same 13B model, but for bandwidth-bound decode the M2 Pro is about 33% faster (200 / 150 ≈ 1.33); equivalently, the M3 Pro is about 25% slower. The fit table is silent on this; the optimization playbook is not.
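A rough way to see the gap is the common bandwidth-bound approximation that each decoded token reads every weight once, so the ceiling on tok/s is bandwidth divided by weight size. This is an upper-bound sketch under that assumption, not a measured benchmark, and the function name is hypothetical.

```python
def decode_ceiling_tok_s(bandwidth_gb_s, params_billion, bpw=4.89):
    """Upper bound on decode speed if every token streams all weights once."""
    weight_gb = params_billion * bpw / 8
    return bandwidth_gb_s / weight_gb


m2_pro = decode_ceiling_tok_s(200, 13)  # M2 Pro, 13B Q4_K_M
m3_pro = decode_ceiling_tok_s(150, 13)  # M3 Pro, same model

print(round(m2_pro / m3_pro, 2))  # 1.33
```

Real throughput lands below this ceiling, but the ratio between two chips running the same model tracks the bandwidth ratio as long as decode stays bandwidth-bound.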
Optimization playbook →
The OS gets a vote
macOS reserves 25–35% of unified memory for the kernel, the GPU working set, and other apps. Locara's runtime subscribes to memory-pressure events and sheds load proactively so the user's other apps stay responsive.
macOS memory management →