Mac Hardware Lineup — Every Variant, RAM, Bandwidth, Model-Size Fit
What this is: A reference inventory of every Mac model variant from the late-Intel era (2015) through M5 (early 2026), keyed on the numbers that actually decide what local LLMs can run on it: memory capacity, memory bandwidth, memory bus width, and GPU/Neural-Engine compute class.
Why it matters: Locara apps declare device-class requirements in their manifest; the runtime computes the user’s tier and matches. To do that honestly we need a per-SKU table — not just “M3 Max” but “M3 Max 14-core / 384-bit / 300 GB/s” vs. “M3 Max 16-core / 512-bit / 400 GB/s”. The numbers also let us publish an honest “expected tok/s” alongside each device card.
Most relevant to Locara: Pairs with chip-fundamentals.md (why bandwidth is the LLM number) and modern-chip-landscape.md (cross-vendor 2026 snapshot). Also pairs with llm-memory-math.md for the formulas that turn “X GB / Y GB/s” into “model Z at Q4 runs at N tok/s”.
Caveat: Apple-published memory bandwidth is peak theoretical, derived from LPDDR transfer rate × bus width. Real workloads typically achieve 70–85% of peak. The “expected model” column assumes leaving ~25% of RAM for the OS and other apps, and uses Q4_K_M weights at 4–8K context (the chat-app default). Long contexts blow up these numbers; see llm-memory-math.md.
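Both rules of thumb in the caveat are easy to compute directly. The sketch below rests on two assumptions: peak bandwidth is transfer rate (MT/s) times bus width in bytes, reported in decimal GB/s, and the 25% OS reserve applies to total RAM. The function names are illustrative and not part of any Locara API. Apple's marketing figures are rounded, so the formula gives 409.6 GB/s where Apple quotes 400 GB/s for the M1 Max.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in decimal GB/s: transfers per second x bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


def usable_ram_gb(total_ram_gb: float, os_reserve: float = 0.25) -> float:
    """RAM left for weights and KV cache after reserving a share for macOS and other apps."""
    return total_ram_gb * (1.0 - os_reserve)


if __name__ == "__main__":
    # (label, LPDDR transfer rate in MT/s, bus width in bits), taken from this note's spec lists.
    examples = [
        ("M1, LPDDR4X-4266, 128-bit", 4266, 128),
        ("M1 Max, LPDDR5-6400, 512-bit", 6400, 512),
        ("M4 Max (full), LPDDR5X-8533, 512-bit", 8533, 512),
        ("M5, LPDDR5X-9600, 128-bit", 9600, 128),
    ]
    for label, mts, bits in examples:
        print(f"{label}: {peak_bandwidth_gbs(mts, bits):.1f} GB/s peak")
    print(f"64 GB machine: {usable_ram_gb(64):.0f} GB usable for the model")
```

Run as a script, it prints 68.3, 409.6, 546.1, and 153.6 GB/s for the four examples. It then reports 48 GB usable on a 64 GB machine.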
Quick reference: device classes for local LLMs
| Tier | Bandwidth | RAM | Examples | Realistic local model (Q4_K_M, ≤8K context) |
|---|---|---|---|---|
| 0 — Sub-baseline | ≤60 GB/s | ≤16 GB | Late-Intel MBA / MBP 13” | 1–3B Q4 only; mostly historical |
| 1 — Mobile baseline | ~68–120 GB/s | 8–32 GB | M1/M2/M3/M4 base (MBA, mini, base MBP, iMac) | 7B Q4 comfortably; 13B Q4 at the upper RAM tiers |
| 2 — Pro mobile | 150–273 GB/s | 16–64 GB | M1/M2/M3/M4 Pro | 13B Q4 comfortably; 30–34B Q4 at 36+ GB |
| 3 — Max mobile/desktop | 300–546 GB/s | 32–128 GB | M1/M2/M3/M4 Max (MBP 14”/16”, Mac Studio Max) | 70B Q4 at 64+ GB; Mixtral 8x7B Q4 at 36+ GB |
| 4 — Ultra desktop | ~800 GB/s | 64–256 GB | M1/M2/M3 Ultra (Mac Studio, Mac Pro) | 70B FP16 / 100B+ Q4 / Mixtral 8x22B Q4 |
| 5 — Frontier-local | ~800 GB/s | 512 GB | Mac Studio M3 Ultra 512 GB | Llama 3.1 405B Q4 (~250 GB) / DeepSeek-V3/R1 Q4 (~340 GB) |
The same table inverted by chip family, with detail, is below.
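A runtime needs some rule for placing a device into one of the quick-reference tiers. This minimal sketch makes three assumptions. Bandwidth is the primary key. RAM only separates Tier 5 from Tier 4. Intel Macs, which lack unified memory, always land in Tier 0. The cutoffs (65, 150, 300 and 750 GB/s) are hypothetical values chosen to reproduce the table's examples; they are not published Locara values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    bandwidth_gbs: float         # Apple-published peak bandwidth
    ram_gb: int
    unified_memory: bool = True  # False for Intel Macs


def device_tier(d: Device) -> int:
    """Map a device onto the quick-reference tiers (cutoffs are illustrative assumptions)."""
    if not d.unified_memory or d.bandwidth_gbs < 65:
        return 0
    if d.bandwidth_gbs >= 750:
        return 5 if d.ram_gb >= 512 else 4
    if d.bandwidth_gbs >= 300:
        return 3
    if d.bandwidth_gbs >= 150:
        return 2
    return 1


if __name__ == "__main__":
    for dev in [
        Device("MacBook Air M1 16 GB", 68.25, 16),
        Device("MacBook Pro M3 Pro 36 GB", 150, 36),
        Device("Mac Studio M4 Max 16C 128 GB", 546, 128),
        Device("Mac Studio M2 Ultra 192 GB", 800, 192),
        Device("Mac Studio M3 Ultra 512 GB", 800, 512),
        Device("iMac Pro 256 GB", 85.3, 256, unified_memory=False),
    ]:
        print(f"{dev.name}: Tier {device_tier(dev)}")
```

Under these cutoffs, a base M5 (~153 GB/s) lands in Tier 2 on bandwidth alone, a case the table does not yet cover. A real schema would need to decide whether that placement is intended.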
Late-Intel era (2015 — 2020)
The defining traits of late-Intel Macs for LLM use:
- Most late-Intel Macs sat on 128-bit DDR3/LPDDR3/DDR4/LPDDR4X buses at ~25–60 GB/s. Only the Xeon workstations went wider (iMac Pro ~85 GB/s, Mac Pro 2019 ~141 GB/s). That is roughly an order of magnitude below Max- and Ultra-class Apple Silicon.
- AMD Radeon Pro / Vega discrete GPUs (15”/16” MBP, iMac, iMac Pro, Mac Pro) had separate VRAM behind PCIe, and the stacks local-LLM users actually run don’t target them: MLX requires Apple Silicon, and ROCm doesn’t run on macOS.
- Three late-Intel Macs reach 128 GB of RAM or more: the iMac 27” 2020 (up to 128 GB SO-DIMM), the iMac Pro (up to 256 GB ECC), and the Mac Pro 2019 (up to 1.5 TB DDR4 ECC). None has a usable accelerated LLM path. CPU-only inference on the Mac Pro runs 70B at low single-digit tok/s.
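Here is a quick ceiling check on the Mac Pro claim. It assumes a ~42 GB 70B Q4_K_M file and the Mac Pro 2019's 6-channel DDR4-2933 peak, and it ignores CPU compute limits, which only lower the result further:

```latex
\begin{aligned}
B_{\text{Mac Pro 2019}} &= 2933\ \text{MT/s} \times \frac{384\ \text{bit}}{8\ \text{bit/B}} \approx 140.8\ \text{GB/s} \\
\text{tok/s}_{\max} &\approx \frac{140.8\ \text{GB/s}}{42\ \text{GB}} \approx 3.4
\end{aligned}
```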
MacBook Air (Intel)
| Year | Chip | Cores | GPU | RAM (max) | DRAM | Bandwidth (GB/s) | Bus | Base price |
|---|---|---|---|---|---|---|---|---|
| 2015 (11” / 13”) | Core i5/i7 Broadwell | 2C/4T | Intel HD 6000 | 4–8 GB | LPDDR3-1600 | ~25.6 | 128-bit | $899 / $999 |
| 2017 (13”) | Core i5-5350U | 2C/4T | Intel HD 6000 | 8 GB | LPDDR3-1600 | ~25.6 | 128-bit | $999 |
| 2018 (Retina 13”) | Core i5-8210Y (Amber Lake-Y) | 2C/4T | Intel UHD 617 | 8–16 GB | LPDDR3-2133 | ~34.1 | 128-bit | $1,199 |
| 2019 (Retina 13”) | Core i5-8210Y | 2C/4T | Intel UHD 617 | 8–16 GB | LPDDR3-2133 | ~34.1 | 128-bit | $1,099 |
| 2020 (Retina 13”) | Core i3/i5/i7 Ice Lake | 2–4C | Iris Plus G4/G7 | 8–16 GB | LPDDR4X-3733 | ~59.7 | 128-bit | $999 |
LLM viability: Sub-3B Q4, slowly. Mostly historical interest.
MacBook Pro (Intel)
| Year | Chip | Cores | GPU | RAM (max) | DRAM | Bandwidth (GB/s) | Base price |
|---|---|---|---|---|---|---|---|
| 2015 (13” Retina) | Core i5/i7 Broadwell | 2C/4T | Iris 6100 | 16 GB | LPDDR3-1866 | ~29.8 | $1,299 |
| 2015 (15” Retina) | Core i7 Haswell (4770HQ/4870HQ/4980HQ) | 4C/8T | Iris Pro 5200 + opt. AMD Radeon R9 M370X (2 GB GDDR5) | 16 GB | DDR3L-1600 | ~25.6 | $1,999 |
| 2016/2017 (13”/15” Touch Bar) | Core i5/i7 Skylake/Kaby Lake | 2–4C | Iris 550 / Iris Plus 650 (13”); HD 530/630 + AMD Radeon Pro 450–560 (15”) | 16 GB | LPDDR3-2133 | ~34.1 | $1,799 |
| 2018 (13”/15”) | Core i5/i7/i9 Coffee Lake | 4C/8T (13”) / 6C/12T (15”) | Iris Plus 655 (13”); UHD 630 + AMD Radeon Pro 555X/560X or Vega 16/20 with HBM2 (15”) | 16–32 GB | LPDDR3-2133 (13”) / DDR4-2400 (15”) | ~34–38 | $1,799 |
| 2019 (13”/15”) | Core i5/i7/i9 Coffee Lake | 4C/8T (13”) / 6–8C (15”) | Iris Plus 645/655 (13”); UHD 630 + Radeon Pro 555X/560X/Vega 16/20 (15”) | 16–32 GB | LPDDR3-2133 / DDR4-2400 | ~34–38 | $1,299 |
| 2019 (16”) | Core i7-9750H / i9-9880H / i9-9980HK Coffee Lake Refresh | 6C/12T / 8C/16T | Radeon Pro 5300M / 5500M (4–8 GB GDDR6) or 5600M (8 GB HBM2) | 16–64 GB | DDR4-2666 | ~42.7 | $2,399 |
| 2020 (13”, four-port) | Core i5/i7 Ice Lake | 4C/8T | Iris Plus G7 | 16–32 GB | LPDDR4X-3733 | ~59.7 | $1,799 |
LLM viability: The 16” 2019 MBP with 64 GB and a Radeon Pro 5600M is the only late-Intel laptop you might consider for ~13B Q4 on CPU. Real-world tok/s is roughly 1/3–1/4 of an M1 Max for the same model. The 5600M’s HBM2 is fast on paper but has no MLX path and no usable llama.cpp Metal acceleration to rival Apple Silicon.
Mac mini / iMac / iMac Pro / Mac Pro (Intel)
| Model | Year(s) | Chip | Cores | GPU | RAM (max) | DRAM | Bandwidth (GB/s) | Bus |
|---|---|---|---|---|---|---|---|---|
| Mac mini “late 2014” | 2014–2018 | Core i5/i7 Haswell | 2C/4T | Iris 5100 | 16 GB | DDR3L-1600 | ~25.6 | 128-bit |
| Mac mini 2018 | 2018–2023 | Core i3/i5/i7 Coffee Lake | 4–6C | UHD 630 | 8–64 GB SO-DIMM (user-upgradeable) | DDR4-2666 | ~42.7 | 128-bit |
| iMac 21.5” 4K | 2015/2017 | Core i5/i7 Broadwell/Kaby Lake | 4C | Iris Pro 6200 / Radeon Pro 555/560 | 8–32 GB | LPDDR3-1867 / DDR4-2400 | ~30–38 | 128-bit |
| iMac 27” 5K | 2015 | Core i5/i7 Skylake | 4C | R9 M380–M395X | 8–32 GB SO-DIMM | DDR3-1867 | ~29.9 | 128-bit |
| iMac 27” 5K | 2017 | Core i5/i7 Kaby Lake | 4C | Radeon Pro 570/575/580 (4–8 GB GDDR5) | 8–64 GB SO-DIMM | DDR4-2400 | ~38.4 | 128-bit |
| iMac 27” 5K | 2019 | Core i5/i9 Coffee Lake | 6–8C | Radeon Pro 570X–Vega 48 (HBM2 8 GB) | 8–64 GB SO-DIMM | DDR4-2666 | ~42.7 | 128-bit |
| iMac 27” 5K | 2020 | Core i5/i7/i9 Comet Lake (up to 10C) | 6–10C | Radeon Pro 5300/5500XT/5700XT (4–16 GB GDDR6) | 8–128 GB SO-DIMM | DDR4-2666 | ~42.7 | 128-bit |
| iMac Pro | 2017–2021 | Xeon W-2140B–W-2191B Skylake-W | 8–18C | Radeon Pro Vega 56–64X (8–16 GB HBM2) | 32–256 GB ECC | DDR4-2666 ECC | ~85.3 | 256-bit (4-ch) |
| Mac Pro “trash can” | 2013–2019 | Xeon E5 Ivy Bridge-EP | 4–12C | dual FirePro D300/D500/D700 | 12–64 GB | DDR3-1866 ECC | ~59.7 | 256-bit (4-ch) |
| Mac Pro “cheese grater” | 2019–2023 | Xeon W-3223–W-3275M Cascade Lake | 8–28C | Radeon Pro 580X / Vega II / W5700X / W6800X / W6900X | 32 GB–1.5 TB ECC RDIMM | DDR4-2933 ECC | ~140.8 | 384-bit (6-ch) |
LLM viability: Largely irrelevant for accelerated inference, but two anomalies worth knowing:
- The Mac mini 2018 with 64 GB user-upgraded DDR4 is the cheapest Intel Mac that can hold a 30B Q4 model in RAM. Speed is awful (CPU-only inference, ~40 GB/s).
- The Mac Pro 2019, with up to 1.5 TB, is the only Mac that exceeds today’s M3 Ultra (512 GB) in raw capacity. The iMac Pro (up to 256 GB) held more RAM than every Apple Silicon Mac until the M3 Ultra. Their bandwidth and accelerator stories still make them losing propositions for LLMs.
M1 family (Nov 2020 — early 2023)
Every Apple Silicon Mac has soldered LPDDR unified memory on the SoC package. The CPU, GPU, and Neural Engine address the same physical DRAM pool, so there is no PCIe hop, no host-to-device copy, and no separate VRAM. This is the structural property that makes Apple Silicon disproportionately good at LLM inference.
M1 — TSMC N5
- CPU: 4P (Firestorm) + 4E (Icestorm) = 8 cores
- GPU: 7- or 8-core Apple GPU
- Neural Engine: 16 cores (~11 TOPS)
- Memory: LPDDR4X-4266, 128-bit bus → ~68.25 GB/s
- RAM options: 8 / 16 GB
| Product | Released | Discontinued | RAM | Base price |
|---|---|---|---|---|
| MacBook Air M1 | Nov 2020 | Mar 2024 | 8 / 16 GB | $999 |
| MacBook Pro 13” M1 | Nov 2020 | Jun 2022 | 8 / 16 GB | $1,299 |
| Mac mini M1 | Nov 2020 | Jan 2023 | 8 / 16 GB | $699 |
| iMac 24” M1 | Apr 2021 | Oct 2023 | 8 / 16 GB | $1,299 (7-core GPU, two ports) |
LLM viability: 7B Q4_K_M (~4.1 GB weights) is the sweet spot on 16 GB. 8 GB is squeezed, with 3B Q4 comfortable. Expect roughly 12–14 tok/s on 7B Q4_K_M. The 68 GB/s bus caps decode near 68 / 4.1 ≈ 16–17 tok/s even at 100% efficiency.
M1 Pro — TSMC N5
- CPU: 6P+2E (binned, 8C) or 8P+2E (full, 10C) — Avalanche/Blizzard cores
- GPU: 14- or 16-core
- Neural Engine: 16 cores
- Memory: LPDDR5-6400, 256-bit bus → ~200 GB/s
- RAM options: 16 / 32 GB
M1 Max — TSMC N5
- CPU: 8P+2E (10 cores)
- GPU: 24- or 32-core
- Memory: LPDDR5-6400, 512-bit bus → ~400 GB/s
- RAM options: 32 / 64 GB
| Product | Released | Discontinued | Chip options | RAM | Base price |
|---|---|---|---|---|---|
| MacBook Pro 14” (2021) | Oct 2021 | Jan 2023 | M1 Pro 8C/14C-GPU or 10C/16C; M1 Max 24C/32C-GPU | 16 / 32 / 64 GB | $1,999 |
| MacBook Pro 16” (2021) | Oct 2021 | Jan 2023 | M1 Pro 10C/16C; M1 Max 24C/32C-GPU | 16 / 32 / 64 GB | $2,499 |
| Mac Studio M1 Max | Mar 2022 | Jun 2023 | M1 Max 24C/32C-GPU | 32 / 64 GB | $1,999 |
M1 Ultra — TSMC N5, two M1 Max dies via UltraFusion
- CPU: 16P+4E (20 cores)
- GPU: 48- or 64-core
- Neural Engine: 32 cores
- Memory: LPDDR5-6400, 1024-bit bus → ~800 GB/s
- RAM options: 64 / 128 GB
| Product | Released | Discontinued | Chip options | RAM | Base price |
|---|---|---|---|---|---|
| Mac Studio M1 Ultra | Mar 2022 | Jun 2023 | 48C or 64C-GPU | 64 / 128 GB | $3,999 |
LLM viability of the M1 family:
- M1 Pro 32 GB: 13B Q4 at ~19–21 tok/s.
- M1 Max 64 GB: 70B Q4 (tight at ~42 GB weights + KV cache + OS); ~7–9 tok/s.
- M1 Ultra 128 GB: 70B Q4 comfortably at ~12–14 tok/s; Mixtral 8x7B Q4 with room.
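These figures follow from a simple bandwidth-bound model of decode. Every generated token streams the full set of weights from DRAM once, so tokens per second is roughly effective bandwidth divided by weight size. The sketch assumes dense models, the 0.75 utilization factor from the caveat, and approximate Q4_K_M GGUF sizes (~4.1 GB for 7B, ~42 GB for 70B). It ignores KV-cache reads and compute limits, so treat its results as optimistic at long contexts.

```python
def decode_tok_s(bandwidth_gbs: float, weights_gb: float, efficiency: float = 0.75) -> float:
    """Bandwidth-bound decode estimate for a dense model: one full pass over the weights per token."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gbs * efficiency / weights_gb


if __name__ == "__main__":
    cases = [
        ("M1, 7B Q4_K_M", 68.25, 4.1),
        ("M1 Max, 70B Q4_K_M", 400.0, 42.0),
        ("M1 Ultra, 70B Q4_K_M", 800.0, 42.0),
    ]
    for label, bw, size in cases:
        est = decode_tok_s(bw, size)
        ceiling = decode_tok_s(bw, size, efficiency=1.0)
        print(f"{label}: ~{est:.1f} tok/s expected, {ceiling:.1f} tok/s hard ceiling")
```

MoE models such as Mixtral read only the active experts for each token, so they decode faster than their total size suggests. This function does not model that.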
M2 family (June 2022 — Oct 2024)
M2 — TSMC N5P
- CPU: 4P+4E (8 cores)
- GPU: 8- or 10-core
- Memory: LPDDR5-6400, 128-bit → ~100 GB/s (up from M1’s 68 GB/s)
- RAM options: 8 / 16 / 24 GB (24 GB tier was new for base)
| Product | Released | RAM | Base price |
|---|---|---|---|
| MacBook Air M2 13” | Jul 2022 | 8 / 16 / 24 GB | $1,199 |
| MacBook Pro 13” M2 | Jun 2022 | 8 / 16 / 24 GB | $1,299 |
| Mac mini M2 | Jan 2023 | 8 / 16 / 24 GB | $599 |
| MacBook Air 15” M2 | Jun 2023 | 8 / 16 / 24 GB | $1,299 |
M2 Pro — TSMC N5P
- CPU: 6P+4E (10C) or 8P+4E (12C)
- GPU: 16- or 19-core
- Memory: LPDDR5-6400, 256-bit → ~200 GB/s (same as M1 Pro)
- RAM options: 16 / 32 GB
M2 Max — TSMC N5P
- CPU: 8P+4E (12C) — added 2 E-cores vs M1 Max
- GPU: 30- or 38-core
- Memory: LPDDR5-6400, 512-bit → ~400 GB/s (same as M1 Max)
- RAM options: 32 / 64 / 96 GB (96 GB was new)
| Product | Released | Chip options | RAM | Base price |
|---|---|---|---|---|
| MacBook Pro 14” (2023) | Jan 2023 | M2 Pro 10C/16C or 12C/19C; M2 Max 30C/38C-GPU | 16 / 32 / 64 / 96 GB | $1,999 |
| MacBook Pro 16” (2023) | Jan 2023 | M2 Pro 12C/19C; M2 Max 30C/38C-GPU | 16 / 32 / 64 / 96 GB | $2,499 |
| Mac mini M2 Pro | Jan 2023 | 10C/16C or 12C/19C | 16 / 32 GB | $1,299 |
| Mac Studio M2 Max | Jun 2023 | M2 Max 30C or 38C-GPU | 32 / 64 / 96 GB | $1,999 |
M2 Ultra — TSMC N5P, two M2 Max via UltraFusion
- CPU: 16P+8E (24C)
- GPU: 60- or 76-core
- Neural Engine: 32 cores
- Memory: LPDDR5-6400, 1024-bit → ~800 GB/s
- RAM options: 64 / 128 / 192 GB (192 GB was the headline)
| Product | Released | Chip options | RAM | Base price |
|---|---|---|---|---|
| Mac Studio M2 Ultra | Jun 2023 | 60C or 76C-GPU | 64 / 128 / 192 GB | $3,999 |
| Mac Pro M2 Ultra | Jun 2023 | 60C or 76C-GPU | 64 / 128 / 192 GB | $6,999 |
Mac Pro M2 Ultra is essentially a Mac Studio in a tower: same chip, same RAM cap, same bandwidth. The only differentiator is PCIe expansion. Apple Silicon Macs do not support PCIe graphics cards at all, so the slots cannot add GPU compute or VRAM for LLM work. For LLM work the Mac Pro M2 Ultra is a strict downgrade in value vs. the Mac Studio M2 Ultra.
LLM viability of the M2 family:
- M2 base 24 GB: 7B Q4 comfortably at ~18–22 tok/s; 13B Q4 fits with room but decodes at roughly half that rate.
- M2 Pro 32 GB: 13B Q4 at ~19–21 tok/s (the 200 GB/s bus caps it near 25); 30B Q4 squeezed.
- M2 Max 96 GB: 70B Q4 at ~7–9 tok/s (same 400 GB/s bus as M1 Max); Mixtral 8x7B Q4 comfortably.
- M2 Ultra 192 GB: 70B FP16 (~140 GB) at ~4–5 tok/s (ceiling ≈ 800 / 140 ≈ 5.7); Mixtral 8x22B Q4 (~80 GB) comfortably.
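Whether a model fits depends on weights plus KV cache, measured against the RAM left after the OS reserve. The sketch assumes the 25% reserve from the caveat, an FP16 KV cache, and Llama 3 70B's published shape (80 layers, 8 KV heads, head dimension 128). The weight sizes, ~141 GB at FP16 and ~42 GB at Q4_K_M, are approximations.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in decimal GB: keys + values, per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9


def fit_report(total_ram_gb: float, weights_gb: float, kv_gb: float,
               os_reserve: float = 0.25) -> tuple:
    """Return (fits, headroom_gb) against the RAM left after the OS reserve."""
    budget = total_ram_gb * (1.0 - os_reserve)
    need = weights_gb + kv_gb
    return need <= budget, budget - need


if __name__ == "__main__":
    kv_8k = kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128, context_tokens=8192)
    print(f"70B KV cache at 8K context: {kv_8k:.2f} GB")
    for label, ram, weights in [
        ("M2 Ultra 192 GB, 70B FP16", 192, 141.0),
        ("M1 Max 64 GB, 70B Q4_K_M", 64, 42.0),
        ("M1 Ultra 128 GB, 70B FP16", 128, 141.0),
    ]:
        fits, headroom = fit_report(ram, weights, kv_8k)
        print(f"{label}: fits={fits}, headroom={headroom:.1f} GB")
```

The M2 Ultra case fits with only ~0.3 GB to spare, which leaves almost no room for longer contexts at FP16.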
M3 family (Oct 2023 — Oct 2024; M3 Ultra arrived March 2025)
M3 — TSMC N3B (first-gen 3 nm)
- CPU: 4P+4E (8C)
- GPU: 8- or 10-core; first Mac GPU with hardware ray tracing and mesh shading
- Memory: LPDDR5-6400, 128-bit → ~100 GB/s (unchanged from M2)
- RAM options: 8 / 16 / 24 GB
| Product | Released | RAM | Base price |
|---|---|---|---|
| MacBook Pro 14” M3 | Nov 2023 | 8 / 16 / 24 GB | $1,599 |
| iMac 24” M3 | Nov 2023 | 8 / 16 / 24 GB | $1,299 |
| MacBook Air 13”/15” M3 | Mar 2024 | 8 / 16 / 24 GB | $1,099 / $1,299 |
M3 Pro — TSMC N3B (the controversial one)
- CPU: 5P+6E (11C) or 6P+6E (12C) — Apple shifted toward more efficiency cores
- GPU: 14- or 18-core
- Memory: LPDDR5-6400, 192-bit → ~150 GB/s — DOWN from M2 Pro’s 200 GB/s
- RAM options: 18 / 36 GB — unusual numbers reflecting the narrower 192-bit bus
The M3 Pro is a regression for LLM users. A narrower memory bus (192-bit vs M2 Pro’s 256-bit) at the same LPDDR5-6400 rate means 25% less bandwidth. The regression has been documented extensively by Chips and Cheese (“Apple’s M3 Pro: A Step Sideways”, Nov 2023), Vadim Yuryev (MaxTech), and r/LocalLLaMA community measurements. Per supply-chain reporting, Apple's rationale was N3B yield economics, since cutting bus width saves die area. M4 Pro restored the 256-bit bus and moved to faster LPDDR5X.
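In decode terms the regression is proportional. Below is a worked ceiling for a bandwidth-bound 13B model, assuming a ~7.9 GB Q4_K_M file and Apple's rounded bandwidth figures:

```latex
\begin{aligned}
\frac{B_{\text{M3 Pro}}}{B_{\text{M2 Pro}}} &= \frac{192\ \text{bit}}{256\ \text{bit}} = 0.75
  \quad (\text{same LPDDR5-6400 rate}) \\
\text{tok/s}_{\max}^{\text{M2 Pro}} &\approx \frac{200\ \text{GB/s}}{7.9\ \text{GB}} \approx 25.3 \\
\text{tok/s}_{\max}^{\text{M3 Pro}} &\approx \frac{150\ \text{GB/s}}{7.9\ \text{GB}} \approx 19.0
\end{aligned}
```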
M3 Max — TSMC N3B (two distinct bandwidth tiers under one name)
The M3 Max shipped in two memory configurations, depending on which CPU bin you got:
| M3 Max variant | CPU | GPU | RAM options | Bandwidth | Bus |
|---|---|---|---|---|---|
| Binned (14-core) | 10P+4E | 30-core | 36 / 96 GB | ~300 GB/s | 384-bit |
| Full (16-core) | 12P+4E | 40-core | 48 / 64 / 128 GB | ~400 GB/s | 512-bit |
Same chip name, different memory subsystems. Buyers who ordered 36 or 96 GB got the binned (slower) variant; 48, 64, or 128 GB orders got the full variant.
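Because the binned and full dies share a chip name, detection has to look past the brand string. The sketch assumes the brand string reads exactly like “Apple M3 Max”. It also assumes the performance-core count separates the variants: 10 on the 14-core bins and 12 on the 16-core parts, per the tables above. The lookup table and function names are hypothetical; the sysctl keys are the ones this note recommends for Locara's device detection. RAM size provides a cross-check, because a 36 or 96 GB M3 Max, or a 36 GB M4 Max, must be the binned die.

```python
import platform
import subprocess
from typing import Optional, Tuple

# Hypothetical lookup: (brand string, performance-core count) -> (variant, bus bits, peak GB/s).
# Numbers are copied from this note's M3 Max and M4 Max variant tables.
MAX_VARIANTS = {
    ("Apple M3 Max", 10): ("binned 14-core", 384, 300),
    ("Apple M3 Max", 12): ("full 16-core", 512, 400),
    ("Apple M4 Max", 10): ("binned 14-core", 384, 410),
    ("Apple M4 Max", 12): ("full 16-core", 512, 546),
}

# RAM sizes that only ship on the binned die, per the same tables.
BINNED_ONLY_RAM_GB = {"Apple M3 Max": {36, 96}, "Apple M4 Max": {36}}


def classify_max(brand: str, p_cores: int, ram_gb: int) -> Optional[Tuple[str, int, int]]:
    """Return (variant, bus_bits, peak_gbs), or None if the chip isn't a two-tier Max part."""
    key = brand.strip()
    variant = MAX_VARIANTS.get((key, p_cores))
    if variant is None:
        return None
    # Cross-check: a binned-only RAM size on a full-die P-core count is inconsistent.
    if ram_gb in BINNED_ONLY_RAM_GB.get(key, set()) and not variant[0].startswith("binned"):
        raise ValueError(f"{key} with {p_cores} P-cores should not ship with {ram_gb} GB")
    return variant


def read_sysctl(key: str) -> str:
    """macOS only: `sysctl -n KEY` prints just the value."""
    result = subprocess.run(["sysctl", "-n", key], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__" and platform.system() == "Darwin":
    brand = read_sysctl("machdep.cpu.brand_string")
    p_cores = int(read_sysctl("hw.perflevel0.physicalcpu"))  # Apple Silicon only
    ram_gb = int(read_sysctl("hw.memsize")) // 2**30         # hw.memsize is in bytes
    print(brand, p_cores, ram_gb, classify_max(brand, p_cores, ram_gb))
```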
| Product | Released | Chip options | RAM | Base price |
|---|---|---|---|---|
| MacBook Pro 14” M3 Pro/Max | Nov 2023 | Pro 11C/14C-GPU or 12C/18C-GPU; Max 14C/30C-GPU or 16C/40C-GPU | 18 / 36 / 48 / 64 / 96 / 128 GB | $1,999 |
| MacBook Pro 16” M3 Pro/Max | Nov 2023 | Same | 18 / 36 / 48 / 64 / 96 / 128 GB | $2,499 |
M3 Ultra — TSMC N3B, two M3 Max via UltraFusion (March 2025)
Apple skipped an M3 Ultra in the original M3 lineup and released it only in March 2025 — after M4 had already shipped in iPads and Macs. The headline was the 512 GB unified memory option, exclusive to M3 Ultra; M4 has no Ultra tier as of this writing.
- CPU: 28C (20P+8E) or 32C (24P+8E); the 32-core bin is two full M3 Max dies
- GPU: 60- or 80-core
- Neural Engine: 32 cores
- Memory: LPDDR5-6400, 1024-bit → ~800 GB/s
- RAM options: 96 / 256 / 512 GB
| Product | Released | Chip options | RAM | Base price |
|---|---|---|---|---|
| Mac Studio M3 Ultra | Mar 2025 | 60C-GPU (96/256 GB) or 80C-GPU (96/256/512 GB) | 96 / 256 / 512 GB | $3,999 / $5,499 (80C base) / ~$9,500+ (512 GB) |
The 512 GB Mac Studio M3 Ultra has more GPU-addressable memory than any other consumer machine. No PC platform comes close at consumer pricing, including AMD Strix Halo (128 GB cap) and any single-GPU rig (the RTX 5090 caps at 32 GB GDDR7). It reportedly runs Llama 3.1 405B Q4 (~250 GB) at ~2 tok/s and DeepSeek-V3/R1 Q4 (~340 GB) at usable interactive speeds for a single user.
LLM viability of the M3 family:
- M3 base 24 GB: 7B Q4 at ~18–21 tok/s; 13B Q4 fits but decodes at roughly half that rate.
- M3 Pro 36 GB: regressed vs M2 Pro on bandwidth-bound generation; ~14–16 tok/s on 13B Q4, where M2 Pro reaches ~19–21.
- M3 Max 14C/36 GB: ~50–60 tok/s on 7B Q4; 30B Q4 fits with room.
- M3 Max 16C/128 GB: 70B Q4 at ~7–9 tok/s (same 400 GB/s as M1/M2 Max).
- M3 Ultra 256 GB: 70B FP16 at ~4–5 tok/s; Mixtral 8x22B Q4 with massive headroom.
- M3 Ultra 512 GB: only consumer device that runs Llama 3.1 405B Q4 or DeepSeek-V3 Q4 in unified memory.
M4 family (May 2024 iPad Pro; Oct 2024 Macs)
The M4 broke from N3B and uses TSMC N3E, a more mature 3 nm variant. Memory moved to LPDDR5X, which is the headline bandwidth driver: 8533 MT/s on M4 Pro/Max, vs LPDDR5-6400 on M2–M3 and LPDDR4X-4266 on M1.
M4 — TSMC N3E
- CPU: 4P+6E (10C) — added 2 E-cores
- GPU: 8- or 10-core (Dynamic Caching + hardware RT/mesh shading carried over from M3)
- Neural Engine: 16 cores (uplifted throughput, Apple quotes ~38 TOPS)
- Memory: LPDDR5X, 128-bit → ~120 GB/s
- RAM options: 16 / 24 / 32 GB — 8 GB base finally retired
| Product | Released | RAM | Base price |
|---|---|---|---|
| iPad Pro M4 | May 2024 | 8 / 16 GB | — |
| MacBook Pro 14” M4 | Oct 2024 | 16 / 24 / 32 GB | $1,599 |
| iMac 24” M4 | Oct 2024 | 16 / 24 / 32 GB | $1,299 |
| Mac mini M4 | Oct 2024 | 16 / 24 / 32 GB | $599 |
| MacBook Air 13”/15” M4 | Mar 2025 | 16 / 24 / 32 GB | $999 / $1,199 |
M4 Pro — TSMC N3E (memory bandwidth restored, then some)
After the M3 Pro controversy, Apple widened the bus and moved to LPDDR5X:
- CPU: 8P+4E (12C) or 10P+4E (14C)
- GPU: 16- or 20-core
- Memory: LPDDR5X-8533, 256-bit → ~273 GB/s (up from M3 Pro’s 150 GB/s, also up from M2 Pro’s 200 GB/s)
- RAM options: 24 / 48 / 64 GB
M4 Max — TSMC N3E (two bandwidth tiers again)
| M4 Max variant | CPU | GPU | RAM options | Bandwidth | Bus |
|---|---|---|---|---|---|
| Binned (14-core) | 10P+4E | 32-core | 36 GB only | ~410 GB/s | 384-bit |
| Full (16-core) | 12P+4E | 40-core | 48 / 64 / 128 GB | ~546 GB/s | 512-bit |
At 546 GB/s, the full M4 Max is the biggest Max-tier bandwidth jump so far: ~37% over the full M3 Max, after M1 Max and M2 Max both sat at 400 GB/s.
| Product | Released | Chip options | RAM | Base price |
|---|---|---|---|---|
| MacBook Pro 14” M4 Pro/Max | Oct 2024 | Pro 12C/16C-GPU or 14C/20C; Max 14C/32C-GPU or 16C/40C | 24 / 36 / 48 / 64 / 128 GB | $1,999 |
| MacBook Pro 16” M4 Pro/Max | Oct 2024 | Same | 24 / 36 / 48 / 64 / 128 GB | $2,499 |
| Mac mini M4 Pro | Oct 2024 | Pro 12C/16C or 14C/20C | 24 / 48 / 64 GB | $1,399 |
| Mac Studio M4 Max | Mar 2025 | Max 14C/32C-GPU or 16C/40C | 36 / 48 / 64 / 128 GB | $1,999 |
M4 Ultra — does not ship as of early 2026
When Apple refreshed Mac Studio in March 2025, they paired the M4 Max with the M3 Ultra — a deliberate split-generation product. Per Bloomberg’s Mark Gurman and multiple supply-chain leaks, the M4 Max die does not include the UltraFusion interconnect needed to fuse two dies. Apple appears to have decided early in M4’s design that the Ultra tier would be skipped; the 512 GB slot is held by M3 Ultra until M5 Ultra (if any) arrives.
Note: the existing modern-chip-landscape.md refers to an “M4 Ultra in Mac Studio”, which is incorrect. The Mac Studio Ultra ships M3 Ultra, not M4 Ultra. Worth correcting next time that note is revised.
LLM viability of the M4 family:
- M4 base 32 GB: 13B Q4 comfortably; 7B Q4 at ~22–25 tok/s.
- M4 Pro 64 GB: 30B Q4 comfortably; 70B Q4 squeezed; ~26–29 tok/s on 13B Q4.
- M4 Max 128 GB: 70B Q4 at ~10–11 tok/s (best mobile-Mac numbers).
M5 family (announced/launched late 2025 — early 2026)
Confidence: medium for shipped parts, low for unannounced.
M5 — TSMC N3P
The marquee M5 architectural change: neural accelerators inside each GPU core — Apple’s name for matrix-multiply hardware embedded in the GPU itself, conceptually similar to NVIDIA Tensor Cores. This means MLX and llama.cpp Metal-backend inference (already GPU-routed) gets a substantial uplift on prompt-processing without the Neural Engine’s operator constraints.
- CPU: 4P+6E (refined cores)
- GPU: new-architecture with embedded neural accelerators
- Memory: LPDDR5X-9600, 128-bit → ~153 GB/s
- RAM options: 16 / 24 / 32 GB
| Product | Released | RAM |
|---|---|---|
| MacBook Pro 14” M5 | Oct 2025 | 16 / 24 / 32 GB |
| iPad Pro M5 | Oct 2025 | 12 / 16 GB |
| MacBook Air M5 | expected Spring 2026 | 16 / 24 / 32 GB |
M5 Pro / Max / Ultra — not confirmed at time of writing
Rumored for spring/summer 2026 with continued bandwidth uplift. Treat any specific numbers as speculation. If an M5 Ultra arrives with LPDDR5X-9600 on the same 1024-bit bus, that is an automatic ~50% bandwidth jump over M3 Ultra, to ~1.2 TB/s. It would be the first Ultra-tier bandwidth movement since the M1 Ultra.
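The ~50% figure is straight arithmetic on the peak-bandwidth formula (transfer rate × bus width). It assumes the hypothetical M5 Ultra keeps the 1024-bit bus and uses LPDDR5X-9600. The first line is the figure this note rounds to ~800 GB/s.

```latex
\begin{aligned}
B_{\text{M3 Ultra}} &= 6400\ \text{MT/s} \times \frac{1024\ \text{bit}}{8\ \text{bit/B}} = 819.2\ \text{GB/s} \\
B_{\text{M5 Ultra}}^{\text{hyp.}} &= 9600\ \text{MT/s} \times \frac{1024\ \text{bit}}{8\ \text{bit/B}} = 1228.8\ \text{GB/s} \\
\frac{B_{\text{M5 Ultra}}^{\text{hyp.}}}{B_{\text{M3 Ultra}}} &= \frac{9600}{6400} = 1.5
\end{aligned}
```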
Early M5 LLM impact: community benchmarks suggest ~2–4× prompt-processing throughput vs M4 at the same memory bandwidth (the GPU-embedded matmul units carry the load). Decode (bandwidth-bound) sees a more modest ~25% uplift matching the bandwidth gain.
Notable surprises and learnings
1. The M3 Pro bandwidth regression was a real product mistake: 150 GB/s vs M2 Pro’s 200 GB/s. The unusual 18 / 36 GB capacities are a side effect of the narrower 192-bit bus. M4 Pro’s 273 GB/s on 256-bit LPDDR5X is the correction. An M3 Pro will be measurably slower than the equivalent M2 Pro on bandwidth-bound LLM decode. That is surprising, but real.
2. Both M3 Max and M4 Max ship in two bandwidth tiers under one name. On M3 Max, the 36 GB and 96 GB configs use the binned die (384-bit bus), while 48, 64, and 128 GB get the full die (512-bit). On M4 Max, only 36 GB is binned. The bandwidth delta (300 vs 400 GB/s on M3, 410 vs 546 GB/s on M4) is material for LLM inference. Locara device cards should distinguish these explicitly.
3. Ultra-tier bandwidth has been flat at ~800 GB/s for three generations. M1, M2, and M3 Ultra all sit at ~800 GB/s on 1024-bit LPDDR5-6400. Apple has not moved the Ultra tier to LPDDR5X, so Ultra bandwidth has scaled only via die fusion, not memory speed. An M5 Ultra with LPDDR5X-9600 on the same bus would be the first real Ultra-tier bandwidth jump.
4. M3 Ultra 512 GB is the consumer-hardware capacity ceiling. No other shipping consumer machine — AMD Strix Halo (128 GB cap), any single-GPU rig (RTX 5090 caps at 32 GB GDDR7) — comes close at consumer pricing. The next viable option above 512 GB is a workstation with multiple GPUs or a server platform, both 5–20× the cost. This is genuinely a one-of-a-kind product as of early 2026.
5. The Intel Mac Pro 2019 still has more RAM than any Apple Silicon Mac. The Intel Mac Pro offers 1.5 TB DDR4 ECC at ~140 GB/s; the M3 Ultra offers 512 GB LPDDR5 at ~800 GB/s. Apple Silicon traded raw capacity for ~6× bandwidth and unified addressing, which was the structurally correct call for what matters in inference.
6. Apple’s published bandwidth is peak theoretical. Real LLM inference typically achieves 70–85% of peak. Vadim Yuryev (MaxTech) and the now-defunct AnandTech both consistently noted this in their reviews. Use the published numbers as upper bounds.
7. The Neural Engine has been remarkably static through M4: 16 cores from M1 through M4 (32 on Ultras). Total quoted throughput has risen (~11 → ~38 TOPS), but local-LLM stacks (MLX, llama.cpp Metal) bypass the Neural Engine entirely and target the GPU, because the Neural Engine has tight operator-support constraints. M5’s GPU-embedded neural accelerators are the first real architectural change that local LLM stacks can use directly.
8. There is no Apple Silicon Mac with user-upgradeable RAM. Every M-series Mac has soldered LPDDR. The Intel-era Mac mini 2018, iMac 27” 2020, and Mac Pro 2019 are the last user-upgradeable-RAM Macs. Users who under-buy RAM at purchase time are stuck for the life of the machine — this is the single most common Locara-relevant deployment mistake. App manifests need to fail loud and early when the user’s machine is undersized.
9. The Mac Pro M2 Ultra is effectively obsolete for LLM work. It has the same chip, RAM cap (192 GB), and bandwidth as the Mac Studio M2 Ultra, and its PCIe slots can’t host GPUs at all because Apple Silicon Macs have no discrete-GPU support. A Mac Studio M2 Ultra at $3,999 strictly dominates a Mac Pro M2 Ultra at $6,999 for local AI.
10. iPad Pro M4 nearly matches the best Intel Mac on memory bandwidth. On paper it offers ~120 GB/s vs the Intel Mac Pro 2019’s ~141 GB/s, and it beats every other Intel Mac (the iMac Pro tops out near 85 GB/s). It also has unified memory and MLX support. An iPad Pro M4 with 16 GB runs 7–13B Q4 models faster than any pre-2020 Intel Mac, in a tablet, on the same M4 chip family as the MacBook Air M4. The mobile-class compute ceiling has moved.
Specific learnings for Locara
- Manifest device-class targeting needs both RAM and bandwidth. A 96 GB M2 Max (400 GB/s) and a 96 GB M3 Max 14C (300 GB/s) have very different LLM performance even though they sit in the same RAM tier. The manifest schema should accept either a coarse tier (“Tier 3”) or a specific (RAM, bandwidth) pair. The runtime would then compute the user’s actual numbers via sysctl hw.memsize and a chip-keyed bandwidth lookup table.
- Distinguish M3 Max 14C from M3 Max 16C, and M4 Max 14C from M4 Max 16C. Same chip name, different bandwidth. Locara’s device detection should read sysctl machdep.cpu.brand_string and sysctl hw.perflevel0.physicalcpu to disambiguate, then cross-reference with this note’s table.
- The M3 Pro regression is a real datapoint for the manifest. A 36 GB M3 Pro is slower on bandwidth-bound LLM decode than a 32 GB M2 Pro despite having more RAM. App authors should be steered toward setting a bandwidth floor, not just a RAM floor, if their app does long-form generation.
- Soldered RAM means hardware tier is a permanent property of the user, not a runtime variable. This is unlike a PC, where the user can add RAM. Locara should remember the user’s machine tier as a persistent profile attribute and warn at install time, not at runtime, when an app exceeds that tier.
- The Mac Studio Ultra is the LLM-first hardware product Apple ships. Locara’s “what’s the most ambitious app you can ship?” question is bounded by what the Mac Studio M3 Ultra 512 GB can run, and DeepSeek-V3 Q4 (~340 GB) is the current ceiling. Apps targeting “the most demanding user” should be built knowing this is the platform.
- Mac mini M4 is the right value-tier deployment target. At a $599 base price it has 16 GB RAM and ~120 GB/s, and runs 7B Q4 well. Locara’s reference apps and onboarding flow should be tuned to “the Mac mini M4 user,” because that’s the price/capability sweet spot for new local-AI adopters.
- MacBook Air remains the volume-weighted target. Anyone buying a Mac for casual use buys an Air. M1 16 GB, M2 16/24 GB, M3 16/24 GB, and M4 16/24/32 GB Airs collectively dominate the install base. Apps targeting “the median Locara user” must work well at 16 GB / ~68–120 GB/s, which is the binding constraint for v1 reference apps.
- Don’t trust the “GB/s” Apple publishes as a real performance number. Multiply by ~0.75 utilization for honest tok/s estimates. The full formula is in llm-memory-math.md.
- Treat Intel Macs as out of scope for v1. Even the iMac Pro and Mac Pro 2019 are 1/3 to 1/5 the speed of an entry M-series Mac on LLM inference, with no MLX path. Locara v1 should refuse to install on Intel Macs with a clear “your Mac doesn’t have the unified-memory architecture this app requires” message.
- A bandwidth-keyed model picker is the right manifest primitive. Given the user’s measured bandwidth, the runtime can publish “Llama 3 8B Q4 at expected ~X tok/s on your Mac” as the install-time gate. Honesty about expected performance is the foundation of trust.
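A bandwidth-keyed picker can be sketched in a few lines. The catalog is hypothetical, with approximate Q4_K_M sizes (Llama 3.2 3B ~2.0 GB, Llama 3 8B ~4.9 GB, Llama 3 70B ~42.5 GB). The KV budgets are rough 8K-context FP16 figures computed from each model's layer and KV-head counts. The sketch also assumes the 25% OS reserve and the 0.75 utilization factor used throughout this note. None of these names are a real Locara API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelOption:
    name: str
    weights_gb: float  # approximate Q4_K_M file size
    kv_gb: float       # approximate FP16 KV cache at 8K context


# Hypothetical catalog; sizes are approximations, not Locara manifest values.
CATALOG: List[ModelOption] = [
    ModelOption("Llama 3.2 3B Q4_K_M", 2.0, 0.94),
    ModelOption("Llama 3 8B Q4_K_M", 4.9, 1.07),
    ModelOption("Llama 3 70B Q4_K_M", 42.5, 2.68),
]


def expected_tok_s(bandwidth_gbs: float, weights_gb: float, efficiency: float = 0.75) -> float:
    """Bandwidth-bound decode estimate: one full read of the weights per generated token."""
    return bandwidth_gbs * efficiency / weights_gb


def pick_model(ram_gb: float, bandwidth_gbs: float, min_tok_s: float,
               os_reserve: float = 0.25) -> Optional[Tuple[ModelOption, float]]:
    """Largest catalog model that fits in RAM after the OS reserve and meets the tok/s floor."""
    budget = ram_gb * (1.0 - os_reserve)
    best = None
    for model in sorted(CATALOG, key=lambda m: m.weights_gb):
        tok_s = expected_tok_s(bandwidth_gbs, model.weights_gb)
        if model.weights_gb + model.kv_gb <= budget and tok_s >= min_tok_s:
            best = (model, tok_s)
    return best


if __name__ == "__main__":
    # Mac mini M4 16 GB (~120 GB/s) with an app that wants at least 10 tok/s.
    choice = pick_model(ram_gb=16, bandwidth_gbs=120, min_tok_s=10)
    if choice is None:
        print("No catalog model meets this app's floor on this Mac")
    else:
        model, tok_s = choice
        print(f"{model.name} at expected ~{tok_s:.0f} tok/s on your Mac")
```

For the Mac mini M4 example, this prints “Llama 3 8B Q4_K_M at expected ~18 tok/s on your Mac”. Returning `None` rather than the smallest model lets the installer fail loud when no option meets the app's floor.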
References
Apple primary sources (tech specs and announcements):
- Apple tech specs archive (every model): https://support.apple.com/specs/
- Mac Studio: https://www.apple.com/mac-studio/specs/
- MacBook Pro: https://www.apple.com/macbook-pro/specs/
- MacBook Air: https://www.apple.com/macbook-air/specs/
- Mac mini: https://www.apple.com/mac-mini/specs/
- iMac: https://www.apple.com/imac/specs/
- Apple Newsroom (launch press releases with confirmed dates and prices): https://www.apple.com/newsroom/
- M3 Ultra announcement (Mar 2025): https://www.apple.com/newsroom/2025/03/apple-reveals-m3-ultra-taking-apple-silicon-to-a-new-extreme/
Microarchitecture deep dives:
- AnandTech (Andrei Frumusanu, Ryan Smith): M1 / M1 Pro / M1 Max die analyses (Oct 2021) and A14/A15/A16 Firestorm-Avalanche-Everest core work. The site ceased active publication in August 2024; archives are still at anandtech.com.
- Chips and Cheese (https://chipsandcheese.com): “Apple’s M3 Pro: A Step Sideways” (Nov 2023) documented the M3 Pro bandwidth regression with measured numbers. Multiple M2 Max / M4 Max deep dives.
- SemiAnalysis (Dylan Patel): TSMC N3B vs N3E yield economics, Apple’s process-node transition timing.
- TechInsights: die-shot analyses for each M-series generation.
- Hot Chips conference papers: Apple has presented some M-series details (e.g., M1 at Hot Chips 33).
Reviews and measured performance:
- The Verge (Nilay Patel, Monica Chin): reviews for every major Mac launch since 2015.
- Ars Technica (Andrew Cunningham, Samuel Axon): particularly strong on Mac mini and Mac Studio with real-workload focus.
- Notebookcheck (https://notebookcheck.net): benchmark database for every Mac with comparable scores.
- MaxTech / Vadim Yuryev (YouTube): consistent Geekbench Memory and llama.cpp benchmarks on every new Mac.
- MKBHD / Marques Brownlee (YouTube): spec breakdowns and side-by-side reviews.
- Alex Ziskind (YouTube): the most rigorous Mac LLM benchmarking, with thermal and sustained-vs-burst breakdowns.
LLM-specific Mac benchmarks:
- r/LocalLLaMA (https://reddit.com/r/LocalLLaMA): single best aggregator for community-measured tok/s per Mac SKU. Search for specific chip names and quants.
- llama.cpp issue tracker, with Apple Silicon performance discussions and bandwidth-vs-tok/s charts: https://github.com/ggerganov/llama.cpp/issues
- MLX repo + issues (Apple’s official LLM stack): https://github.com/ml-explore/mlx
- Awni Hannun (MLX lead): Twitter @awnihannun, blog posts on MLX benchmarks.
- Simon Willison (simonwillison.net): practical Mac LLM write-ups with measurements on M2 Max and M3 Ultra.
WWDC sessions:
- WWDC 2020 “Explore the new system architecture of Apple silicon Macs” — first official UMA description.
- WWDC 2023 / 2024 / 2025 Metal and ML sessions — MLX, Neural Engine, M5 GPU neural accelerators.
Source caveats:
- “M4 Ultra exists”: it does not as of early 2026. The Mac Studio (Mar 2025) pairs M4 Max with M3 Ultra. The earlier modern-chip-landscape.md reference to M4 Ultra in Mac Studio is incorrect.
- M5 Pro / Max / Ultra specs: speculative; not shipping as of writing.
- LPDDR5X transfer rate for M4: sources differ on whether base M4 uses 7500 or 8533 MT/s. Apple’s published 120 GB/s on a 128-bit bus works out to 7500 MT/s (8533 MT/s would give ~136.5 GB/s), which is lower than on the Pro/Max tiers.
- Mac Pro M3 Ultra — has not shipped; Mac Pro remains on M2 Ultra as of writing.