Mac Hardware Lineup — Every Variant, RAM, Bandwidth, Model-Size Fit

What this is: A reference inventory of every Mac model variant from the late-Intel era (2015) through M5 (early 2026), keyed on the numbers that actually decide what local LLMs can run on it: memory capacity, memory bandwidth, memory bus width, and GPU/Neural-Engine compute class.

Why it matters: Locara apps declare device-class requirements in their manifest; the runtime computes the user’s tier and matches. To do that honestly we need a per-SKU table: not just “M3 Max” but “M3 Max 14-core / 384-bit / 300 GB/s” vs. “M3 Max 16-core / 512-bit / 400 GB/s”. The numbers also let us publish an honest “expected tok/s” alongside each device card.

Most relevant to Locara: Pairs with chip-fundamentals.md (why bandwidth is the LLM number) and modern-chip-landscape.md (cross-vendor 2026 snapshot). Also pairs with llm-memory-math.md for the formulas that turn “X GB / Y GB/s” into “model Z at Q4 runs at N tok/s”.

Caveat: Apple-published memory bandwidth is peak theoretical, derived from LPDDR clock × bus width. Real workloads typically achieve 70–85% of peak. The “expected model” column assumes leaving ~25% of RAM for the OS and other apps, and uses Q4_K_M weights at 4–8K context (the chat-app default). Long contexts blow up these numbers — see llm-memory-math.md.
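The peak figures quoted throughout this note follow from that clock × bus-width arithmetic. A minimal Python sketch, assuming the transfer rate is given in MT/s and the bus width in bits; the function names are illustrative, and the default 0.75 utilization is simply the middle of the 70–85% band stated in the caveat, not a measured constant:

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s: transfers per second x bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


def realistic_bandwidth_gbs(peak_gbs: float, utilization: float = 0.75) -> float:
    """Scale a peak figure by an assumed utilization (0.70-0.85 per the caveat)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return peak_gbs * utilization


if __name__ == "__main__":
    examples = [
        ("M1 (LPDDR4X-4266, 128-bit)", 4266, 128),
        ("M1 Pro (LPDDR5-6400, 256-bit)", 6400, 256),
        ("M4 Max full (LPDDR5X-8533, 512-bit)", 8533, 512),
    ]
    for name, mts, bits in examples:
        peak = peak_bandwidth_gbs(mts, bits)
        print(f"{name}: peak {peak:.1f} GB/s, "
              f"~{realistic_bandwidth_gbs(peak):.0f} GB/s at 75% utilization")
```

The three examples reproduce the ~68, ~200, and ~546 GB/s figures used in the chip sections; Apple rounds its published numbers, so small differences from the computed values are expected.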

Quick reference: device classes for local LLMs

Tier	Bandwidth	RAM	Examples	Realistic local model (Q4_K_M, ≤8K context)
0 — Sub-baseline	≤60 GB/s	≤16 GB	Late-Intel MBA / MBP 13”	1–3B Q4 only; mostly historical
1 — Mobile baseline	68–120 GB/s	8–32 GB	M1/M2/M3/M4 base (MBA, mini, base MBP, iMac)	7B Q4 comfortably; 13B Q4 at the upper RAM tiers
2 — Pro mobile	150–273 GB/s	16–64 GB	M1/M2/M3/M4 Pro	13B Q4 comfortably; 30–34B Q4 at 36+ GB
3 — Max mobile/desktop	300–546 GB/s	32–128 GB	M1/M2/M3/M4 Max (MBP 14”/16”, Mac Studio Max)	70B Q4 at 64+ GB; Mixtral 8x7B Q4 at 36+ GB
4 — Ultra desktop	~800 GB/s	64–256 GB	M1/M2/M3 Ultra (Mac Studio, Mac Pro)	70B FP16 / 100B+ Q4 / Mixtral 8x22B Q4
5 — Frontier-local	~800 GB/s	512 GB	Mac Studio M3 Ultra 512 GB	Llama 3.1 405B Q4 (~250 GB) / DeepSeek-V3/R1 Q4 (~340 GB)
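The tier rows above can be sketched as a small classifier. This is only an illustration: the exact cutoffs between rows (140, 290, 700 GB/s) are assumptions chosen to fall in the gaps between the table's bandwidth ranges, and treating every Intel Mac as Tier 0 regardless of bandwidth is likewise an assumption, not a rule the table states.

```python
def device_tier(bandwidth_gbs: float, ram_gb: float, apple_silicon: bool = True) -> int:
    """Map (peak bandwidth, RAM) to a tier number from the quick-reference table.

    Cutoffs are hypothetical midpoints between the table's bandwidth ranges.
    """
    if not apple_silicon or bandwidth_gbs <= 60:
        return 0  # sub-baseline: late-Intel class
    if bandwidth_gbs < 140:
        return 1  # base M-series
    if bandwidth_gbs < 290:
        return 2  # Pro
    if bandwidth_gbs < 700:
        return 3  # Max
    # Ultra class: the 512 GB configuration is split out as its own tier.
    return 5 if ram_gb >= 512 else 4


if __name__ == "__main__":
    print(device_tier(120, 16))    # Mac mini M4
    print(device_tier(546, 128))   # M4 Max full
    print(device_tier(800, 512))   # Mac Studio M3 Ultra 512 GB
```

Keying the tier on bandwidth first and RAM second mirrors how the table is laid out; a real manifest matcher would presumably check both floors independently.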

The same table inverted by chip family, with detail, is below.

Late-Intel era (2015 — 2020)

The defining traits of late-Intel Macs for LLM use:

Memory bandwidth was bottlenecked by 128-bit DDR3/LPDDR3/LPDDR4X buses at 25–60 GB/s — an order of magnitude below Apple Silicon.
AMD Radeon Pro / Vega discrete GPUs (15” MBP, iMac, Mac Pro, iMac Pro) had separate VRAM and no Metal-Performance-Shaders-Graph path that current local LLM stacks target. ROCm doesn’t support them. MLX explicitly does not support them.
Three late-Intel Macs reach 128 GB of RAM or more: the iMac 27” Mid-2020 (up to 128 GB SO-DIMM), the iMac Pro (up to 256 GB DDR4 ECC), and the Mac Pro 2019 (up to 1.5 TB DDR4 ECC). None has a usable accelerated LLM path; CPU-only inference on the Mac Pro is single-digit tok/s on 70B.

MacBook Air (Intel)

Year	Chip	Cores	GPU	RAM (max)	DRAM	Bandwidth (GB/s)	Bus	Base price
2015 (11” / 13”)	Core i5/i7 Broadwell	2C/4T	Intel HD 6000	4–8 GB	LPDDR3-1600	~25.6	128-bit	$899 / $999
2017 (13”)	Core i5-5350U	2C/4T	Intel HD 6000	8 GB	LPDDR3-1600	~25.6	128-bit	$999
2018 (Retina 13”)	Core i5-8210Y (Amber Lake-Y)	2C/4T	Intel UHD 617	8–16 GB	LPDDR3-2133	~34.1	128-bit	$1,199
2019 (Retina 13”)	Core i5-8210Y	2C/4T	Intel UHD 617	8–16 GB	LPDDR3-2133	~34.1	128-bit	$1,099
2020 (Retina 13”)	Core i3/i5/i7 Ice Lake	2–4C	Iris Plus G4/G7	8–16 GB	LPDDR4X-3733	~59.7	128-bit	$999

LLM viability: Sub-3B Q4, slowly. Mostly historical interest.

MacBook Pro (Intel)

Year	Chip	Cores	GPU	RAM (max)	DRAM	Bandwidth (GB/s)	Base price
2015 (13” Retina)	Core i5/i7 Broadwell	2C/4T	Iris 6100	16 GB	LPDDR3-1866	~29.8	$1,299
2015 (15” Retina)	Core i7 Haswell (4770HQ/4870HQ/4980HQ)	4C/8T	Iris Pro 5200 + opt. AMD Radeon R9 M370X (2 GB GDDR5)	16 GB	DDR3L-1600	~25.6	$1,999
2016/2017 (13”/15” Touch Bar)	Core i5/i7 Skylake/Kaby Lake	2–4C	Iris 540/550, Iris Plus 640/650 (13”) / HD 530/630 + AMD Radeon Pro 450–560 (15”)	16 GB	LPDDR3-1866 / LPDDR3-2133	~29.9–34.1	$1,799
2018 (13”/15”)	Core i5/i7/i9 Coffee Lake	4C/8T / 6C/12T	Iris Plus 655 + AMD Radeon Pro 555X–Vega 20 (HBM2)	16–32 GB	LPDDR3 / DDR4-2400	~34–38	$1,799
2019 (13”/15”)	Core i5/i7 Kaby Lake-R / Coffee Lake	4C/8T / 6C/12T	Iris Plus 645 + Radeon Pro 555X/560X/Vega 16/20	16–32 GB	LPDDR3-2133 / DDR4-2400	~34–38	$1,299
2019 (16”)	Core i7-9750H / i9-9880H / i9-9980HK Coffee Lake Refresh	6C/12T / 8C/16T	Radeon Pro 5300M / 5500M (4–8 GB GDDR6) / 5600M (8 GB HBM2)	16–64 GB	DDR4-2666	~42.7	$2,399
2020 (13”)	Core i5/i7 Ice Lake	4C/8T	Iris Plus G7	16–32 GB	LPDDR4X-3733	~59.7	$1,299

LLM viability: The 16” 2019 MBP with 64 GB and a Radeon Pro 5600M is the only late-Intel laptop you might consider for ~13B Q4 on CPU. Real-world tok/s is roughly 1/3–1/4 of an M1 Max for the same model. The 5600M’s HBM2 is fast on paper but has no MLX path and no usable llama.cpp Metal acceleration to rival Apple Silicon.

Mac mini / iMac / iMac Pro / Mac Pro (Intel)

Model	Year(s)	Chip	Cores	GPU	RAM (max)	DRAM	Bandwidth (GB/s)	Bus
Mac mini “late 2014”	2014–2018	Core i5/i7 Haswell	2C/4T	Iris 5100	16 GB	DDR3L-1600	~25.6	128-bit
Mac mini 2018	2018–2023	Core i3/i5/i7 Coffee Lake	4–6C	UHD 630	8–64 GB SO-DIMM (user-upgradeable)	DDR4-2666	~42.7	128-bit
iMac 21.5” 4K	2015/2017	Core i5/i7 Broadwell/Kaby Lake	2–4C	Iris Pro 6200 / Radeon Pro 555/560	8–32 GB	DDR3 / DDR4-2400	~30–38	128-bit
iMac 27” 5K	2015	Core i5/i7 Skylake	4C	R9 M380–M395X	8–32 GB SO-DIMM	DDR3-1867	~29.9	128-bit
iMac 27” 5K	2017	Core i5/i7 Kaby Lake	4C	Radeon Pro 570/575/580 (4–8 GB GDDR5)	8–64 GB SO-DIMM	DDR4-2400	~38.4	128-bit
iMac 27” 5K	2019	Core i5/i9 Coffee Lake	6–8C	Radeon Pro 570X–Vega 48 (HBM2 8 GB)	8–64 GB SO-DIMM	DDR4-2666	~42.7	128-bit
iMac 27” 5K	2020	Core i5/i7/i9 Comet Lake (up to 10C)	6–10C	Radeon Pro 5300/5500XT/5700XT (4–16 GB GDDR6)	8–128 GB SO-DIMM	DDR4-2666	~42.7	128-bit
iMac Pro	2017–2021	Xeon W-2140B–W-2191B Skylake-W	8–18C	Radeon Pro Vega 56–64X (8–16 GB HBM2)	32–256 GB ECC	DDR4-2666 ECC	~85.3	256-bit (4-ch)
Mac Pro “trash can”	2013–2019	Xeon E5 Ivy Bridge-EP	4–12C	dual FirePro D300/D500/D700	12–64 GB	DDR3-1866 ECC	~59.7	256-bit (4-ch)
Mac Pro “cheese grater”	2019–2023	Xeon W-3223–W-3275M Cascade Lake	8–28C	Radeon Pro 580X / Vega II / W5700X / W6800X / W6900X	32 GB–1.5 TB ECC RDIMM	DDR4-2933 ECC	~140.8	384-bit (6-ch)

LLM viability: Largely irrelevant for accelerated inference, but two anomalies worth knowing:

The Mac mini 2018 with 64 GB user-upgraded DDR4 is the cheapest Intel Mac that can hold a 30B Q4 model in RAM. Speed is awful (CPU-only inference, ~40 GB/s).
The Mac Pro 2019 with up to 1.5 TB is the only Intel Mac that exceeds today’s M3 Ultra (512 GB) in raw capacity; the 2020 iMac 27” (128 GB SO-DIMM) and the iMac Pro (256 GB) match or beat every Apple Silicon Mac below the Ultra tier. In all three cases the bandwidth and accelerator stories make them losing propositions for LLMs.

M1 family (Nov 2020 — early 2023)

Every Apple Silicon Mac has soldered LPDDR unified memory on the SoC package. CPU, GPU, and Neural Engine address the same DRAM at full bandwidth — no PCIe, no copies, no separate VRAM. This is the structural property that makes Apple Silicon disproportionately good at LLM inference.

M1 — TSMC N5

CPU: 4P (Firestorm) + 4E (Icestorm) = 8 cores
GPU: 7- or 8-core Apple GPU
Neural Engine: 16 cores (~11 TOPS)
Memory: LPDDR4X-4266, 128-bit bus → ~68.25 GB/s
RAM options: 8 / 16 GB

Product	Released	Discontinued	RAM	Base price
MacBook Air M1	Nov 2020	Mar 2024	8 / 16 GB	$999
MacBook Pro 13” M1	Nov 2020	Jun 2022	8 / 16 GB	$1,299
Mac mini M1	Nov 2020	Jan 2023	8 / 16 GB	$699
iMac 24” M1	Apr 2021	Oct 2023	8 / 16 GB	$1,299 (2-port)

LLM viability: 7B Q4 (~4.5 GB weights) is the sweet spot on 16 GB. 8 GB is squeezed (3B Q4 comfortable). Expect ~15–20 tok/s on 7B Q4_K_M.

M1 Pro — TSMC N5

CPU: 6P+2E (binned, 8C) or 8P+2E (full, 10C), using the same Firestorm/Icestorm cores as M1
GPU: 14- or 16-core
Neural Engine: 16 cores
Memory: LPDDR5-6400, 256-bit bus → ~200 GB/s
RAM options: 16 / 32 GB

M1 Max — TSMC N5

CPU: 8P+2E (10 cores)
GPU: 24- or 32-core
Memory: LPDDR5-6400, 512-bit bus → ~400 GB/s
RAM options: 32 / 64 GB

Product	Released	Discontinued	Chip options	RAM	Base price
MacBook Pro 14” (2021)	Oct 2021	Jan 2023	M1 Pro 8C/14C-GPU or 10C/16C; M1 Max 24C/32C-GPU	16 / 32 / 64 GB	$1,999
MacBook Pro 16” (2021)	Oct 2021	Jan 2023	M1 Pro 10C/16C; M1 Max 24C/32C-GPU	16 / 32 / 64 GB	$2,499
Mac Studio M1 Max	Mar 2022	Jun 2023	M1 Max 24C/32C-GPU	32 / 64 GB	$1,999

M1 Ultra — TSMC N5, two M1 Max dies via UltraFusion

CPU: 16P+4E (20 cores)
GPU: 48- or 64-core
Neural Engine: 32 cores
Memory: LPDDR5-6400, 1024-bit bus → ~800 GB/s
RAM options: 64 / 128 GB

Product	Released	Discontinued	Chip options	RAM	Base price
Mac Studio M1 Ultra	Mar 2022	Jun 2023	48C or 64C-GPU	64 / 128 GB	$3,999

LLM viability of the M1 family:

M1 Pro 32 GB: 13B Q4 at ~20–25 tok/s.
M1 Max 64 GB: 70B Q4 (tight at ~42 GB weights + KV cache + OS); ~7–9 tok/s.
M1 Ultra 128 GB: 70B Q4 comfortably at ~12–14 tok/s; Mixtral 8x7B Q4 with room.
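The decode figures above roughly track a bandwidth-bound rule of thumb: each generated token has to read every weight once, so tok/s is about effective bandwidth divided by weight size. A simplified sketch under stated assumptions (a dense model, KV-cache traffic and compute limits ignored, 0.75 utilization as in the caveat at the top of this note); the function name is illustrative and the full formula lives in llm-memory-math.md:

```python
def decode_tok_per_s(bandwidth_gbs: float, weights_gb: float,
                     utilization: float = 0.75) -> float:
    """Bandwidth-bound decode estimate: tokens/s = usable GB/s / GB read per token."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return bandwidth_gbs * utilization / weights_gb


if __name__ == "__main__":
    # 70B Q4 at ~42 GB of weights, as quoted in the list above.
    for name, bw in [("M1 Max", 400), ("M1 Ultra", 800)]:
        realistic = decode_tok_per_s(bw, 42)
        ceiling = decode_tok_per_s(bw, 42, utilization=1.0)
        print(f"{name}: ~{realistic:.1f} tok/s expected, {ceiling:.1f} tok/s at peak")
```

Passing `utilization=1.0` gives the hard ceiling implied by the published bandwidth; any quoted tok/s above that ceiling for a dense model deserves a second look.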

M2 family (June 2022 — Oct 2024)

M2 — TSMC N5P

CPU: 4P+4E (8 cores)
GPU: 8- or 10-core
Memory: LPDDR5-6400, 128-bit → ~100 GB/s (up from M1’s 68 GB/s)
RAM options: 8 / 16 / 24 GB (24 GB tier was new for base)

Product	Released	RAM	Base price
MacBook Air M2 13”	Jul 2022	8 / 16 / 24 GB	$1,199
MacBook Pro 13” M2	Jun 2022	8 / 16 / 24 GB	$1,299
Mac mini M2	Jan 2023	8 / 16 / 24 GB	$599
MacBook Air 15” M2	Jun 2023	8 / 16 / 24 GB	$1,299

M2 Pro — TSMC N5P

CPU: 6P+4E (10C) or 8P+4E (12C)
GPU: 16- or 19-core
Memory: LPDDR5-6400, 256-bit → ~200 GB/s (same as M1 Pro)
RAM options: 16 / 32 GB

M2 Max — TSMC N5P

CPU: 8P+4E (12C) — added 2 E-cores vs M1 Max
GPU: 30- or 38-core
Memory: LPDDR5-6400, 512-bit → ~400 GB/s (same as M1 Max)
RAM options: 32 / 64 / 96 GB (96 GB was new)

Product	Released	Chip options	RAM	Base price
MacBook Pro 14” (2023)	Jan 2023	M2 Pro 10C/16C or 12C/19C; M2 Max 30C/38C-GPU	16 / 32 / 64 / 96 GB	$1,999
MacBook Pro 16” (2023)	Jan 2023	M2 Pro 12C/19C; M2 Max 30C/38C-GPU	16 / 32 / 64 / 96 GB	$2,499
Mac mini M2 Pro	Jan 2023	10C/16C or 12C/19C	16 / 32 GB	$1,299
Mac Studio M2 Max	Jun 2023	M2 Max 30C or 38C-GPU	32 / 64 / 96 GB	$1,999

M2 Ultra — TSMC N5P, two M2 Max via UltraFusion

CPU: 16P+8E (24C)
GPU: 60- or 76-core
Neural Engine: 32 cores
Memory: LPDDR5-6400, 1024-bit → ~800 GB/s
RAM options: 64 / 128 / 192 GB (192 GB was the headline)

Product	Released	Chip options	RAM	Base price
Mac Studio M2 Ultra	Jun 2023	60C or 76C-GPU	64 / 128 / 192 GB	$3,999
Mac Pro M2 Ultra	Jun 2023	60C or 76C-GPU	64 / 128 / 192 GB	$6,999

Mac Pro M2 Ultra is essentially a Mac Studio in a tower. Same chip, same RAM cap, same bandwidth. The only differentiator is PCIe expansion, and PCIe-attached GPUs cannot share unified memory — they have no MLX path and limited llama.cpp Metal path. For LLM work the Mac Pro M2 Ultra is a strict downgrade in value vs. the Mac Studio M2 Ultra.

LLM viability of the M2 family:

M2 base 24 GB: 13B Q4 just barely; 7B Q4 comfortably at ~22 tok/s.
M2 Pro 32 GB: 13B Q4 at ~30 tok/s; 30B Q4 squeezed.
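M2 Max 96 GB: 70B Q4 at ~7–9 tok/s (same ~400 GB/s as M1 Max, so bandwidth-bound decode lands in the same range); Mixtral 8x7B Q4 comfortably.
M2 Ultra 192 GB: 70B FP16 (140 GB) at ~4–5 tok/s (800 GB/s ÷ 140 GB caps decode at ~5.7 tok/s even at peak bandwidth); Mixtral 8x22B Q4 (~80 GB) comfortably.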
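The weight sizes quoted in these lists (~42 GB for 70B Q4, 140 GB for 70B FP16) come from parameter count × bits per weight. A minimal sketch; the bits-per-weight values for the quantized formats are approximate averages assumed here for illustration (Q4_K_M mixes block types, so its effective rate varies by model), and the names are hypothetical:

```python
# Approximate average bits per weight. FP16 is exact; Q8_0 follows from its
# block layout (32 int8 values + one fp16 scale); Q4_K_M is an assumed average.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def weights_gb(params_billions: float, quant: str) -> float:
    """Estimated weight footprint in GB for a dense model."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


if __name__ == "__main__":
    for params, quant in [(7, "Q4_K_M"), (70, "Q4_K_M"), (70, "FP16"), (405, "Q4_K_M")]:
        print(f"{params}B {quant}: ~{weights_gb(params, quant):.1f} GB")
```

This covers weights only; KV cache and runtime overhead come on top and grow with context length.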

M3 family (Oct 2023 — Oct 2024; M3 Ultra arrived March 2025)

M3 — TSMC N3B (first-gen 3 nm)

CPU: 4P+4E (8C)
GPU: 8- or 10-core — first Apple GPU with hardware ray tracing and mesh shading
Memory: LPDDR5-6400, 128-bit → ~100 GB/s (unchanged from M2)
RAM options: 8 / 16 / 24 GB

Product	Released	RAM	Base price
MacBook Pro 14” M3	Nov 2023	8 / 16 / 24 GB	$1,599
iMac 24” M3	Nov 2023	8 / 16 / 24 GB	$1,299
MacBook Air 13”/15” M3	Mar 2024	8 / 16 / 24 GB	$1,099 / $1,299

M3 Pro — TSMC N3B (the controversial one)

CPU: 5P+6E (11C) or 6P+6E (12C) — Apple shifted toward more efficiency cores
GPU: 14- or 18-core
Memory: LPDDR5-6400, 192-bit → ~150 GB/s — DOWN from M2 Pro’s 200 GB/s
RAM options: 18 / 36 GB — unusual numbers reflecting the narrower 192-bit bus

The M3 Pro is a regression for LLM users. A narrower memory bus (192-bit vs M2 Pro’s 256-bit) at the same LPDDR5-6400 clock means ~25% less bandwidth. Documented extensively by Chips and Cheese (“Apple’s M3 Pro: A Step Sideways”, Nov 2023), Vadim Yuryev (MaxTech), and r/LocalLLaMA community measurements. The Apple-side rationale (per supply-chain reporting) was N3B yield economics: cutting bus width saves die area. M4 Pro restored and then improved the bus.

M3 Max — TSMC N3B (two distinct bandwidth tiers under one name)

The M3 Max shipped in two memory configurations, depending on which CPU bin you got:

M3 Max variant	CPU	GPU	RAM options	Bandwidth	Bus
Binned (14-core)	10P+4E	30-core	36 / 96 GB	~300 GB/s	384-bit
Full (16-core)	12P+4E	40-core	48 / 64 / 128 GB	~400 GB/s	512-bit

Same chip name, different memory subsystems. Buyers who ordered 36 GB or 96 GB automatically got the binned (slower) variant; 48 GB, 64 GB, or 128 GB orders got the full variant.

Product	Released	Chip options	RAM	Base price
MacBook Pro 14” M3 Pro/Max	Nov 2023	Pro 11C/14C-GPU or 12C/18C-GPU; Max 14C/30C-GPU or 16C/40C-GPU	18 / 36 / 48 / 64 / 96 / 128 GB	$1,999
MacBook Pro 16” M3 Pro/Max	Nov 2023	Same	18 / 36 / 48 / 64 / 96 / 128 GB	$2,499

M3 Ultra — TSMC N3B, two M3 Max via UltraFusion (March 2025)

Apple skipped an M3 Ultra in the original M3 lineup and released it only in March 2025 — after M4 had already shipped in iPads and Macs. The headline was the 512 GB unified memory option, exclusive to M3 Ultra; M4 has no Ultra tier as of this writing.

CPU: 20P+8E (28C, binned, paired with the 60-core GPU) or 24P+8E (32C, two full M3 Max dies)
GPU: 60- or 80-core
Neural Engine: 32 cores
Memory: LPDDR5-6400, 1024-bit → ~800 GB/s
RAM options: 96 / 256 / 512 GB

Product	Released	Chip options	RAM	Base price
Mac Studio M3 Ultra	Mar 2025	60C-GPU (96/256 GB) or 80C-GPU (96/256/512 GB)	96 / 256 / 512 GB	$3,999 / $5,499 (80C base) / ~$9,500+ (512 GB)

The 512 GB Mac Studio M3 Ultra has the largest GPU-addressable memory pool of any consumer machine sold to date. No PC platform, including AMD Strix Halo (128 GB cap) or any single-GPU rig (RTX 5090 caps at 32 GB GDDR7), comes close at consumer pricing. Reportedly runs Llama 3.1 405B Q4 (~250 GB) at ~2 tok/s and DeepSeek-V3/R1 Q4 (~340 GB) at usable interactive speeds for a single user.

LLM viability of the M3 family:

M3 base 24 GB: 7B Q4 at ~20–25 tok/s; 13B Q4 tight.
M3 Pro 36 GB: regressed vs M2 Pro on bandwidth-bound generation; ~20–25 tok/s on 13B Q4, where M2 Pro hits 25–30.
M3 Max 14C/36 GB: ~50–60 tok/s on 7B Q4; 30B Q4 fits with room.
M3 Max 16C/128 GB: 70B Q4 at ~7–9 tok/s (~400 GB/s ÷ ~42 GB of weights caps decode near 9.5 tok/s).
M3 Ultra 256 GB: 70B FP16 at ~4–5 tok/s (same ~800 GB/s ceiling as M2 Ultra); Mixtral 8x22B Q4 with massive headroom.
M3 Ultra 512 GB: only consumer device that runs Llama 3.1 405B Q4 or DeepSeek-V3 Q4 in unified memory.
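Whether a model "fits" in these lists comes down to the RAM budget from the caveat at the top: leave ~25% for the OS and other apps, and the weights plus KV cache must fit in the rest. A minimal sketch of that check; the function names and the `kv_cache_gb` parameter are illustrative, and the 25% reserve is the note's working assumption rather than a macOS limit:

```python
def usable_ram_gb(ram_gb: float, reserve_fraction: float = 0.25) -> float:
    """RAM left for the model after reserving a fraction for OS and apps."""
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return ram_gb * (1.0 - reserve_fraction)


def fits_in_ram(ram_gb: float, weights_gb: float, kv_cache_gb: float = 0.0,
                reserve_fraction: float = 0.25) -> bool:
    """True if weights plus KV cache fit inside the usable RAM budget."""
    return weights_gb + kv_cache_gb <= usable_ram_gb(ram_gb, reserve_fraction)


if __name__ == "__main__":
    # Weight sizes as quoted in this note: Llama 3.1 405B Q4 ~250 GB,
    # DeepSeek-V3/R1 Q4 ~340 GB.
    for ram in (256, 512):
        print(f"{ram} GB: 405B Q4 fits={fits_in_ram(ram, 250)}, "
              f"DeepSeek-V3 Q4 fits={fits_in_ram(ram, 340)}")
```

Under this budget the 256 GB configuration has 192 GB usable, which is why only the 512 GB machine (384 GB usable) is listed for the 405B and DeepSeek-class models.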

M4 family (May 2024 iPad Pro; Oct 2024 Macs)

The M4 broke from N3B and uses TSMC N3E, a more mature 3 nm variant. Memory moved to LPDDR5X (up to 8533 MT/s on Pro/Max vs M1–M3’s LPDDR5 at 6400), which is the headline bandwidth driver.

M4 — TSMC N3E

CPU: 4P+6E (10C) — added 2 E-cores
GPU: 8- or 10-core (Dynamic Caching + hardware RT/mesh shading carried over from M3)
Neural Engine: 16 cores (uplifted throughput, Apple quotes ~38 TOPS)
Memory: LPDDR5X-7500, 128-bit → ~120 GB/s
RAM options: 16 / 24 / 32 GB — 8 GB base finally retired

Product	Released	RAM	Base price
iPad Pro M4	May 2024	8 / 16 GB	—
MacBook Pro 14” M4	Oct 2024	16 / 24 / 32 GB	$1,599
iMac 24” M4	Oct 2024	16 / 24 / 32 GB	$1,299
Mac mini M4	Oct 2024	16 / 24 / 32 GB	$599
MacBook Air 13”/15” M4	Mar 2025	16 / 24 / 32 GB	$999 / $1,199

M4 Pro — TSMC N3E (memory bandwidth restored, then some)

After the M3 Pro controversy, Apple widened the bus and moved to LPDDR5X:

CPU: 8P+4E (12C) or 10P+4E (14C)
GPU: 16- or 20-core
Memory: LPDDR5X-8533, 256-bit → ~273 GB/s (up from M3 Pro’s 150 GB/s, also up from M2 Pro’s 200 GB/s)
RAM options: 24 / 48 / 64 GB

M4 Max — TSMC N3E (two bandwidth tiers again)

M4 Max variant	CPU	GPU	RAM options	Bandwidth	Bus
Binned (14-core)	10P+4E	32-core	36 GB only	~410 GB/s	384-bit
Full (16-core)	12P+4E	40-core	48 / 64 / 128 GB	~546 GB/s	512-bit

The full M4 Max at 546 GB/s is the first Max-tier bandwidth increase since M1 Max: ~37% over M3 Max full, after three generations flat at ~400 GB/s.
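Because each RAM size is offered on only one bin, the purchased RAM alone identifies the Max variant and its bandwidth. A small lookup sketch transcribed from the M3 Max and M4 Max variant tables in this note; the dictionary and function names are illustrative:

```python
# RAM size (GB) -> (CPU bin, peak bandwidth in GB/s), per the variant tables.
MAX_VARIANT_BY_RAM = {
    "M3 Max": {
        36: ("14C", 300), 96: ("14C", 300),
        48: ("16C", 400), 64: ("16C", 400), 128: ("16C", 400),
    },
    "M4 Max": {
        36: ("14C", 410),
        48: ("16C", 546), 64: ("16C", 546), 128: ("16C", 546),
    },
}


def max_variant_for_ram(chip: str, ram_gb: int):
    """Return (cpu_bin, peak_bandwidth_gbs) for a Max chip and RAM size.

    Raises KeyError for a chip or RAM size that is not in the tables.
    """
    return MAX_VARIANT_BY_RAM[chip][ram_gb]


if __name__ == "__main__":
    print(max_variant_for_ram("M3 Max", 96))   # binned die despite the large RAM
    print(max_variant_for_ram("M4 Max", 48))   # full die
```

The 96 GB M3 Max is the case worth remembering: it is the largest RAM option on the binned die, so more RAM does not always mean more bandwidth.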

Product	Released	Chip options	RAM	Base price
MacBook Pro 14” M4 Pro/Max	Oct 2024	Pro 12C/16C-GPU or 14C/20C; Max 14C/32C-GPU or 16C/40C	24 / 36 / 48 / 64 / 128 GB	$1,999
MacBook Pro 16” M4 Pro/Max	Oct 2024	Same	24 / 36 / 48 / 64 / 128 GB	$2,499
Mac mini M4 Pro	Oct 2024	Pro 12C/16C or 14C/20C	24 / 48 / 64 GB	$1,399
Mac Studio M4 Max	Mar 2025	Max 14C/32C-GPU or 16C/40C	36 / 48 / 64 / 128 GB	$1,999

M4 Ultra — does not ship as of early 2026

When Apple refreshed Mac Studio in March 2025, they paired the M4 Max with the M3 Ultra — a deliberate split-generation product. Per Bloomberg’s Mark Gurman and multiple supply-chain leaks, the M4 Max die does not include the UltraFusion interconnect needed to fuse two dies. Apple appears to have decided early in M4’s design that the Ultra tier would be skipped; the 512 GB slot is held by M3 Ultra until M5 Ultra (if any) arrives.

Note: the existing modern-chip-landscape.md refers to an “M4 Ultra in Mac Studio” — that’s incorrect. The Mac Studio Ultra ships M3 Ultra, not M4 Ultra. Worth correcting next time that note is revised.

LLM viability of the M4 family:

M4 base 32 GB: 13B Q4 comfortably; 7B Q4 at ~25–30 tok/s.
M4 Pro 64 GB: 30B Q4 comfortably; 70B Q4 squeezed; ~35–40 tok/s on 13B Q4.
M4 Max 128 GB: 70B Q4 at ~10–12 tok/s (546 GB/s ÷ ~42 GB of weights caps decode near 13 tok/s); best mobile-Mac numbers.

M5 family (announced/launched late 2025 — early 2026)

Confidence: medium for shipped parts, low for unannounced.

M5 — TSMC N3P

The marquee M5 architectural change: neural accelerators inside each GPU core — Apple’s name for matrix-multiply hardware embedded in the GPU itself, conceptually similar to NVIDIA Tensor Cores. This means MLX and llama.cpp Metal-backend inference (already GPU-routed) gets a substantial uplift on prompt-processing without the Neural Engine’s operator constraints.

CPU: 4P+6E (refined cores)
GPU: new-architecture with embedded neural accelerators
Memory: LPDDR5X-9600, 128-bit → ~150 GB/s
RAM options: 16 / 24 / 32 GB

Product	Released	RAM
MacBook Pro 14” M5	Oct 2025	16 / 24 / 32 GB
iPad Pro M5	Oct 2025	12 / 16 GB
MacBook Air M5	expected Spring 2026	16 / 24 / 32 GB

M5 Pro / Max / Ultra — not confirmed at time of writing

Rumored for spring/summer 2026 with continued bandwidth uplift. Treat any specific numbers as speculation. If/when an M5 Ultra arrives with LPDDR5X-9600 on the same 1024-bit bus, that’s an automatic ~50% bandwidth jump from M3 Ultra (~800 GB/s) to ~1.2 TB/s, the first Ultra-tier bandwidth movement since M1 Ultra.

Early M5 LLM impact: community benchmarks suggest ~2–4× prompt-processing throughput vs M4. Prompt processing is compute-bound, so that gain comes from the GPU-embedded matmul units rather than from memory bandwidth. Decode (bandwidth-bound) sees a more modest ~25% uplift, matching the bandwidth gain (~120 → ~150 GB/s).

Notable surprises and learnings

1. The M3 Pro bandwidth regression was a real product mistake. 150 GB/s vs M2 Pro’s 200 GB/s. The unusual 18 / 36 GB capacities follow from the 192-bit bus: it is three-quarters the width of a 256-bit bus, so the same DRAM densities that give 24 / 48 GB on M4 Pro give 18 / 36 GB here. M4 Pro’s 273 GB/s on 256-bit LPDDR5X is the correction. If a user has an M3 Pro, expect it to be measurably slower than the equivalent M2 Pro on bandwidth-bound LLM decode. Surprising but real.

2. Both M3 Max and M4 Max ship in two bandwidth tiers under one name. Buying 36 GB on a Max-tier Mac (or 96 GB on M3 Max) gets you the binned die (384-bit bus, lower bandwidth). Buying 48, 64, or 128 GB gets the full die (512-bit). The bandwidth delta (300 vs 400 GB/s on M3, 410 vs 546 GB/s on M4) is material for LLM inference. Locara device cards should distinguish these explicitly.

3. Ultra-tier bandwidth has been flat at ~800 GB/s since M1 Ultra. M1, M2, and M3 Ultra all sit at ~800 GB/s on 1024-bit LPDDR5-6400, and M4 has no Ultra. Apple has not moved the Ultra tier to LPDDR5X. Ultra bandwidth comes only from fusing two Max dies, not from faster memory. An M5 Ultra with LPDDR5X-9600 on the same bus would be the first real Ultra-tier bandwidth jump.

4. M3 Ultra 512 GB is the consumer-hardware capacity ceiling. No other shipping consumer machine — AMD Strix Halo (128 GB cap), any single-GPU rig (RTX 5090 caps at 32 GB GDDR7) — comes close at consumer pricing. The next viable option above 512 GB is a workstation with multiple GPUs or a server platform, both 5–20× the cost. This is genuinely a one-of-a-kind product as of early 2026.

5. The Intel Mac Pro 2019 still has a higher RAM ceiling than any Apple Silicon Mac. Intel Mac Pro: 1.5 TB DDR4 ECC, ~140 GB/s. M3 Ultra: 512 GB LPDDR5, ~800 GB/s. Apple Silicon traded raw capacity for ~6× bandwidth and unified addressing, the structurally correct call for what matters in inference.

6. Apple’s published bandwidth is peak theoretical. Real LLM inference typically achieves 70–85% of peak. Vadim Yuryev (MaxTech) and the now-defunct AnandTech both consistently noted this in their reviews. Use the published numbers as upper bounds.

7. The Neural Engine has been remarkably static through M4. 16 cores from M1 through M4 (32 on Ultras). Quoted total throughput has improved (~11 → ~38 TOPS), but local-LLM stacks (MLX, llama.cpp Metal) bypass the Neural Engine entirely and target the GPU, because the Neural Engine has tight operator-support constraints. M5’s GPU-embedded neural accelerators are the first real architectural change that local LLM stacks can use directly.

8. There is no Apple Silicon Mac with user-upgradeable RAM. Every M-series Mac has soldered LPDDR. The Intel-era Mac mini 2018, iMac 27” 2020, and Mac Pro 2019 are the last user-upgradeable-RAM Macs. Users who under-buy RAM at purchase time are stuck for the life of the machine — this is the single most common Locara-relevant deployment mistake. App manifests need to fail loud and early when the user’s machine is undersized.

9. The Mac Pro M2 Ultra is effectively obsolete for LLM work. Same chip, same RAM cap (192 GB), same bandwidth as the Mac Studio M2 Ultra; PCIe expansion can’t host LLM-relevant GPUs (no MLX, weak Metal path). A Mac Studio M2 Ultra at $3,999 strictly dominates a Mac Pro M2 Ultra at $6,999 for local AI.

10. iPad Pro M4 has more memory bandwidth than any Intel Mac except the 2019 Mac Pro. ~120 GB/s vs the iMac Pro’s ~85 GB/s; the Intel Mac Pro’s ~140 GB/s is higher on paper, but it lacks unified memory and MLX support. iPad Pro M4 with 16 GB runs 7–13B Q4 models faster than any pre-2020 Intel Mac, in a tablet. Same chip as MacBook Air M4 16 GB. The mobile-class compute ceiling has moved.

Specific learnings for Locara

Manifest device-class targeting needs both RAM and bandwidth. A 96 GB M2 Max (~400 GB/s) and a 96 GB M3 Max 14C (~300 GB/s) have very different LLM performance even though they have the same RAM tier. The manifest schema should accept either a coarse tier (“Tier 3”) or a specific (RAM, bandwidth) pair, with the runtime computing the user’s actual numbers via sysctl hw.memsize and a chip-keyed bandwidth lookup table.
Distinguish M3 Max 14C from M3 Max 16C, and M4 Max 14C from M4 Max 16C. Same chip name, different bandwidth. Locara’s device detection should read sysctl machdep.cpu.brand_string and sysctl hw.perflevel0.physicalcpu to disambiguate, then cross-reference with this note’s table.
The M3 Pro regression is a real datapoint for the manifest. A 36 GB M3 Pro is slower on bandwidth-bound LLM decode than a 32 GB M2 Pro despite more RAM. App authors should be steered toward setting a bandwidth floor, not just a RAM floor, if their app does long-form generation.
Soldered RAM means hardware tier is a permanent property of the user, not a runtime variable. This is unlike a PC where the user can add RAM. Locara should remember the user’s machine tier as a persistent profile attribute and warn at install time, not at runtime, when an app exceeds that tier.
The Mac Studio Ultra is the LLM-first hardware product Apple ships. Locara’s “what’s the most ambitious app you can ship?” question is bounded by what Mac Studio M3 Ultra 512 GB can run — DeepSeek-V3-Q4 (~340 GB) is the current ceiling. Apps targeting “the most demanding user” should be built knowing this is the platform.
Mac mini M4 is the right value-tier deployment target. $599 base, 16 GB RAM, ~120 GB/s, runs 7B Q4 well. Locara’s reference apps and onboarding flow should be tuned to “the Mac mini M4 user,” because that’s the price/capability sweet spot for new local-AI adopters.
MacBook Air remains the volume-weighted target. Anyone buying a Mac for casual use buys an Air. M1 16 GB, M2 16/24 GB, M3 16/24 GB, M4 16/24/32 GB Airs collectively dominate the install base. Apps targeting “the median Locara user” must work well at 16 GB / ~100–120 GB/s — that’s the binding constraint for v1 reference apps.
Don’t trust the “GB/s” Apple publishes as a real performance number. Multiply by ~0.75 utilization for honest tok/s estimates. The full formula is in llm-memory-math.md.
Treat Intel Macs as out-of-scope for v1. Even the iMac Pro and Mac Pro 2019 are 1/3 to 1/5 the speed of an entry M-series Mac on LLM inference, with no MLX path. Locara v1 should refuse to install on Intel Macs with a clear “your Mac doesn’t have the unified-memory architecture this app requires” message.
A bandwidth-keyed model picker is the right manifest primitive. Given the user’s measured bandwidth, the runtime can publish “Llama 3 8B Q4 at expected ~X tok/s on your Mac” as the install-time gate. Honesty about expected performance is the foundation of trust.
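The detection and gating steps in the list above could be sketched as follows. This is an illustration, not Locara's actual runtime: `DeviceProfile`, `build_profile`, `meets_requirements`, and the partial `BANDWIDTH_GBS` table are hypothetical names, and only the three sysctl keys named in this note are read. The lookup is keyed on the brand string plus the performance-core count, which is what separates the 14C and 16C Max bins.

```python
import subprocess
from dataclasses import dataclass

# (brand string, P-core count) -> peak GB/s, from this note's Max tables.
# Deliberately partial; a real table would cover every chip in the note.
BANDWIDTH_GBS = {
    ("Apple M3 Max", 10): 300,
    ("Apple M3 Max", 12): 400,
    ("Apple M4 Max", 10): 410,
    ("Apple M4 Max", 12): 546,
}


@dataclass(frozen=True)
class DeviceProfile:
    ram_gb: float
    bandwidth_gbs: float


def read_sysctl(name: str) -> str:
    """Return the value of `sysctl -n <name>` (macOS only)."""
    result = subprocess.run(["sysctl", "-n", name],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def build_profile(brand: str, p_cores: int, memsize_bytes: int) -> DeviceProfile:
    """Combine detected values with the chip-keyed bandwidth table."""
    bandwidth = BANDWIDTH_GBS[(brand, p_cores)]  # KeyError for unknown chips
    return DeviceProfile(ram_gb=memsize_bytes / 2**30, bandwidth_gbs=bandwidth)


def detect_profile() -> DeviceProfile:
    """Read the three sysctl keys on the current Mac and build a profile."""
    return build_profile(
        read_sysctl("machdep.cpu.brand_string"),
        int(read_sysctl("hw.perflevel0.physicalcpu")),
        int(read_sysctl("hw.memsize")),
    )


def meets_requirements(profile: DeviceProfile, min_ram_gb: float,
                       min_bandwidth_gbs: float) -> bool:
    """Install-time gate: both the RAM floor and the bandwidth floor must hold."""
    return (profile.ram_gb >= min_ram_gb
            and profile.bandwidth_gbs >= min_bandwidth_gbs)
```

For example, a manifest asking for 32 GB and 350 GB/s would reject a 36 GB M3 Max 14C (300 GB/s) but accept a 48 GB M3 Max 16C, which is the distinction a RAM-only floor would miss. Since the hardware tier cannot change after purchase, the resulting profile could be computed once and stored rather than re-detected on every launch.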

References

Apple primary sources (tech specs and announcements):

Apple tech specs archive (every model): https://support.apple.com/specs/
Mac Studio: https://www.apple.com/mac-studio/specs/
MacBook Pro: https://www.apple.com/macbook-pro/specs/
MacBook Air: https://www.apple.com/macbook-air/specs/
Mac mini: https://www.apple.com/mac-mini/specs/
iMac: https://www.apple.com/imac/specs/
Apple Newsroom (launch press releases with confirmed dates and prices): https://www.apple.com/newsroom/
M3 Ultra announcement (Mar 2025): https://www.apple.com/newsroom/2025/03/apple-reveals-m3-ultra-taking-apple-silicon-to-a-new-extreme/

Microarchitecture deep dives:

AnandTech (Andrei Frumusanu, Ryan Smith) — M1 / M1 Pro / M1 Max die analyses (Oct 2021) and A14/A15/A16 Firestorm-Avalanche-Everest core work. Site ceased active publication August 2024; archives still at anandtech.com.
Chips and Cheese (https://chipsandcheese.com) — “Apple’s M3 Pro: A Step Sideways” (Nov 2023) documented the M3 Pro bandwidth regression with measured numbers. Multiple M2 Max / M4 Max deep dives.
SemiAnalysis (Dylan Patel) — TSMC N3B vs N3E yield economics, Apple’s process-node transition timing.
TechInsights — die-shot analyses for each M-series generation.
Hot Chips conference papers — Apple has presented some M-series details (e.g., M1 at Hot Chips 33).

Reviews and measured performance:

The Verge (Nilay Patel, Monica Chin) — reviews for every major Mac launch since 2015.
Ars Technica (Andrew Cunningham, Samuel Axon) — particularly strong on Mac mini and Mac Studio with real-workload focus.
Notebookcheck (https://notebookcheck.net) — benchmark database for every Mac with comparable scores.
MaxTech / Vadim Yuryev (YouTube) — consistent Geekbench Memory and llama.cpp benchmarks on every new Mac.
MKBHD / Marques Brownlee (YouTube) — spec breakdowns and side-by-side reviews.
AlexZiskind (YouTube) — the most rigorous Mac LLM benchmarking, with thermal and sustained-vs-burst breakdowns.

LLM-specific Mac benchmarks:

r/LocalLLaMA (https://reddit.com/r/LocalLLaMA) — single best aggregator for community-measured tok/s per Mac SKU. Search for specific chip names and quants.
llama.cpp issue tracker — Apple Silicon performance discussions and bandwidth-vs-tok/s charts: https://github.com/ggerganov/llama.cpp/issues
MLX repo + issues — Apple’s official LLM stack: https://github.com/ml-explore/mlx
Awni Hannun (MLX lead) — Twitter @awnihannun, blog posts on MLX benchmarks.
Simon Willison (simonwillison.net) — practical Mac LLM write-ups with measurements on M2 Max and M3 Ultra.

WWDC sessions:

WWDC 2020 “Explore the new system architecture of Apple silicon Macs” — first official UMA description.
WWDC 2023 / 2024 / 2025 Metal and ML sessions — MLX, Neural Engine, M5 GPU neural accelerators.

Source caveats:

“M4 Ultra exists” — does not as of early 2026. The Mac Studio (Mar 2025) pairs M4 Max with M3 Ultra. Earlier modern-chip-landscape.md reference to M4 Ultra in Mac Studio is incorrect.
M5 Pro / Max / Ultra specs — speculative; not shipping as of writing.
LPDDR5X clock for M4: sources differ on whether base M4 uses 7500 or 8533 MT/s; Apple’s published 120 GB/s on a 128-bit bus works out to 7500 MT/s, a lower clock than the 8533 MT/s on the Pro/Max tiers.
Mac Pro M3 Ultra — has not shipped; Mac Pro remains on M2 Ultra as of writing.