Modern Chip Landscape (early 2026)
What this is: A snapshot of the chips currently shipping in mobile, laptop, desktop, and data-center systems as of early 2026. Not exhaustive — focused on chips relevant to running LLMs locally and chips that define the performance norms a Locara app has to be tuned against.
Why it matters: Locara’s targets are the user’s actual hardware. We can’t reason about “what runs on a Mac” or “what fits on a Pixel” without knowing the current product matrix and where it’s heading.
Most relevant to Locara: Pairs with chip-fundamentals.md for the underlying mental model. This note focuses on the specific products Locara apps will ship to.
Caveat: this snapshot ages fast. Re-check before making device-targeting decisions; nodes, memory configs, and product lines shift every 12–18 months. Numbers below are best as of writing; treat them as approximate.
Quick reference: device classes for local LLMs
| Tier | Examples | Memory / VRAM | Realistic local model |
|---|---|---|---|
| 0 | Phone (8 GB) | 8 GB | 1–3B Q4 |
| 1 | Phone Pro / iPad mini, base laptop (16 GB) | 12–16 GB | 7B Q4 |
| 2 | Pro laptop (32 GB) / RTX 4060–4070 | 32 GB / 12 GB | 13B Q4 / Mixtral 8x7B Q4 |
| 3 | Mac M-Pro/Max (64 GB) / RTX 4080–4090 / 5080 | 64 GB / 16–24 GB | 30–70B Q4 |
| 4 | Mac Studio (128–256 GB) / RTX 5090 / Strix Halo | 128+ GB / 32 GB | 70B Q8 (128 GB) / 70B FP16 or 405B Q4 (256 GB, tight) |
| 5 | Mac Studio Ultra (512 GB) / multi-GPU rigs | 256–512+ GB | Frontier-scale local |
The bandwidth and capacity numbers below explain why each tier sits where it does.
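The tier table can be read as a lookup from memory size to tier. The sketch below is illustrative only: `UNIFIED_FLOORS`, `VRAM_FLOORS`, `device_tier`, and `check_min_tier` are hypothetical names, not part of any existing Locara API. The thresholds are read off the table, and assigning the overlapping 256 GB boundary to tier 4 is an assumption.

```python
# Hypothetical tier floors in GB, sorted from highest tier to lowest.
# Unified memory (Macs, phones, Strix Halo) and discrete VRAM are kept
# separate because the table lists them separately.
UNIFIED_FLOORS = [(5, 512), (4, 128), (3, 64), (2, 32), (1, 12), (0, 8)]
VRAM_FLOORS = [(4, 32), (3, 16), (2, 12)]

FLOORS = {"unified": UNIFIED_FLOORS, "vram": VRAM_FLOORS}


def device_tier(memory_gb: float, kind: str = "unified") -> int:
    """Return the highest tier whose floor the device meets, or -1 if none."""
    for tier, floor_gb in FLOORS[kind]:
        if memory_gb >= floor_gb:
            return tier
    return -1  # below the lowest floor listed for this memory kind


def check_min_tier(app_min_tier: int, memory_gb: float, kind: str = "unified") -> int:
    """Fail loudly when the device tier is below the app's declared minimum."""
    tier = device_tier(memory_gb, kind)
    if tier < app_min_tier:
        raise RuntimeError(
            f"app requires tier {app_min_tier}, device is tier {tier} "
            f"({memory_gb} GB {kind})"
        )
    return tier
```

A real implementation would also need to account for memory the OS and other apps hold back; this sketch treats the advertised capacity as fully available.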
Mobile (phones / tablets)
Apple A-series and M-in-iPad
- A18 / A18 Pro (iPhone 16 family, Sept 2024) — TSMC N3E. 6-core CPU, 5- or 6-core GPU, 16-core Neural Engine. 8 GB RAM standard. The first iPhone designed around on-device generative AI (“Apple Intelligence” — local ~3B class plus Private Cloud Compute fallback).
- A19 / A19 Pro (iPhone 17 family, Sept 2025) — TSMC N3P. Memory bumped on Pro models, expanded NPU. Credibly runs ~7B Q4 models on Pro tiers.
- M4 in iPad Pro (May 2024) — TSMC N3E. Up to 16 GB unified memory. The highest-capability tablet for local LLMs by a wide margin; effectively a Mac in iPad form factor.
Qualcomm Snapdragon (Android flagship)
- Snapdragon 8 Gen 3 (late 2023) — the last flagship generation built on Arm Cortex cores (Cortex-X4 prime core).
- Snapdragon 8 Elite (late 2024) — first mobile chip with Qualcomm’s in-house Oryon cores (acquired with Nuvia, 2021). Major IPC jump; closes much of the gap with Apple. TSMC N3E.
- Snapdragon 8 Elite Gen 5 / variants (late 2025) — the direct successor to the 8 Elite despite the name; second generation of Oryon in mobile.
- Hexagon NPU in all generations. Qualcomm has pushed its on-device AI framing hard, but third-party LLM tooling for Hexagon remains weak compared with Apple’s MLX or NVIDIA’s CUDA.
MediaTek Dimensity / Google Tensor / Samsung Exynos
- MediaTek Dimensity 9400 / 9500 (late 2024 / late 2025) — Arm reference cores, TSMC N3-class nodes. Used mainly in flagship-class Android phones.
- Google Tensor G4 (Pixel 9, Aug 2024) — last Samsung-fabbed.
- Google Tensor G5 (Pixel 10, Aug 2025) — first fully Google-designed Tensor, fabbed at TSMC. Optimized for Gemini Nano and on-device successor models.
- Samsung Exynos 2400 / 2500 — used in some Galaxy variants regionally.
Mobile takeaway
Mobile compute jumped a generation in 2024 with the M4 in iPad and the Snapdragon 8 Elite; both deliver real on-device LLM headroom (3–7B Q4 plausible at usable speeds). NPUs are now treated as first-class peers of the CPU and GPU in software stacks (Apple Intelligence, Gemini Nano). Memory is the binding constraint: 8 GB phones run 1–3B models comfortably; 12 GB Pro models reach 7B Q4; the 16 GB iPad Pro reaches 13B Q4.
Laptop
Apple Mac (M-series)
- M3 / M3 Pro / M3 Max (Oct 2023) — TSMC N3B. M3 Max up to 128 GB unified memory, ~400 GB/s bandwidth. Extremely strong local-LLM target.
- M4 / M4 Pro / M4 Max (Oct 2024) — TSMC N3E. Base M4 ~120 GB/s, M4 Pro ~273 GB/s, M4 Max ~546 GB/s. Up to 128 GB on M4 Max.
- M3 Ultra (Mac Studio, March 2025) — two M3 Max dies via UltraFusion. Up to 512 GB unified memory, ~819 GB/s bandwidth. Apple shipped this Mac Studio generation with M4 Max and M3 Ultra; there is no M4 Ultra. The highest-capacity consumer platform for local LLMs.
- M5 family (from Oct 2025) — TSMC N3P. The base M5 shipped first; higher-tier variants follow later.
The Apple Silicon line is the single most LLM-friendly consumer compute lineup, thanks to the structural advantage of unified memory. A maxed Mac Studio (512 GB) runs 405B-class models at Q4 at usable speeds; no consumer x86 config approaches it without multi-GPU server hardware.
Intel Core Ultra
- Core Ultra Series 1 (“Meteor Lake,” Dec 2023) — first Intel chip with on-die NPU.
- Core Ultra Series 2: “Lunar Lake” (Sep 2024 — thin-and-light, on-package memory) and “Arrow Lake” (Oct 2024 — desktop and mobile-H).
- Panther Lake (late 2025–2026) — Intel 18A process, the test of Intel’s foundry comeback.
Intel’s NPU is third-best of the four major NPUs (vs. Apple, Qualcomm, AMD) on raw throughput. Real-world LLM perf is competitive only if a discrete GPU is added.
AMD
- Ryzen AI 300 (“Strix Point,” July 2024) — Zen 5 + RDNA 3.5 iGPU + XDNA 2 NPU.
- Ryzen AI Max+ “Strix Halo” (announced at CES, January 2025) — up to 128 GB of unified-style LPDDR5X memory paired with a large iGPU. The closest x86 has come to Apple Silicon’s unified-memory model; targets workstation-class AI workloads. Effectively the only PC laptop platform that approaches the Mac on local 70B-class LLM viability.
Qualcomm Snapdragon X
- Snapdragon X Elite / X Plus (mid-2024) — Oryon cores on a Windows laptop. Marketed as “AI PC.” Strong CPU perf-per-watt, weak GPU. Software ecosystem (Windows on ARM, ONNX-on-Hexagon) still maturing.
- Snapdragon X2 (late 2025) — second-generation Snapdragon X laptop platform with newer Oryon cores.
Laptop takeaway for local LLMs
- Mac M-series wins across price points — unified memory + bandwidth + the Metal/MLX/llama.cpp stack maturity.
- AMD Strix Halo is the first real x86 contender for high-RAM local LLM laptops.
- Discrete-GPU laptops (mobile RTX 4080/4090, with 12 GB and 16 GB of VRAM respectively) are competitive for models that fit in VRAM but capacity-bound at larger scales.
- Qualcomm Snapdragon X is interesting but Windows-on-ARM software lag is the bottleneck.
- Intel is competitive only with a discrete GPU.
Desktop
CPUs
- AMD Ryzen 9000 (Zen 5, August 2024). Ryzen 7 9800X3D (Nov 2024) — the gaming leader, thanks to its stacked 3D V-Cache L3.
- Intel Core Ultra 200 desktop (Arrow Lake, Oct 2024) — disappointed against AMD on gaming/perf-per-watt.
- Apple Mac Studio with M4 Max / M3 Ultra is the desktop endpoint of the Mac line.
GPUs (the LLM workhorse on PC)
- NVIDIA RTX 50 series (Blackwell, Jan 2025): RTX 5090 (32 GB GDDR7, ~1.79 TB/s), 5080 (16 GB), 5070 Ti (16 GB), 5070 (12 GB), 5060 Ti (16/8 GB), 5060 (8 GB).
- NVIDIA RTX 40 series (Ada Lovelace, late 2022): RTX 4090 24 GB GDDR6X — still a workhorse for many local-LLM rigs.
- AMD Radeon RX 9000 (RDNA 4, March 2025): RX 9070 XT 16 GB. Improved ROCm support but still trailing CUDA in LLM stacks.
- Intel Arc Battlemage (late 2024): Arc B580 12 GB. Budget-tier; decent value.
For local LLMs on desktop, the binding number is GPU VRAM:
- 8 GB → 7B Q4 only
- 12 GB → 13B Q4
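- 16 GB → 13B Q8 / ~20B Q4 (13B FP16 is ~26 GB of weights and does not fit; 30B Q4 at ~17 GB needs partial CPU offload)
- 24 GB → 30B Q4 comfortably (70B Q4 is ~35–40 GB and needs CPU offload or a lower-bit quantization)
- 32 GB (RTX 5090) → 30B Q4 with long context / 30B Q6 (70B Q4 still needs partial offload; Mixtral 8x22B Q4 at ~80 GB does not fit)
- 2× 24 GB (dual 4090) → 70B Q4 fully in VRAM (70B FP16 is ~140 GB and needs heavy CPU offload)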
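Any entry in a list like this can be sanity-checked with simple arithmetic: weight size is roughly parameter count times bits per weight divided by 8. The sketch below assumes about 4.5 bits per weight for 4-bit formats (block scales add to the nominal 4 bits) and a flat 2 GB allowance for KV cache and runtime buffers. Both figures are rough assumptions, and `weights_gb` and `fits` are hypothetical helper names.

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) for params_b billion parameters."""
    return params_b * bits_per_weight / 8


def fits(params_b: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in vram_gb."""
    return weights_gb(params_b, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    Q4_BITS = 4.5  # assumed effective bits per weight for 4-bit quantization
    for params_b in (7, 13, 30, 70):
        size = weights_gb(params_b, Q4_BITS)
        ok_on = [v for v in (8, 12, 16, 24, 32, 48) if fits(params_b, Q4_BITS, v)]
        print(f"{params_b}B at ~Q4: {size:.1f} GB of weights, fits in {ok_on} GB")
```

Long contexts grow the KV cache well past a fixed allowance, so treat a "fits" result as a necessary condition rather than a guarantee.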
Desktop takeaway
If the goal is maximum LLM throughput per dollar, a desktop with one or two high-VRAM consumer NVIDIA GPUs still leads. If the goal is maximum capacity at home (running 100B+ models), a Mac Studio with 256–512 GB unified memory is the only realistic single-box answer at consumer prices.
Server / data center
(Mostly informational for Locara — local apps don’t run here. But these chips set the frontier and trickle down.)
NVIDIA
- Hopper (H100 / H200) — H100 (2022), H200 (2024) with HBM3e (141 GB).
- Blackwell (B100 / B200 / GB200) — 2024–2025. GB200 NVL72 the rack-scale unit; HBM3e 192 GB per GPU on B200. The reigning frontier-training platform.
- Rubin — successor, 2026.
AMD
- MI300X / MI325X / MI355X (Instinct) — 192 GB, 256 GB, and 288 GB of HBM respectively. Credible #2 to NVIDIA. ROCm software still trails CUDA but is improving fast.
- MI400 — announced for late 2026.
Google TPU
- TPU v5p (2024), TPU v6 “Trillium” (2024), TPU v7 “Ironwood” (announced 2025). Internal-mostly, available via GCP.
Other
- Cerebras WSE-3 (2024) — wafer-scale (~900K cores, 44 GB of on-wafer SRAM, no off-chip DRAM). Inference-focused.
- Groq LPU — single-stream low-latency inference.
- Tenstorrent Wormhole / Blackhole (Jim Keller’s company) — RISC-V-based AI cores; open software.
- AWS Trainium / Inferentia, Microsoft Maia, Meta MTIA — hyperscaler in-house silicon, not for sale.
- Etched Sohu (2024–2025) — transformer-fixed-graph ASIC; bet that the architecture is stable enough to bake in.
Specific learnings for Locara
- Macs are the privileged target for v1. Unified memory + the Metal/MLX/llama.cpp stack means the same Locara app can run 7B-class models on a $599 Mac mini and 70B-class on a Mac Studio without changing code paths. Cross-platform support comes after.
- Document a clear “device class” matrix in the manifest schema (see the table at the top of this note). Apps declare minimum tier; runtime computes user’s actual tier and matches. Fail loud at install time if mismatched.
- Discrete-GPU PCs are an important secondary target. RTX 4090 / 5090 owners are local-LLM enthusiasts and prosumer Locara customers. Support CUDA via llama.cpp / vLLM-equivalent backends, but recognize that the consumer VRAM ceiling caps capability relative to a Mac Studio.
- Strix Halo opens the x86 wedge. AMD Strix Halo’s 128 GB unified memory + strong iGPU is the first credible non-Apple platform for large-model local AI. Worth tracking; Locara’s cross-platform path likely runs through it before generic Windows-PC support.
- Mobile is real but constrained. A18 Pro / A19 Pro / Snapdragon 8 Elite / Tensor G5 can run 3–7B Q4 models. iPad Pro (M4) is the highest-capability handheld for LLMs. Don’t dismiss mobile; design app capability tiers so the same Locara app gracefully scales down.
- NPUs remain a fragmented frontier. Apple Neural Engine, Hexagon, XDNA, and Intel NPU all exist; their tooling is mutually incompatible. Don’t make NPU support a v1 requirement. GPU/CPU paths, with the NPU as opportunistic acceleration, are the durable abstraction.
- Server-class compute trickles down ~3–4 years. What ran on 8× H100s in 2023 (175B at FP16) runs on a Mac Studio in 2025 at Q4. The local-AI bet is fundamentally that this trickle continues. As of 2026 the trajectory is solid; the leaders (NVIDIA / AMD / Google / Cerebras) are all shipping new generations on schedule.
- Memory bandwidth, in GB/s, is the single most predictive number for “how fast does my model run on this device.” Publish it on every device card alongside RAM size and core count.
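The bandwidth point can be made concrete with a back-of-the-envelope ceiling: when generating one token at a time with a dense model, every weight is read from memory once per token, so decode speed cannot exceed bandwidth divided by model size. The sketch below uses bandwidth figures quoted earlier in this note and assumes ~4.5 bits per weight for Q4. `decode_tok_s_ceiling` is a hypothetical helper name, and real throughput will be lower because of KV-cache reads, compute, and software overhead.

```python
def decode_tok_s_ceiling(bandwidth_gb_s: float, params_b: float,
                         bits_per_weight: float) -> float:
    """Upper bound on single-stream decode tokens/s for a dense model.

    Assumes all weights are read once per generated token and that
    memory bandwidth is the only limit.
    """
    model_gb = params_b * bits_per_weight / 8
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    Q4_BITS = 4.5  # assumed effective bits per weight for 4-bit quantization
    devices = {  # GB/s, as quoted in this note
        "M4": 120,
        "M4 Pro": 273,
        "M4 Max": 546,
        "RTX 5090": 1790,
    }
    for name, bw in devices.items():
        ceiling = decode_tok_s_ceiling(bw, 7, Q4_BITS)
        print(f"{name}: at most ~{ceiling:.0f} tok/s on a 7B Q4 model")
```

Mixture-of-experts models read only the active experts per token, so their ceiling is set by active parameters rather than total parameters. The estimate also says nothing about whether the model fits in memory in the first place.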
References
- Chips and Cheese (https://chipsandcheese.com) — deep CPU microarchitecture analysis, especially Apple/Qualcomm/AMD core comparisons.
- SemiAnalysis (Dylan Patel) — definitive product/process tracking and economics.
- TechPowerUp GPU Database (https://techpowerup.com/gpu-specs/) — exhaustive GPU specs.
- Apple WWDC sessions on Metal / MLX / Apple Intelligence (annual, free on developer.apple.com).
- NVIDIA GTC keynotes (Jensen Huang’s annual roadmap reveals).
- AMD CES / Computex announcements (Lisa Su’s keynotes).
- Geekerwan YouTube channel — deep mobile SoC analysis with serious benchmarking discipline.
- Notebookcheck (https://notebookcheck.net) — laptop CPU/GPU benchmarks at scale.
- r/LocalLLaMA — community-driven hardware testing for local LLM workloads. The single most useful aggregator of “what tok/s does X get on Y.”
- Hardware Unboxed / Gamers Nexus (YouTube) — desktop GPU and CPU testing.