Modern Chip Landscape (early 2026)

What this is: A snapshot of the chips currently shipping in mobile, laptop, desktop, and data-center systems as of early 2026. Not exhaustive; it focuses on chips relevant to running LLMs locally and chips that define the performance norms a Locara app has to be tuned against.
Why it matters: Locara’s targets are the user’s actual hardware. We can’t reason about “what runs on a Mac” or “what fits on a Pixel” without knowing the current product matrix and where it’s heading.
Most relevant to Locara: Pairs with chip-fundamentals.md for the underlying mental model. This note focuses on the specific products Locara apps will ship to.

Caveat: this snapshot ages fast. Re-check before making device-targeting decisions; nodes, memory configs, and product lines shift every 12–18 months. Numbers below are best as of writing; treat them as approximate.

Quick reference: device classes for local LLMs

Tier	Examples	Memory / VRAM	Realistic local model
0	Phone (8 GB)	8 GB	1–3B Q4
1	Phone Pro / iPad mini, base laptop (16 GB)	12–16 GB	7B Q4
2	Pro laptop (32 GB) / RTX 4060–4070	32 GB / 12 GB	13B Q4 / Mixtral 8x7B Q4
3	Mac M-Pro/Max (64 GB) / RTX 4080–4090 / 5080	64 GB / 16–24 GB	30–70B Q4
4	Mac Studio (128–256 GB) / Strix Halo (128 GB) / RTX 5090 (32 GB)	128+ GB / 32 GB	70B Q8 at 128 GB; 70B FP16 or 405B Q4 (tight) at 256 GB; ~30B Q4–Q6 on a 32 GB GPU
5	Mac Studio Ultra (512 GB) / multi-GPU rigs	256–512+ GB	Frontier-scale local

The bandwidth and capacity numbers below explain why each tier sits where it does.
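
As a minimal sketch of how the table above could be turned into a lookup, the Python below maps a memory size to a tier. The threshold lists, the function names, and the choice to count 256 GB as tier 4 are assumptions made for illustration, not part of any Locara schema. Sizes that fall between the table's rows are rounded down to the lower tier.

```python
from bisect import bisect_right

# Smallest memory size (GB) that qualifies for each tier, read off the table.
# Assumption: 256 GB appears in both tier 4 and tier 5 rows; here it counts
# as tier 4, so tier 5 starts just above it.
UNIFIED_TIER_FLOORS_GB = [12, 32, 64, 128, 257]  # floors for tiers 1..5
VRAM_TIER_FLOORS_GB = [12, 16, 32]               # floors for tiers 2..4


def tier_from_unified_memory(gb: float) -> int:
    """Tier for phones, tablets, and unified-memory laptops/desktops."""
    return bisect_right(UNIFIED_TIER_FLOORS_GB, gb)


def tier_from_vram(gb: float) -> int:
    """Tier for a discrete GPU.

    Assumption: cards below 12 GB are not in the table and are treated
    as tier 1 here.
    """
    return 1 + bisect_right(VRAM_TIER_FLOORS_GB, gb)


if __name__ == "__main__":
    for gb in (8, 16, 32, 64, 128, 512):
        print(f"{gb} GB unified -> tier {tier_from_unified_memory(gb)}")
    for gb in (8, 12, 24, 32):
        print(f"{gb} GB VRAM -> tier {tier_from_vram(gb)}")
```

Rounding down is the conservative choice: a 48 GB laptop is treated as tier 2 rather than promised tier 3 models it may not hold.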

Mobile (phones / tablets)

Apple A-series and M-in-iPad

A18 / A18 Pro (iPhone 16 family, Sept 2024) — TSMC N3E. 6-core CPU, 5- or 6-core GPU, 16-core Neural Engine. 8 GB RAM standard. The first iPhone designed around on-device generative AI (“Apple Intelligence” — local ~3B class plus Private Cloud Compute fallback).
A19 / A19 Pro (iPhone 17 family, Sept 2025) — TSMC N3P. Memory bumped on Pro models, expanded NPU. Credibly runs ~7B Q4 models on Pro tiers.
M4 in iPad Pro (May 2024) — TSMC N3E. Up to 16 GB unified memory. The highest-capability tablet for local LLMs by a wide margin; effectively a Mac in iPad form factor.

Qualcomm Snapdragon (Android flagship)

Snapdragon 8 Gen 3 (late 2023): the last flagship generation built on Arm Cortex cores (Cortex-X4 prime core).
Snapdragon 8 Elite (late 2024) — first mobile chip with Qualcomm’s in-house Oryon cores (acquired with Nuvia, 2021). Major IPC jump; closes much of the gap with Apple. TSMC N3E.
Snapdragon 8 Elite Gen 5 / variants (late 2025): the direct successor to the 8 Elite (Qualcomm skipped the “Gen 2” name) and the second Oryon-based phone generation.
Hexagon NPU in all generations. Qualcomm pushed “AI PC” framing hard but third-party LLM tooling for Hexagon remains weak vs. Apple’s MLX or NVIDIA’s CUDA.

MediaTek Dimensity / Google Tensor / Samsung Exynos

MediaTek Dimensity 9400 / 9500 (late 2024 / late 2025): Arm reference (Cortex) cores, TSMC N3-class nodes. Used in a range of Android flagship and near-flagship phones.
Google Tensor G4 (Pixel 9, Aug 2024) — last Samsung-fabbed.
Google Tensor G5 (Pixel 10, Aug 2025) — first fully Google-designed Tensor, fabbed at TSMC. Optimized for Gemini Nano and on-device successor models.
Samsung Exynos 2400 / 2500 — used in some Galaxy variants regionally.

Mobile takeaway

Mobile compute jumped a generation in 2024 with the M4 in iPad and the Snapdragon 8 Elite; both deliver real on-device LLM headroom (3–7B Q4 plausible at usable speeds). NPUs are now treated as first-class peers of the CPU and GPU in vendor software stacks (Apple Intelligence, Gemini Nano). Memory is the binding constraint: 8 GB phones run 1–3B models comfortably; 12 GB Pro models reach 7B Q4; the 16 GB iPad Pro reaches 13B Q4.

Laptop

Apple Mac (M-series)

M3 / M3 Pro / M3 Max (Oct 2023) — TSMC N3B. M3 Max up to 128 GB unified memory, ~400 GB/s bandwidth. Extremely strong local-LLM target.
M4 / M4 Pro / M4 Max (Oct 2024) — TSMC N3E. Base M4 ~120 GB/s, M4 Pro ~273 GB/s, M4 Max ~546 GB/s. Up to 128 GB on M4 Max.
M3 Ultra (Mac Studio, March 2025): two M3 Max dies via UltraFusion. Up to 512 GB unified memory, ~819 GB/s bandwidth. No M4 Ultra has shipped; the 2025 Mac Studio pairs M4 Max with M3 Ultra. The single most LLM-friendly consumer compute platform that exists.
M5 family (base M5 shipped Oct 2025; higher tiers following): TSMC N3P. Base M5 ~153 GB/s.

The Apple Silicon line is the most LLM-friendly consumer compute lineup because of the structural advantage of unified memory. A maxed Mac Studio holds 405B-class models at Q4 entirely in memory and runs them at usable, if slow, speeds; no consumer x86 config approaches it without multi-GPU server hardware.
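
The link between memory bandwidth and generation speed can be made concrete with a back-of-envelope bound. For a dense model, each generated token has to read every weight once, so single-stream decode speed cannot exceed bandwidth divided by the size of the weights. The sketch below applies that bound to the M4-family bandwidth figures quoted above. The function name and the 4 GB figure for a 7B Q4 model are assumptions for illustration, and real throughput lands below this ceiling.

```python
def decode_tokens_per_second_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Every token reads all weights once, so the ceiling is
    (memory bandwidth) / (bytes of weights). Compute limits, KV-cache
    reads, and software overhead all push real numbers lower.
    """
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_gb


# Approximate bandwidth figures quoted in this note.
BANDWIDTH_GB_S = {"M4": 120, "M4 Pro": 273, "M4 Max": 546}

# Assumption: a 7B model at roughly 4.5 bits per weight is about 4 GB.
WEIGHTS_7B_Q4_GB = 4.0

if __name__ == "__main__":
    for chip, bandwidth in BANDWIDTH_GB_S.items():
        ceiling = decode_tokens_per_second_ceiling(bandwidth, WEIGHTS_7B_Q4_GB)
        print(f"{chip}: at most ~{ceiling:.0f} tok/s for a 7B Q4 model")
```

The same division explains why very large dense models are slow even when they fit: the weights grow much faster than the bandwidth does between tiers.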

Intel Core Ultra

Core Ultra Series 1 (“Meteor Lake,” Dec 2023) — first Intel chip with on-die NPU.
Core Ultra Series 2: “Lunar Lake” (Sep 2024 — thin-and-light, on-package memory) and “Arrow Lake” (Oct 2024 — desktop and mobile-H).
Panther Lake (late 2025–2026) — Intel 18A process, the test of Intel’s foundry comeback.

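Intel’s NPU throughput varies sharply by generation: roughly 11 TOPS on Meteor Lake versus 48 TOPS on Lunar Lake, which puts Lunar Lake in the same range as the Apple, Qualcomm, and AMD NPUs. In practice, real-world LLM perf on Intel laptops is competitive only if a discrete GPU is added.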

AMD

Ryzen AI 300 (“Strix Point,” July 2024) — Zen 5 + RDNA 3.5 iGPU + XDNA 2 NPU.
Ryzen AI Max+ “Strix Halo” (announced at CES 2025): up to 128 GB of unified-style LPDDR5X memory (~256 GB/s) with a big iGPU. The closest x86 has come to Apple Silicon’s unified-memory model; targets workstation-class AI workloads. Effectively the only PC laptop platform that approaches Mac on local 70B-class LLM viability.

Qualcomm Snapdragon X

Snapdragon X Elite / X Plus (mid-2024) — Oryon cores on a Windows laptop. Marketed as “AI PC.” Strong CPU perf-per-watt, weak GPU. Software ecosystem (Windows on ARM, ONNX-on-Hexagon) still maturing.
Snapdragon X2 Elite (announced late 2025): Qualcomm’s second-generation Oryon laptop platform.

Laptop takeaway for local LLMs

Mac M-series wins across price points — unified memory + bandwidth + the Metal/MLX/llama.cpp stack maturity.
AMD Strix Halo is the first real x86 contender for high-RAM local LLM laptops.
Discrete-GPU laptops are competitive when the model fits in VRAM (12–16 GB on mobile RTX 4080/4090, 24 GB on mobile RTX 5090) but capacity-bound at larger scales.
Qualcomm Snapdragon X is interesting but Windows-on-ARM software lag is the bottleneck.
Intel is competitive only with a discrete GPU.

Desktop

CPUs

AMD Ryzen 9000 (Zen 5, Aug 2024). Ryzen 7 9800X3D (Nov 2024) is the gaming leader, thanks to its stacked L3 cache (3D V-Cache).
Intel Core Ultra 200 desktop (Arrow Lake, Oct 2024) — disappointed against AMD on gaming/perf-per-watt.
Apple Mac Studio with M4 Max / M3 Ultra is the desktop endpoint of the Mac line.

GPUs (the LLM workhorse on PC)

NVIDIA RTX 50 series (Blackwell, Jan 2025): RTX 5090 (32 GB GDDR7, ~1.79 TB/s), 5080 (16 GB), 5070 Ti (16 GB), 5070 (12 GB), 5060 Ti (16/8 GB), 5060 (8 GB).
NVIDIA RTX 40 series (Ada Lovelace, late 2022): RTX 4090 24 GB GDDR6X — still a workhorse for many local-LLM rigs.
AMD Radeon RX 9000 (RDNA 4, March 2025): RX 9070 XT 16 GB. Improved ROCm support but still trailing CUDA in LLM stacks.
Intel Arc Battlemage (late 2024): Arc B580 12 GB. Budget-tier; decent value.

For local LLMs on desktop, the binding number is GPU VRAM:

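8 GB → 7B Q4 only
12 GB → 13B Q4
16 GB → 13B Q8 / ~20B Q4 (13B FP16 needs ~26 GB)
24 GB → 30B Q4; 70B only at ~2-bit quantization or with CPU offload
32 GB (RTX 5090) → 30B Q6 comfortably; 70B Q4 (~40 GB) and Mixtral 8x22B Q4 (~80 GB) need CPU offload
2× 24 GB (dual 4090) → 70B Q4 fully in VRAM; 70B FP16 (~140 GB) does not fit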
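
A rough way to sanity-check any row of a VRAM table like this one is to compute the weight footprint directly: parameters times bits per weight, divided by eight. The sketch below does that and adds a flat allowance for KV cache and runtime buffers. The 4.5 bits-per-weight figure for Q4-style formats and the 2 GB overhead are assumptions chosen for illustration; actual sizes depend on the quantization format and context length.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus a flat overhead allowance fit in memory.

    overhead_gb is an assumed allowance for KV cache and runtime
    buffers; long contexts need considerably more.
    """
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


Q4_BITS = 4.5   # assumption: typical effective bits/weight for Q4 formats
FP16_BITS = 16

if __name__ == "__main__":
    for size in (7, 13, 30, 70):
        print(f"{size}B Q4   ~ {weights_gb(size, Q4_BITS):.1f} GB")
    print(f"13B FP16 ~ {weights_gb(13, FP16_BITS):.1f} GB")
    print("70B Q4 on 24 GB:", fits(70, Q4_BITS, 24))
    print("70B Q4 on 2x24 GB:", fits(70, Q4_BITS, 48))
```

Keeping the overhead as an explicit parameter makes it easy to rerun the check for a longer context window instead of trusting a single table entry.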

Desktop takeaway

If the goal is maximum LLM throughput per dollar, a desktop with one or two high-VRAM consumer NVIDIA GPUs still leads. If the goal is maximum capacity at home (running 100B+ models), a Mac Studio with 256 or 512 GB of unified memory is the only realistic single-box answer at consumer prices.

Server / data center

(Mostly informational for Locara — local apps don’t run here. But these chips set the frontier and trickle down.)

NVIDIA

Hopper (H100 / H200) — H100 (2022), H200 (2024) with HBM3e (141 GB).
Blackwell (B100 / B200 / GB200) — 2024–2025. GB200 NVL72 the rack-scale unit; HBM3e 192 GB per GPU on B200. The reigning frontier-training platform.
Rubin — successor, 2026.

AMD

MI300X / MI325X / MI355X (Instinct): 192 GB, 256 GB, and 288 GB of HBM respectively. Credible #2 to NVIDIA. ROCm software still trails CUDA but is improving fast.
MI400 — announced for late 2026.

Google TPU

TPU v5p (2024), TPU v6 “Trillium” (2024), TPU v7 “Ironwood” (announced 2025). Internal-mostly, available via GCP.

Other

Cerebras WSE-3 (2024): wafer-scale (~900K cores, 44 GB of on-chip SRAM, no off-chip DRAM). Inference-focused.
Groq LPU — single-stream low-latency inference.
Tenstorrent Wormhole / Blackhole (Jim Keller’s company) — RISC-V-based AI cores; open software.
AWS Trainium / Inferentia, Microsoft Maia, Meta MTIA — hyperscaler in-house silicon, not for sale.
Etched Sohu (2024–2025) — transformer-fixed-graph ASIC; bet that the architecture is stable enough to bake in.

Specific learnings for Locara

Macs are the privileged target for v1. Unified memory + the Metal/MLX/llama.cpp stack means the same Locara app can run 7B-class models on a $599 Mac mini and 70B-class on a Mac Studio without changing code paths. Cross-platform support comes after.
Document a clear “device class” matrix in the manifest schema (see the table at the top of this note). Apps declare minimum tier; runtime computes user’s actual tier and matches. Fail loud at install time if mismatched.
Discrete-GPU PCs are an important secondary target. RTX 4090 / 5090 owners are local-LLM enthusiasts and prosumer Locara customers. Support CUDA via llama.cpp / vLLM-equivalent backends, but recognize that consumer VRAM ceiling caps capability vs. Mac Studio.
Strix Halo opens the x86 wedge. AMD Strix Halo’s 128 GB unified memory + strong iGPU is the first credible non-Apple platform for large-model local AI. Worth tracking; Locara’s cross-platform path likely runs through it before generic Windows-PC support.
Mobile is real but constrained. A18 Pro / A19 Pro / Snapdragon Elite / Tensor G5 can run 3–7B Q4 models. iPad Pro (M4) is the highest-capability handheld for LLMs. Don’t dismiss mobile — design app capability tiers so the same Locara app gracefully scales down.
NPUs remain a fragmented frontier. Apple Neural Engine, Hexagon, XDNA, and Intel NPU all exist, but their tooling is mutually incompatible. Don’t make NPU support a v1 requirement. GPU/CPU paths, with the NPU as opportunistic acceleration, are the durable abstraction.
Server-class compute trickles down in ~2–4 years. What needed 8× H100s in 2023 (175B at FP16, ~350 GB of weights) fits on a Mac Studio in 2025 at Q4 (~100 GB of weights). The local-AI bet is fundamentally that this trickle continues. As of 2026 the trajectory is solid; the leaders (NVIDIA / AMD / Google / Cerebras) are all shipping new generations on schedule.
Memory bandwidth, in GB/s, is the single most predictive number for “how fast does my model run on this device.” Publish it on every device card alongside RAM size and core count.
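
The device-class matching described in this section (apps declare a minimum tier, the runtime compares it with the device's tier, and a mismatch fails loudly at install time) could look roughly like the sketch below. Every name here (AppManifest, min_tier, check_install, IncompatibleDeviceError) is hypothetical and chosen for illustration; none of it is an existing Locara API, and how the device tier is computed is left out.

```python
from dataclasses import dataclass


class IncompatibleDeviceError(RuntimeError):
    """Raised at install time when the device is below the app's minimum tier."""


@dataclass(frozen=True)
class AppManifest:
    # Hypothetical manifest fields; a real schema would carry more.
    name: str
    min_tier: int  # lowest device tier (0-5) the app supports


def check_install(manifest: AppManifest, device_tier: int) -> None:
    """Fail loudly instead of installing an app the device cannot run."""
    if not 0 <= device_tier <= 5:
        raise ValueError(f"device tier out of range: {device_tier}")
    if device_tier < manifest.min_tier:
        raise IncompatibleDeviceError(
            f"{manifest.name} requires tier {manifest.min_tier} or higher; "
            f"this device is tier {device_tier}"
        )


if __name__ == "__main__":
    app = AppManifest(name="example-summarizer", min_tier=3)
    check_install(app, device_tier=4)  # passes silently
    try:
        check_install(app, device_tier=1)
    except IncompatibleDeviceError as err:
        print(f"install blocked: {err}")
```

Raising a dedicated exception, rather than returning a boolean, keeps the failure impossible to ignore and gives the installer a clear message to show the user.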

References

Chips and Cheese (https://chipsandcheese.com) — deep CPU microarchitecture analysis, especially Apple/Qualcomm/AMD core comparisons.
SemiAnalysis (Dylan Patel) — definitive product/process tracking and economics.
TechPowerUp GPU Database (https://techpowerup.com/gpu-specs/) — exhaustive GPU specs.
Apple WWDC sessions on Metal / MLX / Apple Intelligence (annual, free on developer.apple.com).
NVIDIA GTC keynotes (Jensen Huang’s annual roadmap reveals).
AMD CES / Computex announcements (Lisa Su’s keynotes).
Geekerwan YouTube channel — deep mobile SoC analysis with serious benchmarking discipline.
Notebookcheck (https://notebookcheck.net) — laptop CPU/GPU benchmarks at scale.
r/LocalLLaMA — community-driven hardware testing for local LLM workloads. The single most useful aggregator of “what tok/s does X get on Y.”
Hardware Unboxed / Gamers Nexus (YouTube) — desktop GPU and CPU testing.