Locara

Modern Chip Landscape (early 2026)

What this is: A snapshot of the chips currently shipping in mobile, laptop, desktop, and data-center systems as of early 2026. Not exhaustive: focused on chips relevant to running LLMs locally and chips that define the performance norms a Locara app has to be tuned against.

Why it matters: Locara’s targets are the user’s actual hardware. We can’t reason about “what runs on a Mac” or “what fits on a Pixel” without knowing the current product matrix and where it’s heading.

Most relevant to Locara: Pairs with chip-fundamentals.md for the underlying mental model. This note focuses on the specific products Locara apps will ship to.

Caveat: this snapshot ages fast. Re-check before making device-targeting decisions; process nodes, memory configs, and product lines shift every 12–18 months. The numbers below are best estimates as of writing; treat them as approximate.

Quick reference: device classes for local LLMs

Tier | Examples                                        | Memory / VRAM    | Realistic local model
-----|-------------------------------------------------|------------------|--------------------------
0    | Phone (8 GB)                                    | 8 GB             | 1–3B Q4
1    | Phone Pro / iPad mini, base laptop (16 GB)      | 12–16 GB         | 7B Q4
2    | Pro laptop (32 GB) / RTX 4060–4070              | 32 GB / 12 GB    | 13B Q4 / Mixtral 8x7B Q4
3    | Mac M-Pro/Max (64 GB) / RTX 4080–4090 / 5080    | 64 GB / 16–24 GB | 30–70B Q4
4    | Mac Studio (128–256 GB) / RTX 5090 / Strix Halo | 128+ GB / 32 GB  | 70B FP16 / 405B Q4
5    | Mac Studio Ultra (512 GB) / multi-GPU rigs      | 256–512+ GB      | Frontier-scale local

The bandwidth and capacity numbers below explain why each tier sits where it does.
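As a rough illustration of how the tier table could be turned into a lookup, the sketch below maps a device's memory to a tier. The cut-offs are read straight off the table; the names `DeviceTier`, `TIERS`, and `classify` are hypothetical, not part of any existing Locara API, and a real classifier would also need to account for bandwidth and OS memory reserve.

```python
from dataclasses import dataclass

INF = float("inf")  # marks "this tier has no discrete-GPU entry in the table"


@dataclass(frozen=True)
class DeviceTier:
    tier: int
    min_unified_gb: float  # lower bound on unified / system memory
    min_vram_gb: float     # lower bound on discrete-GPU VRAM
    realistic_model: str


# Hypothetical thresholds taken from the quick-reference table, highest tier first.
TIERS = [
    DeviceTier(5, 256, INF, "Frontier-scale local"),
    DeviceTier(4, 128, 32, "70B FP16 / 405B Q4"),
    DeviceTier(3, 64, 16, "30-70B Q4"),
    DeviceTier(2, 32, 12, "13B Q4 / Mixtral 8x7B Q4"),
    DeviceTier(1, 12, INF, "7B Q4"),
    DeviceTier(0, 0, INF, "1-3B Q4"),
]


def classify(unified_gb: float = 0.0, vram_gb: float = 0.0) -> DeviceTier:
    """Return the highest tier whose memory or VRAM threshold the device meets."""
    for t in TIERS:
        if unified_gb >= t.min_unified_gb or vram_gb >= t.min_vram_gb:
            return t
    return TIERS[-1]  # unreachable with the thresholds above; kept as a safe default


if __name__ == "__main__":
    print(classify(unified_gb=16).tier, classify(vram_gb=24).tier)
```

Checking from the highest tier down keeps the rule simple: a device lands in the best tier it qualifies for on either axis.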

Mobile (phones / tablets)

Apple A-series and M-in-iPad

  • A18 / A18 Pro (iPhone 16 family, Sept 2024) — TSMC N3E. 6-core CPU, 5- or 6-core GPU, 16-core Neural Engine. 8 GB RAM standard. The first iPhone designed around on-device generative AI (“Apple Intelligence” — local ~3B class plus Private Cloud Compute fallback).
  • A19 / A19 Pro (iPhone 17 family, Sept 2025) — TSMC N3P. Memory bumped on Pro models, expanded NPU. Credibly runs ~7B Q4 models on Pro tiers.
  • M4 in iPad Pro (May 2024) — TSMC N3E. Up to 16 GB unified memory. The highest-capability tablet for local LLMs by a wide margin; effectively a Mac in iPad form factor.

Qualcomm Snapdragon (Android flagship)

  • Snapdragon 8 Gen 3 (late 2023): the last Snapdragon flagship built on Arm Cortex cores (Cortex-X4 prime core).
  • Snapdragon 8 Elite (late 2024) — first mobile chip with Qualcomm’s in-house Oryon cores (acquired with Nuvia, 2021). Major IPC jump; closes much of the gap with Apple. TSMC N3E.
  • Snapdragon 8 Elite Gen 5 (late 2025; the successor shipped under this name rather than “Gen 2”): second generation of Oryon-based mobile chips.
  • Hexagon NPU in all generations. Qualcomm pushed “AI PC” framing hard but third-party LLM tooling for Hexagon remains weak vs. Apple’s MLX or NVIDIA’s CUDA.

MediaTek Dimensity / Google Tensor / Samsung Exynos

  • MediaTek Dimensity 9400 / 9500 (late 2024 / late 2025): Arm reference cores, TSMC N3-family. Used in flagship and upper-mid-range Android phones.
  • Google Tensor G4 (Pixel 9, Aug 2024) — last Samsung-fabbed.
  • Google Tensor G5 (Pixel 10, Aug 2025) — first fully Google-designed Tensor, fabbed at TSMC. Optimized for Gemini Nano and on-device successor models.
  • Samsung Exynos 2400 / 2500 — used in some Galaxy variants regionally.

Mobile takeaway

Mobile compute jumped a generation in 2024 with M4 in iPad and Snapdragon 8 Elite. Both deliver real on-device LLM headroom (3–7B Q4 plausible at usable speeds). NPUs are now treated as first-class peers of the CPU and GPU in vendor software stacks (Apple Intelligence, Gemini Nano). Memory is the binding constraint: 8 GB phones run 1–3B models comfortably; 12 GB Pro models reach 7B Q4; 16 GB iPad Pro reaches 13B Q4.

Laptop

Apple Mac (M-series)

  • M3 / M3 Pro / M3 Max (Oct 2023) — TSMC N3B. M3 Max up to 128 GB unified memory, ~400 GB/s bandwidth. Extremely strong local-LLM target.
  • M4 / M4 Pro / M4 Max (Oct 2024) — TSMC N3E. Base M4 ~120 GB/s, M4 Pro ~273 GB/s, M4 Max ~546 GB/s. Up to 128 GB on M4 Max.
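  • M3 Ultra (Mac Studio, March 2025): two M3 Max dies via UltraFusion. Up to 512 GB unified memory, ~819 GB/s bandwidth. The March 2025 Mac Studio offers it alongside M4 Max; no M4 Ultra has shipped. The highest-capacity consumer platform for local LLMs that exists.
  • M5 family (base M5 shipped Oct 2025 on TSMC N3P): check the current status of the higher-end variants before targeting them.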

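The Apple Silicon line is the most LLM-friendly consumer compute lineup because of a structural advantage: with unified memory the GPU can address most of system RAM, so model capacity is bounded by total memory rather than by a separate VRAM pool. A maxed Mac Studio (512 GB) runs 405B-class models at Q4 at usable speeds; no consumer x86 config approaches it without multi-GPU server hardware.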
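To make the bandwidth figures above concrete, here is a minimal sketch of the usual back-of-envelope rule for token generation: decoding a dense model is typically memory-bandwidth-bound, because each generated token requires reading roughly all the weights once, so bandwidth divided by weight size gives an upper bound on tokens per second. The function names and the 4.5 bits-per-weight default (about what a 4-bit block quantization with scales costs) are assumptions for illustration; real throughput is lower and depends on the runtime, context length, and KV cache traffic.

```python
def weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def decode_tok_s_upper_bound(
    bandwidth_gb_s: float, params_billion: float, bits_per_weight: float = 4.5
) -> float:
    """Upper bound on decode speed for a dense model: one full weight read per token."""
    return bandwidth_gb_s / weight_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Bandwidth figures are the approximate M4-family numbers quoted in this note.
    for name, bw in [("M4", 120), ("M4 Pro", 273), ("M4 Max", 546)]:
        bound = decode_tok_s_upper_bound(bw, params_billion=7)
        print(f"{name}: at most ~{bound:.0f} tok/s for a dense 7B model at ~4.5 bits/weight")
```

The bound applies to dense models; mixture-of-experts models read only the active experts per token, so they can exceed what their total parameter count suggests.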

Intel Core Ultra

  • Core Ultra Series 1 (“Meteor Lake,” Dec 2023) — first Intel chip with on-die NPU.
  • Core Ultra Series 2: “Lunar Lake” (Sep 2024 — thin-and-light, on-package memory) and “Arrow Lake” (Oct 2024 — desktop and mobile-H).
  • Panther Lake (late 2025–2026) — Intel 18A process, the test of Intel’s foundry comeback.

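Intel’s NPU sits around third of the four major client NPUs (Apple, Qualcomm, AMD, Intel) on raw throughput, though the ordering shifts with generation and with which vendor-quoted TOPS figure is compared. Real-world LLM performance is competitive only if a discrete GPU is added.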

AMD

  • Ryzen AI 300 (“Strix Point,” July 2024) — Zen 5 + RDNA 3.5 iGPU + XDNA 2 NPU.
  • Ryzen AI Max+ “Strix Halo” (announced at CES 2025, shipping early 2025): up to 128 GB unified-style LPDDR5X memory with a big iGPU. The closest x86 has come to Apple Silicon’s unified-memory model; targets workstation-class AI workloads. Effectively the only PC laptop platform that approaches Mac on local 70B-class LLM viability.

Qualcomm Snapdragon X

  • Snapdragon X Elite / X Plus (mid-2024) — Oryon cores on a Windows laptop. Marketed as “AI PC.” Strong CPU perf-per-watt, weak GPU. Software ecosystem (Windows on ARM, ONNX-on-Hexagon) still maturing.
  • Snapdragon X2 Elite (announced late 2025): second-generation Oryon laptop platform.

Laptop takeaway for local LLMs

  • Mac M-series wins across price points — unified memory + bandwidth + the Metal/MLX/llama.cpp stack maturity.
  • AMD Strix Halo is the first real x86 contender for high-RAM local LLM laptops.
  • Discrete-GPU laptops are competitive for models that fit in VRAM (12–16 GB on mobile RTX 4080/4090, 24 GB on mobile RTX 5090) but capacity-bound at larger scales.
  • Qualcomm Snapdragon X is interesting but Windows-on-ARM software lag is the bottleneck.
  • Intel is competitive only with a discrete GPU.

Desktop

CPUs

  • AMD Ryzen 9000 (Zen 5, Aug 2024). Ryzen 7 9800X3D (Nov 2024): gaming leader, thanks to its stacked L3 cache (3D V-Cache).
  • Intel Core Ultra 200 desktop (Arrow Lake, Oct 2024) — disappointed against AMD on gaming/perf-per-watt.
  • Apple Mac Studio with M4 Max / M3 Ultra is the desktop endpoint of the Mac line.

GPUs (the LLM workhorse on PC)

  • NVIDIA RTX 50 series (Blackwell, Jan 2025): RTX 5090 (32 GB GDDR7, ~1.79 TB/s), 5080 (16 GB), 5070 Ti (16 GB), 5070 (12 GB), 5060 Ti (16/8 GB), 5060 (8 GB).
  • NVIDIA RTX 40 series (Ada Lovelace, late 2022): RTX 4090 24 GB GDDR6X — still a workhorse for many local-LLM rigs.
  • AMD Radeon RX 9000 (RDNA 4, March 2025): RX 9070 XT 16 GB. Improved ROCm support but still trailing CUDA in LLM stacks.
  • Intel Arc Battlemage (late 2024): Arc B580 12 GB. Budget-tier; decent value.

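For local LLMs on desktop, the binding number is GPU VRAM. Weights take roughly (parameters × bits per weight) / 8 bytes, so Q4 costs a little over 0.5 GB per billion parameters, before KV cache and runtime overhead:

  • 8 GB → 7B Q4
  • 12 GB → 13B Q4
  • 16 GB → 13B Q8; 30B Q4 (~17 GB) only with partial CPU offload
  • 24 GB → 30B-class Q4 comfortably; 70B Q4 (~40 GB) only with CPU offload or ~2–3-bit quantization
  • 32 GB (RTX 5090) → 30B-class Q4–Q6 with room for long context; 70B Q4 still needs partial offload, and Mixtral 8x22B Q4 (~80 GB) does not fit
  • 2× 24 GB (dual 4090) → 70B Q4 fully in VRAM; 70B FP16 (~140 GB) only with heavy CPU offload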
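The VRAM thresholds above can be sanity-checked with simple weight-size arithmetic. The sketch below assumes effective bit widths of 4.5 and 8.5 bits per weight for Q4 and Q8 (matching the block layouts of llama.cpp's Q4_0 and Q8_0, which store one 16-bit scale per 32 weights) and a flat 1.5 GB of headroom for KV cache and runtime buffers. The headroom figure and the function names are illustrative assumptions, not measured values; long contexts need considerably more.

```python
# Assumed effective bits per weight, including per-block quantization scales.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q4": 4.5}


def weight_footprint_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) for a given format."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8.0


def fits_in_vram(
    params_billion: float, fmt: str, vram_gb: float, headroom_gb: float = 1.5
) -> bool:
    """True if weights plus a fixed (assumed) headroom fit without CPU offload."""
    return weight_footprint_gb(params_billion, fmt) + headroom_gb <= vram_gb


if __name__ == "__main__":
    for params, fmt, vram in [(7, "Q4", 8), (13, "Q4", 12), (13, "FP16", 16),
                              (30, "Q4", 24), (70, "Q4", 32), (70, "Q4", 48)]:
        size = weight_footprint_gb(params, fmt)
        verdict = "fits" if fits_in_vram(params, fmt, vram) else "needs offload"
        print(f"{params}B {fmt} (~{size:.1f} GB) on {vram} GB: {verdict}")
```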

Desktop takeaway

If the goal is maximum LLM throughput per dollar, a desktop with one or two high-VRAM consumer NVIDIA GPUs still leads. If the goal is maximum capacity at home (running 100B+ models), a Mac Studio with 256 or 512 GB of unified memory is the only realistic single-box answer at consumer prices.

Server / data center

(Mostly informational for Locara — local apps don’t run here. But these chips set the frontier and trickle down.)

NVIDIA

  • Hopper (H100 / H200) — H100 (2022), H200 (2024) with HBM3e (141 GB).
  • Blackwell (B100 / B200 / GB200) — 2024–2025. GB200 NVL72 the rack-scale unit; HBM3e 192 GB per GPU on B200. The reigning frontier-training platform.
  • Rubin — successor, 2026.

AMD

  • MI300X / MI325X / MI355X (Instinct): 192 GB, 256 GB, and 288 GB of HBM respectively. Credible #2 to NVIDIA. ROCm software still trailing CUDA but improving fast.
  • MI400 — announced for late 2026.

Google TPU

  • TPU v5p (2024), TPU v6 “Trillium” (2024), TPU v7 “Ironwood” (announced 2025). Internal-mostly, available via GCP.

Other

  • Cerebras WSE-3 (2024): wafer-scale (~900K cores, 44 GB of on-wafer SRAM in place of off-chip DRAM). Inference-focused.
  • Groq LPU — single-stream low-latency inference.
  • Tenstorrent Wormhole / Blackhole (Jim Keller’s company) — RISC-V-based AI cores; open software.
  • AWS Trainium / Inferentia, Microsoft Maia, Meta MTIA: hyperscaler in-house silicon, not sold as standalone hardware.
  • Etched Sohu (2024–2025) — transformer-fixed-graph ASIC; bet that the architecture is stable enough to bake in.

Specific learnings for Locara

  1. Macs are the privileged target for v1. Unified memory + the Metal/MLX/llama.cpp stack means the same Locara app can run 7B-class models on a $599 Mac mini and 70B-class on a Mac Studio without changing code paths. Cross-platform support comes after.
  2. Document a clear “device class” matrix in the manifest schema (see the table at the top of this note). Apps declare minimum tier; runtime computes user’s actual tier and matches. Fail loud at install time if mismatched.
  3. Discrete-GPU PCs are an important secondary target. RTX 4090 / 5090 owners are local-LLM enthusiasts and prosumer Locara customers. Support CUDA via llama.cpp / vLLM-equivalent backends, but recognize that consumer VRAM ceiling caps capability vs. Mac Studio.
  4. Strix Halo opens the x86 wedge. AMD Strix Halo’s 128 GB unified memory + strong iGPU is the first credible non-Apple platform for large-model local AI. Worth tracking; Locara’s cross-platform path likely runs through it before generic Windows-PC support.
  5. Mobile is real but constrained. A18 Pro / A19 Pro / Snapdragon Elite / Tensor G5 can run 3–7B Q4 models. iPad Pro (M4) is the highest-capability handheld for LLMs. Don’t dismiss mobile — design app capability tiers so the same Locara app gracefully scales down.
  6. NPUs remain a fragmented frontier. Apple Neural Engine, Hexagon, XDNA, Intel NPU all exist; tooling is incompatible. Don’t make NPU support a v1 requirement. GPU/CPU paths with NPU as opportunistic acceleration is the durable abstraction.
  7. Server-class compute trickles down ~3–4 years. What ran on 8× H100s in 2023 (175B at FP16) runs on a Mac Studio in 2025 at Q4. The local-AI bet is fundamentally that this trickle continues. As of 2026 the trajectory is solid; the leaders (NVIDIA / AMD / Google / Cerebras) are all shipping new generations on schedule.
  8. Memory bandwidth, in GB/s, is the single most predictive number for “how fast does my model run on this device.” Publish it on every device card alongside RAM size and core count.
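Learning 2 (apps declare a minimum tier, the runtime matches it, and mismatches fail loudly at install time) could look something like the sketch below. The manifest shape, the `min_tier` field name, `TierMismatchError`, and `check_install` are all hypothetical; the actual Locara manifest schema is not specified in this note.

```python
class TierMismatchError(RuntimeError):
    """Raised when the device's tier is below the app's declared minimum."""


def check_install(manifest: dict, device_tier: int) -> None:
    """Validate a (hypothetical) manifest against the device tier; raise on mismatch."""
    if "min_tier" not in manifest:
        raise ValueError("manifest is missing the 'min_tier' field")
    min_tier = manifest["min_tier"]
    if not isinstance(min_tier, int) or not 0 <= min_tier <= 5:
        raise ValueError(f"min_tier must be an integer from 0 to 5, got {min_tier!r}")
    if device_tier < min_tier:
        # Fail loudly at install time instead of degrading silently at run time.
        raise TierMismatchError(
            f"{manifest.get('name', 'app')} requires device tier {min_tier} or higher, "
            f"but this device is tier {device_tier}"
        )


if __name__ == "__main__":
    check_install({"name": "demo-app", "min_tier": 2}, device_tier=3)  # passes silently
    try:
        check_install({"name": "demo-app", "min_tier": 3}, device_tier=1)
    except TierMismatchError as err:
        print(f"install blocked: {err}")
```

Raising a dedicated exception type lets an installer distinguish "this device is too small" from a malformed manifest and show the user a specific message.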

References

  • Chips and Cheese (https://chipsandcheese.com) — deep CPU microarchitecture analysis, especially Apple/Qualcomm/AMD core comparisons.
  • SemiAnalysis (Dylan Patel) — definitive product/process tracking and economics.
  • TechPowerUp GPU Database (https://techpowerup.com/gpu-specs/) — exhaustive GPU specs.
  • Apple WWDC sessions on Metal / MLX / Apple Intelligence (annual, free on developer.apple.com).
  • NVIDIA GTC keynotes (Jensen Huang’s annual roadmap reveals).
  • AMD CES / Computex announcements (Lisa Su’s keynotes).
  • Geekerwan YouTube channel — deep mobile SoC analysis with serious benchmarking discipline.
  • Notebookcheck (https://notebookcheck.net) — laptop CPU/GPU benchmarks at scale.
  • r/LocalLLaMA — community-driven hardware testing for local LLM workloads. The single most useful aggregator of “what tok/s does X get on Y.”
  • Hardware Unboxed / Gamers Nexus (YouTube) — desktop GPU and CPU testing.