Apple Foundation Models Framework

What it is: The developer SDK (Swift first, with Python bindings) for Apple’s on-device language model: the ~3B-parameter foundation model that powers Apple Intelligence, exposed at WWDC 2025 as a public API for third-party apps. Includes guided generation, constrained tool calling, and LoRA adapter fine-tuning.
Status: Available with macOS 26 (Tahoe) / iOS 26 (announced at WWDC 2025, June 2025); Python bindings followed via apple/python-apple-fm-sdk. Apple-only, free for developers, no API key.
Most relevant to Locara: This is Apple’s own play for the same wedge Locara is targeting: “developers can run private LLMs on the user’s device, no cloud dependency.” Locara has to take an explicit position on coexistence vs. competition.

Background

Apple announced Apple Intelligence at WWDC 2024 as a user-facing feature: writing tools, summaries, image generation, a smarter Siri. Under the hood was a hybrid: a ~3B-parameter on-device language model running entirely on the user’s Apple Silicon, plus a larger server model accessed via Private Cloud Compute (PCC) — Apple Silicon servers with attestation-verified enclaves so that even Apple cannot read the requests.

At WWDC 2025, Apple opened the on-device model to third-party developers via the Foundation Models framework. This was the first time Apple shipped a developer-facing language model API at the OS level. Adoption is now a one-line Swift call: let session = LanguageModelSession().

The framework supports:

Guided generation — structured output via Swift macros that define a schema (the model is constrained to produce values matching declared types).
Constrained tool calling — function calling with type-safe Swift tools.
LoRA adapter fine-tuning — apps can ship their own fine-tuned adapter without retraining or distributing the base weights.
Streaming and chunked generation — for live UI updates.
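PCC fallback: Apple Intelligence’s own features fall back to the larger server model under PCC privacy guarantees for harder requests. The framework as launched at WWDC 2025 exposes only the on-device model to third-party code, so apps should not assume a PCC path is available to them.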
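
The first two items can be sketched together. This is a minimal, illustrative example, assuming the FoundationModels API roughly as documented for macOS 26 / iOS 26 (signatures shifted between SDK betas, so check them against the current documentation). The TicketTriage type, the OrderStatusTool, its canned return value, and the prompt text are hypothetical and not taken from Apple’s samples.

```swift
import FoundationModels

// Guided generation: @Generable derives an output schema from this type, and
// generation is constrained so the response decodes into it.
// `TicketTriage` is a hypothetical example type.
@Generable
struct TicketTriage {
    @Guide(description: "One-sentence summary of the customer's problem")
    var summary: String

    @Guide(.range(1...5))
    var urgency: Int
}

// Tool calling: the model may choose to call this tool, producing arguments
// that match the nested @Generable `Arguments` schema.
struct OrderStatusTool: Tool {
    let name = "lookUpOrderStatus"
    let description = "Looks up the shipping status of an order by its ID."

    @Generable
    struct Arguments {
        @Guide(description: "The order identifier mentioned in the ticket")
        var orderID: String
    }

    func call(arguments: Arguments) async throws -> String {
        // Hypothetical canned answer; a real app would query its own data here.
        return "Order \(arguments.orderID) shipped yesterday."
    }
}

// Runs one request against the shared on-device model and returns typed output.
func triage(_ ticketText: String) async throws -> TicketTriage {
    let session = LanguageModelSession(
        tools: [OrderStatusTool()],
        instructions: "You triage customer support tickets."
    )
    let response = try await session.respond(
        to: "Triage this ticket: \(ticketText)",
        generating: TicketTriage.self
    )
    return response.content
}
```

The caller receives a TicketTriage value rather than free text, which is the main practical difference from prompting for JSON and parsing it by hand.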

The on-device model is tuned for tasks like summarization, entity extraction, text refinement, short dialogue, and template-driven content. Apple explicitly positions it not as a frontier-model competitor but as a baseline capability that app developers can build on without a cloud subscription.

Key design decisions

Single shared model across the OS. All apps use the same 3B model (vs. each app shipping its own weights). Massive disk-and-memory savings; no model duplication.
PCC as the privacy story for server inference. Apple-Silicon servers, attestation-verified, ephemeral state, no logging. The most-credible “trustable cloud AI” deployment to date.
Swift-first API. FoundationModels module, @Generable macro for output schemas, Tool protocol for tool calls. Idiomatic Swift, reads like SwiftUI.
No API key, no per-request billing. Free for developers; cost is borne by the user’s device.
LoRA adapters as the customization unit. Apps fine-tune for their domain without retraining the base model; adapters are small relative to the base weights (roughly 160 MB each, per Apple’s adapter documentation).
Apple Intelligence eligibility gates the API. Devices must meet the Apple Intelligence hardware bar: M1 or later on Mac and iPad (plus the A17 Pro iPad mini) and A17 Pro or later on iPhone. This is also Locara’s effective floor.
Closed weights, closed training data. The model is not downloadable, not fine-tunable beyond LoRA, not introspectable.
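
Because eligibility gates the API, an app has to check at runtime whether the shared model is usable before creating a session. A minimal sketch, assuming the SystemLanguageModel availability API as documented for macOS 26 / iOS 26; the function name and the status strings are illustrative.

```swift
import FoundationModels

// Returns a short human-readable status for the shared system model.
func systemModelStatus() -> String {
    switch SystemLanguageModel.default.availability {
    case .available:
        return "available"
    case .unavailable(.deviceNotEligible):
        return "unavailable: hardware is below the Apple Intelligence bar"
    case .unavailable(.appleIntelligenceNotEnabled):
        return "unavailable: Apple Intelligence is turned off in Settings"
    case .unavailable(.modelNotReady):
        return "unavailable: model assets are not ready yet"
    case .unavailable(let other):
        // Catch-all for reasons added in later OS releases.
        return "unavailable: \(other)"
    }
}
```

An app would typically branch on this before constructing a LanguageModelSession and show a fallback UI (or use a different model) in every unavailable case.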

What worked

Trust angle is real. Apple’s PCC architecture is a serious engineering effort; security researchers have generally praised the threat model. “Run on-device or on a verified Apple server” is a positioning competitors can’t easily match.
Developer DX is excellent. A few lines of Swift gets a working language-model integration. WWDC 2025 demos showed this clearly.
Disk efficiency. Apps share the one system-provided 3B model rather than each shipping its own 5–8 GB bundle of weights.
LoRA adapters as customization is a clean answer for “specialize the model without owning the model.”
Hardware/software co-design. Apple Silicon’s NPU + GPU + unified memory is what makes the on-device model fast enough; Apple controls both layers.
Cross-platform within Apple’s ecosystem. The framework covers macOS, iOS, and iPadOS from day one on any compatible device.

What failed / criticisms

Initial Apple Intelligence rollout was rough. The user-facing features in late 2024 were panned — summaries garbled news headlines, Image Playground produced uncanny results. The 2025 round of model updates improved this materially but the brand took damage.
3B is a real ceiling. The on-device model is good for narrow, structured tasks; for free-form reasoning or long documents, it lags meaningfully behind frontier-class models (and behind 7–14B open weights running on the same hardware).
Apple-only. Not portable. Not OSS. Not inspectable.
Locked-down customization. LoRA adapters only; no full fine-tunes, no swapping the base model, no architecture changes.
PCC requires a network connection and an Apple account. For “fully offline” use, only the on-device model is available.
Closed roadmap. No public discussion of future model versions, sizes, or capabilities — developers can’t plan for what’s coming.
Swift-first means limited reach. The Python SDK helps but is unlikely to become the canonical path, and a Swift-centric API leaves cross-platform tooling out.
Performance metrics not published in detail. Apple has shared some benchmarks, but reproducibility is limited compared to open-weights vendors.

Specific learnings for Locara

Apple Foundation Models is a coexistence partner, not a competitor — for now. The 3B model is good for app-glue tasks; Locara apps that need 7B+ reasoning or full-document RAG need bigger weights anyway. The right framing: Locara is the layer that lets apps go beyond what Apple’s built-in model can do.
PCC sets the privacy bar Locara has to clear. “Local-first, with a verified-enclave fallback” is now the floor users will compare Locara to. This means the privacy story can’t be merely “data stays on your machine”; it needs to be “and we don’t even have a server option, unlike Apple, so the question doesn’t apply.”
Steal the API shape, not the weights. Apple’s LanguageModelSession, @Generable, and Tool patterns are good. Locara’s Swift SDK should feel similarly idiomatic — call it LocaraSession and follow the same conventions where they make sense. Familiarity is a feature.
Locara apps can use the Apple model as a tool, not a substitute. A Locara app that uses Apple’s 3B for summarization (cheap, on-device) and a Locara-managed Llama-class model for reasoning (also on-device, larger) is the high-leverage architecture. Manifest declares which models it needs; the runtime arbitrates.
Apple Intelligence’s hardware floor is also Locara’s. Targeting Apple Silicon Macs M1 and up matches Apple’s own constraint and is the right minimum bar for v1.
LoRA-as-customization is a primitive Locara should expose too. Shipping a base model plus a LoRA adapter, rather than a full fine-tune, is the right packaging story: small payload, content-addressed dedup, fast switching.
Apple’s closed-roadmap risk is real. If a future macOS expands the on-device model or restricts third-party LLMs, Locara’s positioning shifts. Hedge by making sure Locara is never just a wrapper around what Apple’s API already offers: Locara apps should also do things Apple structurally can’t (full open-weights choice, fine-tuning beyond LoRA, no Apple ID required, running on jailbroken devices or on Macs running Linux).
Coexistence framing for the manifesto. Don’t position Locara as anti-Apple. The honest pitch: Apple gives you one model, opaque, gated by your Apple ID. Locara gives you any open-weights model, inspectable, runnable without an account, customizable beyond Apple’s caps. Both can live on the same device.
Watch the iOS path carefully. On iOS, Apple Intelligence is the system path. Locara’s iOS strategy (deferred to v2) needs to thread Apple’s review and capability constraints; using Foundation Models via the system API as one available model in Locara is a plausible path that avoids fighting Apple’s review.
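
The “Apple model as a tool, not a substitute” architecture (the manifest declares which models an app needs, the runtime arbitrates) could look roughly like the sketch below. Every name in it (ModelRef, TaskKind, AppManifest, RuntimeState, route, and the placeholder model name "example-8b") is hypothetical; it illustrates one possible shape, not an existing Locara or Apple API.

```swift
// Hypothetical types: which model a task should run on.
enum ModelRef: Equatable {
    case appleSystemModel              // Apple's shared ~3B on-device model
    case locaraManaged(name: String)   // open-weights model managed by Locara
}

enum TaskKind {
    case summarization, extraction, reasoning, documentQA
}

// What the app declares up front.
struct AppManifest {
    var preferred: [TaskKind: ModelRef]  // per-task preference
    var fallback: ModelRef               // used when the preference is unusable
}

// What the runtime knows about this device right now.
struct RuntimeState {
    var appleModelAvailable: Bool
    var installedModels: Set<String>
}

func isUsable(_ model: ModelRef, in state: RuntimeState) -> Bool {
    switch model {
    case .appleSystemModel:
        return state.appleModelAvailable
    case .locaraManaged(let name):
        return state.installedModels.contains(name)
    }
}

// Picks the first usable model: the task's preference, then the fallback.
// Returns nil when neither can run on this device.
func route(_ task: TaskKind, manifest: AppManifest, state: RuntimeState) -> ModelRef? {
    var candidates: [ModelRef] = []
    if let preferred = manifest.preferred[task] {
        candidates.append(preferred)
    }
    candidates.append(manifest.fallback)
    return candidates.first(where: { isUsable($0, in: state) })
}

// Example: cheap summarization on Apple's model, reasoning on a larger one.
let exampleManifest = AppManifest(
    preferred: [
        .summarization: .appleSystemModel,
        .reasoning: .locaraManaged(name: "example-8b"),
    ],
    fallback: .locaraManaged(name: "example-8b")
)
let appleModelOff = RuntimeState(appleModelAvailable: false, installedModels: ["example-8b"])
let appleModelOn = RuntimeState(appleModelAvailable: true, installedModels: ["example-8b"])
```

With Apple Intelligence turned off, summarization falls through to the Locara-managed model, so the app keeps working without the system model; with it on, the cheaper shared model handles summarization and only reasoning uses the larger weights.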

References

https://developer.apple.com/documentation/FoundationModels
https://machinelearning.apple.com/research/introducing-apple-foundation-models (June 2024)
https://machinelearning.apple.com/research/apple-foundation-models-2025-updates
https://machinelearning.apple.com/research/apple-foundation-models-tech-report-2025
https://github.com/apple/python-apple-fm-sdk
WWDC 2024 keynote — Apple Intelligence introduction
WWDC 2025 keynote — Foundation Models framework opening to developers
“Apple Launches On-Device AI Framework, LLM Tools, and OS Redesign for Developers” — ADTmag, 2025-06-10
Apple’s Private Cloud Compute architecture posts (security.apple.com)