macOS Power, Energy, and Thermal Optimization

What this is: A reference on how power, energy, and thermal behavior work on macOS, from CMOS physics through DVFS, the P-core/E-core split, App Nap, RunningBoard, IOKit power assertions, Low Power Mode, the thermal state machine, QoS scheduling, the Activity Monitor Energy Impact model, and MetricKit for production telemetry. Sourced from canonical Apple WWDC sessions, the XNU scheduler source, Howard Oakley’s measurement-based posts (Eclectic Light), Jeff Johnson, Quinn at Apple DTS, Pierre Habouzit’s libdispatch guidance, Bill Dally on hardware energy, and the AnandTech archive.

Why it matters: Locara apps want to be good battery citizens on MacBook Airs that thermal-throttle within 30–90 seconds of sustained inference. The user blames the local-AI app when their laptop gets hot and fan-noisy; the only defense is knowing which signals the OS exposes and how to respect them. This note is the reference for how to be a power-aware app, not just a fast one.

Most relevant to Locara: Pairs with mac-llm-optimization.md (LLM-specific throttling responses), macos-memory-management.md (memory pressure signals; same dispatch-source pattern), mac-performance-profiling.md (Instruments Energy Log, powermetrics deep dive), mac-hardware-lineup.md (per-SKU thermal envelopes).

Part 1 — The fundamental physics

1.1 CMOS dynamic power: `P = α · C · V² · f`

Every transition on a CMOS gate dumps a ½·C·V² energy packet. Multiplied by switching activity factor α and clock frequency f, this gives the dynamic-power equation every chip architect lives under:

P_dynamic = α · C · V² · f

Two implications driving every optimization in this note:

Voltage is the killer. Halving V cuts dynamic power by ~4×. This is why DVFS lowers voltage as aggressively as it lowers frequency.
Frequency scaling alone is only linear. Lowering f at fixed V cuts power in proportion to f, but raising f usually requires raising V to keep timing margins, so P grows roughly as f³ when V tracks f; hence the cubic-looking power curves in real measurements.
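
A minimal sketch of the arithmetic behind those two points. The function names `dynamicPower` and `powerRatio` are hypothetical, and any inputs are illustrative placeholders rather than measurements of an Apple chip:

```swift
/// Dynamic CMOS power: P = α · C · V² · f.
/// Units are whatever the caller supplies consistently (e.g. farads, volts, hertz → watts).
func dynamicPower(activity alpha: Double, capacitance c: Double,
                  voltage v: Double, frequency f: Double) -> Double {
    return alpha * c * v * v * f
}

/// Ratio of new power to old power after scaling voltage and frequency.
/// α and C cancel out, so the ratio is vScale² · fScale.
func powerRatio(voltageScale vScale: Double, frequencyScale fScale: Double) -> Double {
    return vScale * vScale * fScale
}
```

Halving only the voltage gives a ratio of 0.25, halving only the frequency gives 0.5, and halving both gives 0.125, which is the cubic behaviour that shows up when voltage has to track frequency.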

Howard Oakley uses the same equation in his Apple Silicon power posts: “the formula for estimating dynamic power use is P = C · f · V²” and notes that “a core will use twice as much power when its frequency is doubled from idle to 1,400 MHz, and for M3 P-cores, power consumption increases nearly 6× when frequency is increased from idle (~700 MHz) to its maximum (~4,000 MHz).” [Eclectic Light, Apple silicon: 2 Power and thermal glory]

Static (leakage) power adds a V-and-temperature-dependent floor. On a fanless Air, leakage rises with junction temperature — so a thermal event compounds: hot chip → more leakage → still hotter chip → throttle.

1.2 DVFS — Dynamic Voltage and Frequency Scaling

DVFS is the OS’s lever on the equation above. macOS does not expose DVFS knobs to apps; it consumes signals (QoS, thermalState, lowPowerMode, AC vs battery, recent activity) and the CLPC (Closed-Loop Performance Controller) picks an operating point per CPU cluster. Howard Oakley measured 17 intermediate frequencies for M3/M4 P-clusters between idle and max; the ladder is finer near the top because that’s where energy returns are worst. [Eclectic Light, What are CPU core frequencies in Apple silicon Macs?]

Reference frequencies (powermetrics-observed, per Oakley):

Chip	E-core max	P-core max
M1	~2064 MHz	~3228 MHz
M2	~2424 MHz	~3696 MHz
M3	~2748 MHz	~4056 MHz
M4	~2892 MHz	~4512 MHz

Pro/Max variants can shift these; Air variants are sometimes binned lower. Numbers vary across macOS releases as CLPC tables get tuned.

1.3 Why P-core / E-core exists — the Hennessy/Patterson inflection

The end of Dennard scaling (~2007, gone by ~2012) broke the “frequency scales free with density” assumption. With single-thread perf no longer growing exponentially, every additional increment of performance per watt has to come from architecture, not voltage scaling. Hennessy & Patterson’s A New Golden Age for Computer Architecture (CACM, 2019) frames the response: domain-specific accelerators (TPU, ANE), heterogeneous big.LITTLE cores, and aggressive DVFS.

Apple’s “Firestorm” (P) + “Icestorm” (E) split on M1 is a textbook implementation. Per AnandTech (Frumusanu): P-cores are ~3-4× faster per thread but draw ~10× more power than E-cores at peak. Oakley reports a P-core under a single-thread synthetic load draws “~1000 mW for a single thread/core, increasing by 900 mW per additional thread/core, to a maximum of 3700 mW” for the cluster, while real workloads (Apple Archive compression on a 4-thread P-cluster) hit ~7300 mW. [Eclectic Light, Power on Tap: Dynamic control of P cores in M1 chips]

E-cores idle near 1 W per cluster and run their highest-priority threads at modest voltages. The energy-per-instruction gap is what makes “run background work on E-cores” the single biggest battery lever your app pulls.

1.4 Communication, not compute, is the power problem (Dally)

Bill Dally’s recurring claim across NVIDIA GTC keynotes and the Stanford AHA retreat: the energy cost of moving operands dominates the energy cost of operating on them. Canonical picojoule-per-word table for memory access [Dally, Stanford AHA 2023]:

Where the word lives	Energy per 32-bit word
Local 8 kB SRAM	~5 pJ
On-chip (hundreds of MB)	~50 pJ
Off-chip LPDDR / HBM	~640 pJ

Compare to an integer add (~32 fJ = 0.032 pJ on a simple ARM core): even the instruction overhead (~250 pJ) dwarfs the operation itself. Dally’s framing: the math (datapath + MAC) and the buffers (input/weight/accumulation) each account for ~47% of energy in a well-designed accelerator, while pure inter-component data movement is ~6%. Put differently, about half the energy still goes to moving data over short distances in and out of on-chip buffers. The lesson for app developers: minimize copies, prefer locality, batch work to amortize one fetch over many ops. On a UMA Mac, this means: don’t redundantly copy MTLResourceStorageMode.shared buffers; on any Mac: keep hot data cache-resident, not just RAM-resident.
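
To make “amortize one fetch over many ops” concrete, here is a rough sketch using the per-word figures from the table above. The model is a deliberate simplification and an assumption of this example: a word is fetched once from its source, then every further use is served from local SRAM, and the cost of writing it into SRAM is ignored. The names `OperandSource` and `energyPerUsePJ` are hypothetical.

```swift
/// Approximate energy per 32-bit word in picojoules, copied from the table above.
enum OperandSource: Double {
    case localSRAM = 5.0
    case onChip = 50.0
    case offChipDRAM = 640.0
}

/// Energy in pJ to fetch `words` operands once from the given source.
func fetchEnergyPJ(words: Int, from source: OperandSource) -> Double {
    return Double(words) * source.rawValue
}

/// Average energy per use when one fetched word is used `reuseCount` times:
/// the first use pays the source cost, the remaining uses pay the local SRAM cost.
func energyPerUsePJ(from source: OperandSource, reuseCount: Int) -> Double {
    precondition(reuseCount >= 1, "a fetched word must be used at least once")
    let firstFetch = source.rawValue
    let localReuses = Double(reuseCount - 1) * OperandSource.localSRAM.rawValue
    return (firstFetch + localReuses) / Double(reuseCount)
}
```

Under these assumptions a word pulled from off-chip memory and used once costs 640 pJ per use, while the same word reused 128 times averages just under 10 pJ per use, which is why blocking and batching pay off.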

1.5 Memory hierarchy energy (Drepper)

Ulrich Drepper’s What Every Programmer Should Know About Memory (LWN, 2007) predates the picojoule tables but anchors the same intuition. Key Drepper points relevant to power:

DRAM refresh is unavoidable energy: every row must be rewritten every ~64 ms (DDR), continuously, even when idle. This is part of why M-series LPDDR matters — lower V_dd, deeper self-refresh, package proximity (UMA) means fewer pin-driver joules.
Cache misses are 10-100× more expensive than hits — both in time and in energy. A predictable, sequential access pattern is fed by the hardware prefetcher and stays L1/L2-resident; a random pattern triggers DRAM round-trips at hundreds of picojoules each.
Read vs write asymmetry: writes go through write-buffers and may evict another line; both halves cost.

The practical macOS takeaway: an algorithm that gets cache-friendly (struct-of-arrays for hot loops, blocked matrix kernels, locality-aware scheduling) is also the more battery-friendly algorithm, before you touch any Apple-specific API.
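
As an illustration of the struct-of-arrays idea, the sketch below stores the same records two ways. The types `TokenAoS` and `TokensSoA` are hypothetical, and no energy or timing figures are claimed for either layout; the point is only which bytes a hot loop over `score` has to touch.

```swift
/// Array-of-structs element: a scan over `score` also drags `id` and `label`
/// through the cache, because the fields are interleaved in memory.
struct TokenAoS {
    var id: Int
    var score: Float
    var label: String
}

/// Struct-of-arrays: each field is stored contiguously, so a sequential scan
/// over `scores` reads only the bytes it needs.
struct TokensSoA {
    let ids: [Int]
    let scores: [Float]
    let labels: [String]

    init(_ tokens: [TokenAoS]) {
        ids = tokens.map { $0.id }
        scores = tokens.map { $0.score }
        labels = tokens.map { $0.label }
    }
}

/// Hot loop over the interleaved layout.
func totalScore(_ tokens: [TokenAoS]) -> Float {
    return tokens.reduce(0) { $0 + $1.score }
}

/// Same computation over the contiguous layout.
func totalScore(_ tokens: TokensSoA) -> Float {
    return tokens.scores.reduce(0, +)
}
```

Both functions return the same sum; the second one simply walks a dense `[Float]`, which is the access pattern the hardware prefetcher handles best.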

1.6 GPU power: ALU active vs memory-bound

On Apple’s GPU, the dominant energy sinks (per WWDC22 Squeeze the most out of Apple GPUs with Metal performance counters) are texture fetches and unfused compute kernels. Roofline analysis applies: if your kernel is memory-bound the ALUs are stalled but DRAM is hot — worst-of-both-worlds energy. Fusing kernels (so intermediate values stay in registers/threadgroup memory) and reducing texture resolution where perceptually invisible are the standard moves. See apple-acceleration-frameworks.md for the Metal compute deep-dive.

Part 2 — macOS’s power architecture

2.1 App Nap (10.9 Mavericks +)

What it does: throttles timers and lowers CPU and I/O priority for backgrounded apps that haven’t been recently interacted with. Specifically, App Nap can apply when the app:

has no visible windows (or all windows are hidden/minimized),
is not playing audio, and
has no active power assertions.

How to opt out:

User-facing (legacy): Finder’s Get Info → “Prevent App Nap” checkbox, backed by defaults write <bundle.id> NSAppSleepDisabled -bool YES (formerly NSAppNapDisabled). Both keys have appeared across releases; Jeff Johnson notes the NSAppSleepDisabled form is the modern one. [Jeff Johnson, Prevent App Nap Programmatically]
Programmatic: NSProcessInfo.beginActivity(options:reason:). Hold the returned token (strong reference) for the duration of the activity, then call endActivity(_:). Per Apple’s Energy Efficiency Guide for Mac Apps — Prioritize Work at the App Level:
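```
import Foundation

// .userInitiated already includes .idleSystemSleepDisabled (see the option list below).
let activity = ProcessInfo.processInfo.beginActivity(
    options: [.userInitiated],
    reason: "Running local LLM inference")
// ... long-running work; keep `activity` alive until it finishes ...
ProcessInfo.processInfo.endActivity(activity)
```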

The NSActivityOptions bitmask:

.idleDisplaySleepDisabled — prevent display sleep
.idleSystemSleepDisabled — prevent idle system sleep
.suddenTerminationDisabled — opt out of sudden termination during this activity
.automaticTerminationDisabled — opt out of automatic termination
.userInitiated — convenience = userInitiatedAllowingIdleSystemSleep | idleSystemSleepDisabled
.userInitiatedAllowingIdleSystemSleep — same but lets the system sleep per user preference
.background — discretionary maintenance; permits App Nap deferral
.latencyCritical — real-time work (audio, video capture); use sparingly — forbids CPU throttling

Quinn’s exact nudge in forum thread 679178: “Why NSActivityUserInitiatedAllowingIdleSystemSleep? It seems that NSActivityUserInitiated would make more sense here, in that you really don’t want the system to sleep while this work is in progress.” The lesson: choose the option that matches what you actually need — don’t over-permit sleep just because you saw it in sample code. [Quinn “The Eskimo!”, Apple Developer Forums thread 679178]
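
One way to make the begin/end pairing hard to get wrong is a small scoped wrapper. This is a sketch, not an Apple-provided API: the helper name `withInferenceActivity` is hypothetical, it only covers synchronous work, and the `#if os(macOS)` guard is there so the same file also compiles on platforms where the activity API is not the point.

```swift
import Foundation

/// Runs `body` while holding a user-initiated activity, and always ends the
/// activity afterwards, including when `body` throws.
func withInferenceActivity<T>(reason: String, _ body: () throws -> T) rethrows -> T {
    #if os(macOS)
    // Holding the token keeps App Nap and idle system sleep away for the duration.
    let token = ProcessInfo.processInfo.beginActivity(options: [.userInitiated],
                                                      reason: reason)
    defer { ProcessInfo.processInfo.endActivity(token) }
    #endif
    return try body()
}
```

A call site then reads `let reply = withInferenceActivity(reason: "Running local LLM inference") { runModel() }`, where `runModel` stands in for whatever synchronous work the app does. For asynchronous work the token would need to be stored and ended from the completion path instead.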

Howard Oakley’s “states” enumeration for App Nap behavior in Ventura (state 2 = napping, state 3 = “undead” with NSSupportsAutomaticTermination, state 4 = “nascent” partially loaded) is the practitioner reference for what the system is doing when your app seems to have vanished from the Dock but is still in Activity Monitor. [Eclectic Light, App Nap, undead and nascent apps in Ventura]

2.2 Sudden Termination + Automatic Termination

Two opt-in lifecycle accelerators. Sudden Termination lets the OS kill your process outright (skipping applicationShouldTerminate:) at logout, restart, or shutdown when it has nothing left to save; declare support with NSSupportsSuddenTermination = true in Info.plist and bracket unsafe windows with ProcessInfo.processInfo.disableSuddenTermination() / enableSuddenTermination(), balanced like reference counts. Automatic Termination lets the OS quit your app when it is not frontmost and has no open windows, relaunching and restoring it when the user returns; set via NSSupportsAutomaticTermination = true in Info.plist plus disableAutomaticTermination(_:) / enableAutomaticTermination(_:) calls.

Both reduce wasted energy on backgrounded apps the user no longer cares about. The undead state Oakley describes is NSSupportsAutomaticTermination = true apps that get suspended into App Nap and don’t visibly quit — they remain in the process table for fast resume.

2.3 RunningBoard — the managed-process daemon

RunningBoard arrived in macOS 10.15 Catalina, ported from iOS. Per Howard Oakley’s deep dive series, every app launch goes through RunningBoard for tracking (lifecycle assertions, “Battlecruiser operational” log line on launch, “Death sentinel fired!” on exit), but only some categories of app are also managed by RunningBoard with active control over memory, CPU, GPU, and lifecycle:

Traditional AppKit apps: tracked, not managed. RunningBoard records assertions but won’t kill them under pressure.
Mac Catalyst apps: tracked and managed — RunningBoard enforces iOS-style limits.
“Designed for iPad” apps: managed with the ~16 GB cap noted in macos-memory-management.md.

Practically: filter Console.app on subsystem:com.apple.runningboard to see who’s being asserted, suspended, or killed. The division of labor with launchd is that launchd still owns process spawning; RunningBoard owns post-launch policy. [Eclectic Light, How RunningBoard tracks every app, and manages some (2019); How macOS manages iOS apps: RunningBoard comes of age (2021)]

2.4 IOKit power assertions

The C API for “keep the Mac awake while I do this thing.” Always pair every IOPMAssertionCreateWithName with IOPMAssertionRelease. Leaked assertions are the #1 self-inflicted battery bug: the system drops a process’s assertions when that process exits or crashes, so the dangerous leak is the one in a long-running app that stays alive and never releases, keeping the Mac awake for hours.

The current named types (from IOKit/IOPMLib.h):

kIOPMAssertionTypePreventUserIdleSystemSleep — system stays awake when idle. Display may dim/sleep, system may still sleep on lid close or low battery. The default choice for “I’m computing, don’t sleep on me.”
kIOPMAssertionTypePreventUserIdleDisplaySleep — display stays awake. Implies system stay-awake too. Use only for active video / presentations.
kIOPMAssertionTypePreventSystemSleep — even lid-close won’t sleep. Aggressive. Use only for completing a critical I/O burst.
kIOPMAssertionTypeNoIdleSleep (deprecated) → replaced by PreventUserIdleSystemSleep
kIOPMAssertionTypeNoDisplaySleep (deprecated) → replaced by PreventUserIdleDisplaySleep

Pattern:

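#include <IOKit/pwr_mgt/IOPMLib.h>

IOPMAssertionID assertion = kIOPMNullAssertionID;
IOReturn result = IOPMAssertionCreateWithName(
    kIOPMAssertionTypePreventUserIdleSystemSleep,
    kIOPMAssertionLevelOn,
    CFSTR("Locara is running an inference"),
    &assertion);
if (result == kIOReturnSuccess) {
    // ...work...
    IOPMAssertionRelease(assertion);  // always balance the create
}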

Verify your assertions with pmset -g assertions (snapshot) and pmset -g assertionslog (history, 10.6+). Anything that doesn’t unwind cleanly will show up there.

Quinn’s standing guidance across multiple forum threads: prefer NSProcessInfo.beginActivity (the Foundation wrapper) over raw IOPMAssertion* for anything that’s really about activity rather than display/system sleep, because beginActivity composes correctly with App Nap and the rest of the energy stack. Drop to IOKit only when you need a sleep-prevention type Foundation doesn’t model.

2.5 Low Power Mode on macOS

Introduced in macOS 12 Monterey for some Macs, expanded in Ventura/Sonoma. User toggle in System Settings → Battery → Low Power Mode (separate switches for “On battery” / “On power adapter”). Apple documents that LPM reduces clock speeds, lowers display brightness, and defers background activity. [Apple Support HT101613]

API:

ProcessInfo.processInfo.isLowPowerModeEnabled : Bool
NotificationCenter.default.addObserver(forName: .NSProcessInfoPowerStateDidChange, ...)

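import Foundation

// Keep the token: a block-based observer stays registered until it is removed.
let lowPowerObserver = NotificationCenter.default.addObserver(
    forName: .NSProcessInfoPowerStateDidChange,
    object: nil,
    queue: .main) { _ in
        if ProcessInfo.processInfo.isLowPowerModeEnabled {
            // throttle inference, drop max_tokens, etc.
        }
    }
// Later: NotificationCenter.default.removeObserver(lowPowerObserver)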

Quinn’s repeated point: treat LPM as user intent, not as a hostile environment. If the user enabled LPM, the right response is to do less work, not to “outsmart” the OS by pinning P-cores.

2.6 Thermal state

ProcessInfo.processInfo.thermalState returns .nominal | .fair | .serious | .critical. Apple’s Respond to Thermal State Changes doc gives the canonical mapping:

State	User-visible	App response
`.nominal`	normal	none required; keep optimizing
`.fair`	fans may spin up	begin reducing CPU/GPU
`.serious`	fans at max, perf impacted	required: cut CPU/GPU, drop frame rate, disable nonessential effects
`.critical`	system needs to cool	immediate: minimize all activity, stop camera/peripherals

Subscribe to ProcessInfo.thermalStateDidChangeNotification (posted on the global dispatch queue, so dispatch back to your main actor before touching UI).

Important caveat from Dave MacLachlan and Stanislas’s writeups: the public Foundation enum is coarser than the underlying signal. The Darwin notification com.apple.system.thermalpressurelevel (constant kOSThermalNotificationPressureLevelName, defined in OSThermalNotification.h) exposes five levels:

Darwin level	Maps to `ProcessInfo.thermalState`
`Nominal` (0)	`.nominal`
`Moderate` (1)	`.fair`
`Heavy` (2)	`.fair` ← still fair, but throttling has begun
`Trapping` (3)	`.serious` / `.critical` (varies by release)
`Sleeping` (4)	`.critical`

Use notify_register_check / notify_get_state on kOSThermalNotificationPressureLevelName if you need to distinguish Moderate from Heavy (the threshold at which real throttling kicks in). No root required. Approximate, since the heavy→serious boundary has shifted across releases. [Stanislas, Building a macOS app to know when my Mac is thermal throttling; Dave MacLachlan, Thermals and macOS]

Also useful: IOPMCopyCPUPowerStatus returns a dict with kIOPMCPUPowerLimitProcessorSpeedKey (the current speed limit being applied) and kIOPMCPUPowerLimitProcessorCountKey (active processor count) — direct evidence the OS is throttling you.
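
A sketch of reading the five-level Darwin signal described above. Assumptions to flag: the enum `ThermalPressure` and the function `currentThermalPressure` are hypothetical names; the raw values follow the table in this section; the notification name is written out as the string quoted above; and the example assumes the notify(3) functions are visible to Swift through `import Foundation` on macOS. The pure mapping compiles anywhere, while the read itself is guarded for macOS.

```swift
import Foundation

/// The five Darwin thermal pressure levels, numbered as in the table above.
enum ThermalPressure: UInt64, Comparable {
    case nominal = 0, moderate, heavy, trapping, sleeping

    static func < (lhs: ThermalPressure, rhs: ThermalPressure) -> Bool {
        return lhs.rawValue < rhs.rawValue
    }

    /// Heavy is the level this note treats as the onset of real throttling.
    var isThrottling: Bool { return self >= .heavy }
}

#if os(macOS)
/// Reads the current level once. Returns nil if registration fails or the
/// reported value is outside the five known levels.
func currentThermalPressure() -> ThermalPressure? {
    var token: Int32 = 0
    // A status of 0 is NOTIFY_STATUS_OK.
    guard notify_register_check("com.apple.system.thermalpressurelevel", &token) == 0 else {
        return nil
    }
    defer { notify_cancel(token) }
    var state: UInt64 = 0
    guard notify_get_state(token, &state) == 0 else { return nil }
    return ThermalPressure(rawValue: state)
}
#endif
```

An app that wants change notifications rather than a one-shot read could keep the token registered and poll it with `notify_check`, or use a dispatch-based registration; this sketch only shows the read path.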

2.7 QoS classes — the OS’s primary scheduling signal

QoS is the only knob most apps should ever touch for scheduling on Apple Silicon. The five canonical levels (cf. Habouzit-era libdispatch docs; Oakley’s mapping to raw qos_class_t integer values from <sys/qos.h>):

QoS class	Integer	Typical work
`QOS_CLASS_USER_INTERACTIVE`	33	UI animations, hit testing
`QOS_CLASS_USER_INITIATED`	25	results requested by user, waiting blocks UI
`QOS_CLASS_DEFAULT`	21	unspecified
`QOS_CLASS_UTILITY`	17	progress-bar work, user kicked it off but isn’t watching
`QOS_CLASS_BACKGROUND`	9	indexing, prefetch, opportunistic work

How they map to P/E cores (XNU clutch scheduler + Edge extension on AMP systems):

QOS_CLASS_BACKGROUND (9): E-cores only, lowest cluster frequency (~1020 MHz on M3 E-cluster minimum).
QOS_CLASS_UTILITY (17): preferentially E-cores.
QOS_CLASS_DEFAULT (21) / USER_INITIATED (25) / USER_INTERACTIVE (33): preferentially P-cores; “overflow” to E-cores if all P-cores are saturated, in which case the E-cluster is briefly run at its max frequency to keep latency low.

[Eclectic Light, Apple silicon: 1 Cores, clusters and performance; XNU doc/scheduler/sched_clutch_edge.md]

Cost of mis-prioritization: a userInteractive job that’s actually a 30-second background re-index runs at ~3-4× the energy of the same work tagged utility. (The concrete number depends on chip and workload; the direction of the loss does not.) Apple’s WWDC and Tech Talks guidance is explicit: set QoS on every queue and every detached task; never let the default propagate by accident.

Set QoS:

C / Obj-C: pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0)
GCD: DispatchQueue(label: "x", qos: .utility) or DispatchQueue.global(qos: .utility).async { ... }
Swift Concurrency: Task(priority: .utility) { ... } or Task.detached(priority: .background) { ... }
Operation queue: queue.qualityOfService = .utility

QoS inheritance is automatic through Dispatch and Swift Concurrency; it is not automatic through raw pthread_create or manual thread management. A worker thread you spawned with Thread() does not inherit the creator’s QoS: set qualityOfService on it explicitly before starting it, otherwise it lands at default. This is the single subtlest “why is my background work running on P-cores” trap.
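
A small sketch of the explicit-QoS habit for manually created threads. The `WorkKind` categories and both function names are hypothetical; the mapping simply restates the table above in code.

```swift
import Foundation

/// Illustrative categories of work an LLM app might run.
enum WorkKind {
    case frameRendering, userRequestedInference, exportWithProgress, indexing
}

/// Maps each kind of work to the QoS class the table above suggests.
func qualityOfService(for kind: WorkKind) -> QualityOfService {
    switch kind {
    case .frameRendering:         return .userInteractive
    case .userRequestedInference: return .userInitiated
    case .exportWithProgress:     return .utility
    case .indexing:               return .background
    }
}

/// Creates a thread whose QoS is set explicitly before it starts.
/// The caller is responsible for calling start() on the returned thread.
func makeWorker(for kind: WorkKind, _ body: @escaping @Sendable () -> Void) -> Thread {
    let thread = Thread(block: body)
    thread.qualityOfService = qualityOfService(for: kind)
    return thread
}
```

Setting the property before `start()` matters because the QoS is applied when the thread launches; returning the unstarted thread keeps that ordering visible at the call site.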

Habouzit’s libdispatch guidance (via Thomas Clement’s Making efficient use of libdispatch):

Keep ~3-4 long-lived queues representing execution contexts; do not spin up a queue per subsystem.
Don’t use dispatch_get_global_queue for substantive work — it triggers thread explosion when threads block (the runtime sees an “inactive” thread and creates another).
For small work (< 1 ms) prefer a lock + direct call over async.
Don’t block on a dispatch_semaphore after async; use completion handlers or sync APIs.
Priority inversion is the recurring bug — pthread_mutex and os_unfair_lock are owner-aware and the kernel can priority-boost the holder; dispatch_semaphore and condition variables are not, so a high-QoS waiter can be blocked by a BACKGROUND holder indefinitely.

2.8 NSWorkspace + power-source detection

NSWorkspace.shared.notificationCenter posts:

.didWakeNotification, .willSleepNotification
.screensDidSleepNotification, .screensDidWakeNotification
.didActivateApplicationNotification / .didDeactivateApplicationNotification

Screen lock isn’t a NSWorkspace notification — observe DistributedNotificationCenter.default() with name "com.apple.screenIsLocked" / "com.apple.screenIsUnlocked".

AC vs battery (IOKit power source API):

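#include <IOKit/ps/IOPowerSources.h>
#include <IOKit/ps/IOPSKeys.h>

CFTypeRef sourcesInfo = IOPSCopyPowerSourcesInfo();
CFArrayRef sources = IOPSCopyPowerSourcesList(sourcesInfo);
bool onAC = true;  // no source on battery (e.g. a desktop Mac): treat as AC
for (CFIndex i = 0; i < CFArrayGetCount(sources); ++i) {
    CFDictionaryRef desc = IOPSGetPowerSourceDescription(sourcesInfo,
                                CFArrayGetValueAtIndex(sources, i));
    if (desc == NULL) continue;
    CFStringRef state = CFDictionaryGetValue(desc, CFSTR(kIOPSPowerSourceStateKey));
    // kIOPSACPowerValue on mains, kIOPSBatteryPowerValue if on battery
    if (state && CFEqual(state, CFSTR(kIOPSBatteryPowerValue))) onAC = false;
}
CFRelease(sources);      // both objects follow the Copy rule
CFRelease(sourcesInfo);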

For change notifications use IOPSNotificationCreateRunLoopSource and add the resulting source to your run loop.

Part 3 — The Energy Impact model (Activity Monitor’s “Energy” tab)

This number is not magic; it’s a weighted linear combination derived by Nicholas Nethercote of Mozilla in 2015 by diffing top -stats power against Activity Monitor and inspecting /usr/share/pmenergy/Mac-*.plist. [Nethercote, What does the OS X Activity Monitor’s “Energy Impact” actually measure?]

The coefficients are per-Mac-model, stored in /usr/share/pmenergy/Mac-<board-id>.plist. A representative set:

Term	Coefficient (one Mac model)
CPU seconds	1.0
Idle wake-ups	2.0 × 10⁻⁴ per wake (≈ 200 µs CPU-equivalent)
GPU time	3.0
Disk writes	5.3 × 10⁻¹⁰ per byte
Network packets sent/received	4.0 × 10⁻⁶ per packet
QoS-background multiplier	0.52 (background work is taxed less)

The displayed number is then averaged over windows: a short EMA for “now” plus 8-hour and 12-hour averages for the “Avg Energy Impact” columns.

Why an idle-looking app can read high: idle wake-ups. Each unnecessary timer fire is taxed as ~200 µs of CPU even if the handler does nothing. A poorly-tuned Timer firing 100×/sec contributes ~20 ms/sec of “phantom CPU” to the score.

What’s not measured: display backlight, radio (Wi-Fi/Bluetooth) power, ANE on most charts. Activity Monitor’s Energy tab is useful but not complete for an LLM app — supplement with powermetrics.
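
The weighted sum can be sketched directly from the representative coefficients in the table above. Everything here is an approximation for reasoning about relative costs, not a reimplementation of Activity Monitor: the type and function names are hypothetical, the coefficients are the single-model values quoted above, and applying the background multiplier to background CPU seconds only is an assumption of this example.

```swift
/// Representative per-model coefficients, copied from the table above.
struct EnergyCoefficients {
    var cpuSecond = 1.0
    var idleWakeup = 2.0e-4
    var gpuSecond = 3.0
    var diskByteWritten = 5.3e-10
    var networkPacket = 4.0e-6
    var backgroundQoSMultiplier = 0.52
}

/// Activity observed over one sampling window.
struct ActivitySample {
    var cpuSeconds = 0.0            // CPU time at non-background QoS
    var backgroundCPUSeconds = 0.0  // CPU time at background QoS
    var idleWakeups = 0.0
    var gpuSeconds = 0.0
    var diskBytesWritten = 0.0
    var networkPackets = 0.0
}

/// Weighted linear combination of the sample's terms.
func estimatedEnergyImpact(_ s: ActivitySample,
                           using k: EnergyCoefficients = EnergyCoefficients()) -> Double {
    let cpu = k.cpuSecond * (s.cpuSeconds + k.backgroundQoSMultiplier * s.backgroundCPUSeconds)
    return cpu
        + k.idleWakeup * s.idleWakeups
        + k.gpuSecond * s.gpuSeconds
        + k.diskByteWritten * s.diskBytesWritten
        + k.networkPacket * s.networkPackets
}
```

With these coefficients, a sample containing nothing but 100 idle wake-ups scores about 0.02, matching the ~20 ms/sec of “phantom CPU” worked out above.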

Part 4 — Measurement tools

Tool	What it shows
`sudo powermetrics --samplers cpu_power,gpu_power,ane_power,thermal,smc -i 1000`	Per-cluster power in mW, P/E core residencies and frequencies, GPU power, ANE power, fan RPM, instantaneous thermal pressure
`sudo powermetrics --samplers tasks --show-process-energy --show-process-qos -i 1000`	Per-process energy impact, QoS breakdown of CPU time
`sudo powermetrics --samplers battery -i 1000`	Discharge rate (mW), capacity, cycle count
`sudo powermetrics -f plist -i 1000 -n 10 -o run.plist`	Machine-readable plist for scripted analysis
`pmset -g log`	Historical sleep/wake/dark-wake events
`pmset -g assertions`	Snapshot of who currently holds power assertions
`pmset -g assertionslog`	Assertion create/release history (leak-hunting)
`pmset -g stats`	Sleep/wake counts since boot
`sysdiagnose` (Cmd-Opt-Shift-Ctrl-`.`)	System snapshot bundle: powerlogs, spindumps, jetsam reports, energy logs, plist tree
Xcode Energy gauge (Debug Navigator → Energy Impact)	Per-second Energy Impact during a debug run
Instruments Energy Log template	CPU / GPU / Network / Display / Location overlaid with system events
Console.app with `subsystem:com.apple.powerd OR com.apple.thermalmonitord OR com.apple.runningboard`	The narrative log of why your app got napped/throttled
`ioreg -r -d 1 -c AppleSmartBattery`	Battery health (`MaxCapacity`, `DesignCapacity`, `CycleCount`)
`sysctl -a | grep -i therm`	Thermal-related sysctl keys, where the kernel exposes any (varies by hardware and release)
MetricKit (`MXMetricManager`)	Per-day aggregated CPU, energy, memory, launch, animation metrics from real users in the field

For sustained-inference work on a MacBook Air, the canonical diagnostic loop is:

sudo powermetrics -i 500 --samplers cpu_power,gpu_power,ane_power,thermal --show-process-qos | tee run.log
Run the inference.
Watch (a) total package power in W, (b) thermal pressure transitioning Nominal→Moderate→Heavy, (c) CPU frequency dropping under throttle, (d) per-process QoS — confirm your worker threads landed at the QoS you expected.

Part 5 — Optimization recipes (per scenario)

5.1 Polling → events

The cardinal sin. Per Apple’s Energy Efficiency Guide: “Some apps use timers to poll for state changes when they should respond to events instead.” Replacements:

File-system change: FSEventStream (high-level) or DispatchSource.makeFileSystemObjectSource (lower-level, per-file).
Process exit: DispatchSource.makeProcessSource(eventMask: .exit).
Generic FD readiness: DispatchSource.makeReadSource / makeWriteSource (wraps kqueue).
Memory pressure: DispatchSource.makeMemoryPressureSource (covered in macos-memory-management.md).
Network reachability: nw_path_monitor, not polling a TCP connect.
Combine / async-stream subscriptions where applicable.

Quinn’s framing: every poll cycle pays the “wake-up tax” (context switch, cache trash, possibly P-core wake, possibly Wi-Fi radio wake). For badly-behaved apps, this tax dominates the POWER column reported by top -stats power.

5.2 Timers with leeway

If you genuinely need a timer, use DispatchSourceTimer with a meaningful leeway:

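import Dispatch

// Keep a strong reference to the timer for as long as it should keep firing.
let timer = DispatchSource.makeTimerSource(queue: .global(qos: .utility))
timer.schedule(deadline: .now() + 30, repeating: 30, leeway: .seconds(5))
timer.setEventHandler { /* periodic work goes here */ }
timer.resume()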

The 5-second leeway lets the kernel coalesce your fire with other near-time events; the scheduler “lumps” wake-ups into bursts to minimize the number of times the SoC has to exit deep idle. This is the Mavericks-introduced Timer Coalescing, still active and amplified by Apple Silicon’s deeper idle states. [Mike Ash, Friday Q&A 2009-09-11: Intro to GCD, Part III: Dispatch Sources; Apple, Power Efficiency in OS X Technology Overview, 2013]

Avoid Foundation.Timer for repeated wall-clock work unless you set its tolerance property; with the default tolerance of zero it’s less coalescing-friendly.
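
If several timers are needed, the leeway choice can be centralized so nobody forgets it. This is a sketch under assumptions: the helper names are hypothetical, and the default of 10% of the interval is a rule of thumb chosen for this example rather than a value mandated by Dispatch.

```swift
import Dispatch

/// Leeway in milliseconds as a fraction of the timer interval (in seconds).
func leewayMilliseconds(interval: Double, fraction: Double = 0.1) -> Int {
    precondition(interval > 0 && fraction >= 0, "interval must be positive, fraction non-negative")
    return Int((interval * fraction * 1000).rounded())
}

/// Creates, schedules, and resumes a repeating timer with proportional leeway.
/// The caller must retain the returned source and call cancel() when done.
func makeCoalescingTimer(interval: Double,
                         queue: DispatchQueue,
                         handler: @escaping @Sendable () -> Void) -> DispatchSourceTimer {
    let timer = DispatchSource.makeTimerSource(queue: queue)
    timer.schedule(deadline: .now() + interval,
                   repeating: interval,
                   leeway: .milliseconds(leewayMilliseconds(interval: interval)))
    timer.setEventHandler(handler: handler)
    timer.resume()
    return timer
}
```

Taking the queue as a parameter lets callers pass one of their own long-lived queues instead of reaching for a global one.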

5.3 Display refresh sync

UIKit / iOS / Mac Catalyst: CADisplayLink — fires once per display refresh; pause it when you have nothing to draw.
AppKit, macOS 14+: NSView.displayLink(target:selector:), NSWindow.displayLink(...), NSScreen.displayLink(...). These replace CVDisplayLink, which is deprecated in macOS 15. The new APIs are display-aware (correct frequency for ProMotion external displays) and reach you on the main run loop. [Apple AppKit release notes, macOS 14/15]
AppKit pre-14: CVDisplayLink, which still works; its output callback arrives on a dedicated high-priority thread, so dispatch to main for drawing.

The energy principle: never redraw if your content hasn’t changed. Coalesce dirty rects (setNeedsDisplay(_:) with the smallest rect). Pause the display link when offscreen / in background. Drawing at 120 Hz on a ProMotion display when you have no animation is roughly 2× the GPU energy vs 60 Hz for zero perceptual benefit.

5.4 GPU underclocking when idle

Metal’s GPU clocks itself opportunistically. Things that keep the GPU spun up unnecessarily:

Submitting empty or near-empty MTLCommandBuffers every frame.
Holding a long-lived MTLRenderCommandEncoder open while waiting for CPU work.
Calling present(_:) rather than present(_:atTime:) or present(_:afterMinimumDuration:) when you know the next stable presentation time; Metal then can’t power-gate effectively.

Move to render on demand: only submit a command buffer when a content delta requires it. For LLM inference on the GPU (MLX, llama.cpp Metal), don’t dispatch empty kernels between tokens — schedule the next decode kernel only when input is ready.

5.5 NSBackgroundActivityScheduler

The right primitive for “opportunistic” work — opportunistic by Apple’s definition, which considers power source, thermal pressure, idle status, and time-of-day.

let activity = NSBackgroundActivityScheduler(identifier: "co.locara.index-rebuild")
activity.repeats = true
activity.interval = 6 * 60 * 60        // every 6 hours, approximately
activity.tolerance = 30 * 60           // ±30 min flexibility
activity.qualityOfService = .utility
activity.schedule { completion in
    // do the work…
    completion(.finished)
}

Under the hood this is a wrapper around the XPC Activity API (xpc_activity_register). It internally calls beginActivityWithOptions:reason: with appropriate options derived from your QoS. The scheduler hands off to the Duet Activity Scheduler (DAS), which scores candidate activities and dispatches them via Centralized Task Scheduling (CTS) when conditions are favorable. Practitioner observation: in the wild, NSBackgroundActivityScheduler often only fires on AC power by default, so design for that. [Eclectic Light, How macOS runs background activities series]

5.6 AC-vs-battery decisions

The pattern most LLM apps should adopt:

import Foundation
import IOKit.ps

enum PowerSource { case ac, battery }

func currentPowerSource() -> PowerSource {
    let info = IOPSCopyPowerSourcesInfo().takeRetainedValue()
    let sources = IOPSCopyPowerSourcesList(info).takeRetainedValue() as [CFTypeRef]
    for src in sources {
        if let desc = IOPSGetPowerSourceDescription(info, src)?.takeUnretainedValue() as? [String: Any],
           let state = desc[kIOPSPowerSourceStateKey] as? String,
           state == kIOPSBatteryPowerValue { return .battery }
    }
    // No source reports battery power (including desktops with no battery): treat as AC.
    return .ac
}

Then gate behavior:

On battery: pause prefetch, drop max_tokens by ~30-50%, disable speculative decoding, reduce KV-cache prefill batch size, run smaller draft model variant.
On AC: full-fat path.
If isLowPowerModeEnabled regardless of source: treat as “battery × 2” — even more aggressive.
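
The gating rules above can be kept in one pure function so they are easy to test and tune. This is a sketch: `InferenceBudget` and `inferenceBudget` are hypothetical names, the 40% cut on battery is one point inside the ~30-50% range above, and reading “battery × 2” as a 75% cut is an assumption of this example.

```swift
/// Where the Mac is drawing power from.
enum PowerSource { case ac, battery }

/// Knobs the inference engine honours for one request.
struct InferenceBudget: Equatable {
    var maxTokens: Int
    var speculativeDecoding: Bool
    var prefetchEnabled: Bool
}

/// Derives a budget from the full-power token limit and the current power signals.
func inferenceBudget(base maxTokens: Int,
                     source: PowerSource,
                     lowPowerMode: Bool) -> InferenceBudget {
    if lowPowerMode {
        // Assumed interpretation of "battery × 2": keep a quarter of the budget.
        return InferenceBudget(maxTokens: maxTokens * 25 / 100,
                               speculativeDecoding: false, prefetchEnabled: false)
    }
    switch source {
    case .battery:
        // A 40% reduction, inside the ~30-50% range suggested above.
        return InferenceBudget(maxTokens: maxTokens * 60 / 100,
                               speculativeDecoding: false, prefetchEnabled: false)
    case .ac:
        return InferenceBudget(maxTokens: maxTokens,
                               speculativeDecoding: true, prefetchEnabled: true)
    }
}
```

Because Low Power Mode is checked first, it overrides the power source, which matches the “regardless of source” rule.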

Frumusanu’s AnandTech analyses showed M-series performance is largely the same on battery vs AC (unlike Intel chips that limit themselves heavily on battery — part of why Apple Silicon felt revolutionary on launch), but the user’s expectation of battery hours doesn’t change, so respect the unplugged state regardless of headroom.

5.7 Sustained inference on a MacBook Air — the passive-cooling cliff

The MacBook Air is fanless; its heatsink is the aluminum chassis. Continuous full-SoC load (≥ 20-25 W on a Pro variant; ~15-20 W on a base Air) saturates the passive thermal path in 30-90 seconds, after which CLPC starts clamping frequencies. Real inference runs do not necessarily hold the whole SoC at peak, which may be why community measurements (Alex Ziskind, SolidAITech, others) for sustained Ollama inference on M2/M3 Air report throttling later:

M2 Air: surface 55-57 °C, throttle begins ~10-15 min into sustained generation; tokens/sec drop 20-40%.
M3 Air: stays ≤ 51 °C longer; sustained near-peak tokens/sec for ~30 min before drop.

Defensive pattern in your app:

Subscribe to thermalStateDidChangeNotification.
On .fair (or Darwin Heavy): pause non-inference background work (indexing, prefetch).
On .serious: drop max_tokens, disable speculative decoding, reduce concurrency to 1.
On .critical: stop generation gracefully, surface a one-line user-visible hint (“Your Mac is hot — pausing to cool down”).
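
The four steps above can be sketched as a pure policy function plus a thin adapter from the Foundation enum. All type and function names here are hypothetical, and the wording of the hint is a placeholder; the adapter is guarded so the policy itself compiles and can be tested on any platform.

```swift
import Foundation

/// Platform-independent mirror of the four public thermal states.
enum ThermalLevel { case nominal, fair, serious, critical }

/// What the app should do at a given thermal level.
struct ThermalResponse: Equatable {
    var backgroundWorkAllowed: Bool
    var reduceMaxTokens: Bool
    var speculativeDecoding: Bool
    var maxConcurrentRequests: Int
    var generationAllowed: Bool
    var userHint: String?
}

/// Maps a thermal level to the defensive steps listed above.
func thermalResponse(for level: ThermalLevel, normalConcurrency: Int) -> ThermalResponse {
    switch level {
    case .nominal:
        return ThermalResponse(backgroundWorkAllowed: true, reduceMaxTokens: false,
                               speculativeDecoding: true,
                               maxConcurrentRequests: normalConcurrency,
                               generationAllowed: true, userHint: nil)
    case .fair:
        // Pause indexing and prefetch, leave inference alone.
        return ThermalResponse(backgroundWorkAllowed: false, reduceMaxTokens: false,
                               speculativeDecoding: true,
                               maxConcurrentRequests: normalConcurrency,
                               generationAllowed: true, userHint: nil)
    case .serious:
        return ThermalResponse(backgroundWorkAllowed: false, reduceMaxTokens: true,
                               speculativeDecoding: false, maxConcurrentRequests: 1,
                               generationAllowed: true, userHint: nil)
    case .critical:
        return ThermalResponse(backgroundWorkAllowed: false, reduceMaxTokens: true,
                               speculativeDecoding: false, maxConcurrentRequests: 0,
                               generationAllowed: false,
                               userHint: "Your Mac is hot. Pausing to cool down.")
    }
}

#if os(macOS)
extension ThermalLevel {
    /// Adapter from the Foundation enum; unknown future cases are treated as critical.
    init(_ state: ProcessInfo.ThermalState) {
        switch state {
        case .nominal:  self = .nominal
        case .fair:     self = .fair
        case .serious:  self = .serious
        case .critical: self = .critical
        @unknown default: self = .critical
        }
    }
}
#endif
```

In a notification handler the app would call `thermalResponse(for: ThermalLevel(ProcessInfo.processInfo.thermalState), normalConcurrency: n)` after hopping to the main actor, then apply the returned settings.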

Numbers from third-party measurements; treat as approximate.

5.8 Network-induced energy

Radio-on is expensive: the Wi-Fi PHY can pull 0.5-1 W active, dropping to a few mW in low-power listen between bursts. Every poll keeps it active. Patterns:

Use URLSession with waitsForConnectivity = true rather than retrying connect-fail loops.
Use URLSession’s background configuration (URLSessionConfiguration.background(withIdentifier:)) for non-interactive transfers — the system batches them out-of-process.
Set URLRequest.allowsExpensiveNetworkAccess = false and allowsConstrainedNetworkAccess = false to opt out of cellular, personal-hotspot, and Low Data Mode networks when you can.
Prefer HTTP/2 multiplexing over multiple parallel HTTP/1.1 connections.
For server-pushed data, use a single long-lived connection (SSE, WebSocket with heartbeat coalescing), not polling.

5.9 Location services

Significant battery hit if CLLocationManager.startUpdatingLocation() runs continuously. Use the lighter variants:

startMonitoringSignificantLocationChanges() — wakes only on cell-tower hops (~500 m).
requestLocation() — one-shot fix, then stop.
desiredAccuracy = kCLLocationAccuracyThreeKilometers if exact location isn’t needed.

On macOS, usually irrelevant for LLM apps, but mentioned because the audit (Energy tab > Network/Location columns) will flag it if accidentally enabled.

Part 6 — QoS / P-E-core story (deep)

The XNU clutch scheduler (file: osfmk/kern/sched_clutch.c; design doc: apple-oss-distributions/xnu/doc/scheduler/sched_clutch.md) replaces the older priority-band scheduler. The Edge extension (doc/scheduler/sched_clutch_edge.md) handles AMP (P+E) topology.

Three-level hierarchy:

Scheduling bucket — coarse class, roughly QoS-mapped: FIXPRI (AboveUI) | FG (Foreground) | IN (Interactive) | DF (Default) | UT (Utility) | BG (Background). Per-bucket worst-case execution latency targets in the source: none for FIXPRI, then 0, 37.5 ms, 75 ms, 150 ms, and 250 ms for FG through BG (values vary by release).
Thread group — a workload, sharing a cluster preference. The CLPC/performance-controller maintains the recommendation policy: which cluster a thread group should run on.
Thread — within a thread group, RR/timeshare with bucket-specific quantums.

When a thread becomes runnable, Edge:

Picks the recommended cluster (per the performance controller).
If that cluster is idle or running lower-priority work, run there.
Otherwise compute a scheduling-latency delta vs other clusters; migrate if the delta exceeds the edge-weight between clusters (homogeneous neighbors are preferred over asymmetric ones).
Idle CPUs steal work in 4 steps: local cluster foreign threads → native cluster (if multi-cluster of same type) → foreign running threads via IPI → finally asymmetric steal across P/E boundary.
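The first three steps can be sketched as a toy model. This is not XNU code: the Cluster struct, the pickCluster function, the priorities, the latency figures, and the edge weights are all invented for illustration, and the work-stealing step is not modeled.

```swift
/// Toy stand-in for a CPU cluster; all fields are illustrative.
struct Cluster {
    let name: String
    var isIdle: Bool
    var runningPriority: Int       // priority of the work currently running
    var schedulingLatencyUs: Int   // estimated wait for a newly runnable thread
}

/// Returns the index of the cluster a newly runnable thread would land on.
func pickCluster(threadPriority: Int,
                 preferred: Int,
                 clusters: [Cluster],
                 edgeWeightUs: (Int, Int) -> Int) -> Int {
    let home = clusters[preferred]
    // Steps 1-2: stay on the recommended cluster if it is idle or preemptible.
    if home.isIdle || home.runningPriority < threadPriority { return preferred }
    // Step 3: migrate only if the latency saved exceeds the edge weight.
    var best = preferred
    var bestGain = 0
    for (index, candidate) in clusters.enumerated() where index != preferred {
        let delta = home.schedulingLatencyUs - candidate.schedulingLatencyUs
        let gain = delta - edgeWeightUs(preferred, index)
        if gain > bestGain {
            best = index
            bestGain = gain
        }
    }
    return best
}

let clusters = [
    Cluster(name: "P0", isIdle: false, runningPriority: 47, schedulingLatencyUs: 500),
    Cluster(name: "P1", isIdle: false, runningPriority: 47, schedulingLatencyUs: 100),
    Cluster(name: "E0", isIdle: false, runningPriority: 31, schedulingLatencyUs: 50),
]
// Cheap edge between the two P clusters, expensive edge across the P/E boundary.
let weights = [[0, 100, 600], [100, 0, 600], [600, 600, 0]]
let chosen = pickCluster(threadPriority: 37, preferred: 0, clusters: clusters) {
    weights[$0][$1]
}
// chosen == 1: P1 saves 400 µs for a 100 µs edge; E0 saves 450 µs but its edge costs 600 µs.
```

The small P-to-P weight and large P-to-E weight are how this toy expresses the preference for homogeneous neighbors.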

The practical contract for app developers:

QOS_CLASS_BACKGROUND is sticky to E-cores. Use it freely for indexing, embedding precomputation, log compaction.
QOS_CLASS_UTILITY is also E-core-preferred but may overflow to P. Good for “I want this done but the user can wait minutes.”
QOS_CLASS_USER_INITIATED lands on P-cores. Use for “the user is staring at a spinner.”
QOS_CLASS_USER_INTERACTIVE is for the run loop and frame rendering. Don’t route batch work through it.

The single most expensive mistake: marking inference threads userInteractive because “the user is waiting.” The user is waiting at the spinner, not at the keyboard, so userInitiated is correct. The wrong choice pegs P-cores at max V/f. By the dynamic-power equation, the waste is roughly the ratio (V_max² · f_max) / (V_typ² · f_typ), which is ~3-5× on M3.
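As a minimal sketch of that contract, the queues below pin each kind of work to a QoS class once, at queue creation. The queue labels and the runModel placeholder are hypothetical, not part of any Locara API.

```swift
import Dispatch

// Hypothetical labels; one queue per kind of work, QoS fixed at creation.
let inferenceQueue = DispatchQueue(label: "com.example.locara.inference",
                                   qos: .userInitiated)   // user is watching a spinner
let maintenanceQueue = DispatchQueue(label: "com.example.locara.maintenance",
                                     qos: .utility)       // user can wait minutes
let indexingQueue = DispatchQueue(label: "com.example.locara.indexing",
                                  qos: .background)       // stays on E-cores

/// Placeholder for the real decode loop.
func runModel(_ prompt: String) -> String {
    return "echo: \(prompt)"
}

/// Runs generation at userInitiated and reports back on the main queue.
func generate(prompt: String, completion: @escaping (String) -> Void) {
    inferenceQueue.async {
        let text = runModel(prompt)
        DispatchQueue.main.async { completion(text) }
    }
}
```

Nothing here uses userInteractive: the main queue already runs at that class, and only UI work belongs there.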

QoS inheritance:

dispatch_async propagates the QoS of the submitting context (the calling thread or queue) to the work item by default. To make a block keep its own assigned QoS instead, create it with DISPATCH_BLOCK_ENFORCE_QOS_CLASS.
Swift Concurrency Task { ... } inherits priority from the current task; Task.detached { ... } does not (it gets default unless you pass priority:).
Raw pthread_create does not inherit QoS; you must pthread_attr_set_qos_class_np before pthread_create, or call pthread_set_qos_class_self_np from the thread on entry.
NSOperation inherits via OperationQueue.qualityOfService; per-op override via Operation.qualityOfService.
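The four inheritance rules above, side by side. This is an illustrative sketch; the queue label and the choice of .utility and .background are arbitrary.

```swift
import Foundation

// 1. GCD: .enforceQoS makes the block prefer its own QoS over the inherited one.
let workQueue = DispatchQueue(label: "com.example.locara.work")
let item = DispatchWorkItem(qos: .utility, flags: .enforceQoS) {
    // Runs at utility even when submitted from a userInitiated context.
}
workQueue.async(execute: item)

// 2. Swift Concurrency: Task inherits, Task.detached does not.
func startWork() {
    Task {
        // Inherits the priority of the task that called startWork().
    }
    Task.detached(priority: .utility) {
        // No inheritance; the priority must be passed explicitly.
    }
}

// 3. Raw threads: no inheritance, so the thread sets its own class on entry.
let worker = Thread {
    pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0)
    // Thread body runs at utility from here on.
}
worker.start()

// 4. Operations: the queue sets the default, an operation may override it.
let operations = OperationQueue()
operations.qualityOfService = .utility
let compaction = BlockOperation { /* log compaction */ }
compaction.qualityOfService = .background
operations.addOperation(compaction)
```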

The Edge scheduler honors all of this. The CLPC frequency curve responds — a cluster full of BG threads sits at minimum E-frequency (~1020 MHz, M3); a P-cluster with one userInitiated thread runs at near-max (~4 GHz+ on M3, decreasing in steps as more cores in the same cluster wake).

Part 7 — Thermal management — practitioner recipes

Per-chip-family behavior under sustained load

Mac	Cooling	Sustainable SoC power	Time-to-throttle (typical)
MacBook Air (M1/M2/M3/M4)	Passive (chassis)	~10-12 W	30-90 s heavy CPU+GPU, ~10-15 min steady inference
MacBook Pro 14” / 16” (Pro / Max)	Active (twin fans)	30-90 W depending on SKU	Indefinite at “Auto”; throttles only above design TDP
Mac mini (M-series)	Active, single fan	20-40 W	Indefinite for base; Pro throttles less than Air
Mac Studio (Max/Ultra)	Active, high-volume	60-200 W	Indefinite for all but adversarial loads
Mac Pro	Active, oversized	full chip TDP	Indefinite

Numbers are approximate community measurements; specific SKU + ambient + macOS version varies them.

Defensive code template

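import Foundation

// InferenceConfig, BackgroundScheduler, InferenceManager, and UIBus are app-defined types.
final class ThermalGovernor: NSObject {
    private var notifyToken: Int32 = 0

    override init() {
        super.init()
        NotificationCenter.default.addObserver(self,
                                               selector: #selector(thermalChanged),
                                               name: ProcessInfo.thermalStateDidChangeNotification,
                                               object: nil)
        // Optional: register for the fine-grained Darwin notification (Moderate/Heavy split).
        // A check-style registration only allocates a token; read it with notify_get_state.
        notify_register_check(kOSThermalNotificationPressureLevelName, &notifyToken)
        // The notification fires only on changes, so apply the current state once.
        thermalChanged()
    }

    deinit {
        NotificationCenter.default.removeObserver(self)
        notify_cancel(notifyToken)
    }

    @objc private func thermalChanged() {
        // The notification can arrive off the main thread; hop before touching app state or UI.
        DispatchQueue.main.async {
            ThermalGovernor.apply(ProcessInfo.processInfo.thermalState)
        }
    }

    private static func apply(_ state: ProcessInfo.ThermalState) {
        switch state {
        case .nominal:
            InferenceConfig.shared.restoreDefaults()
        case .fair:
            InferenceConfig.shared.maxTokens = min(InferenceConfig.shared.maxTokens, 1024)
            BackgroundScheduler.shared.pauseDiscretionary()
        case .serious:
            // Do not assume .fair was seen first; pause discretionary work here too.
            BackgroundScheduler.shared.pauseDiscretionary()
            InferenceConfig.shared.maxTokens = 512
            InferenceConfig.shared.speculativeDecoding = false
            InferenceConfig.shared.prefetchEnabled = false
        case .critical:
            InferenceManager.shared.gracefullyStop()
            UIBus.shared.show(banner: "Pausing inference: your Mac is heating up")
        @unknown default:
            break
        }
    }
}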

Run this once, at app launch, on the main actor. Never poll thermalState; subscribe.

Part 8 — Production telemetry: MetricKit

MetricKit (its own framework, imported with import MetricKit; available on macOS 12+, though not every metric is delivered there) is Apple’s blessed channel for getting real-user energy/performance data into your backend.

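import MetricKit

final class MetricsObserver: NSObject, MXMetricManagerSubscriber {
    func enable() { MXMetricManager.shared.add(self) }
    func disable() { MXMetricManager.shared.remove(self) }

    func didReceive(_ payloads: [MXMetricPayload]) {
        for payload in payloads {
            // MetricKit has no direct energy field; CPU and GPU time are the closest proxies.
            let cpuTime = payload.cpuMetrics?.cumulativeCPUTime
            let gpuTime = payload.gpuMetrics?.cumulativeGPUTime
            let json = payload.jsonRepresentation()  // whole payload as JSON Data
            // ship cpuTime, gpuTime, or json to the backend
            _ = (cpuTime, gpuTime, json)
        }
    }
    func didReceive(_ payloads: [MXDiagnosticPayload]) { /* hangs, crashes */ }
}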

What you get in a daily payload:

MXCPUMetric: cumulativeCPUTime, cumulativeCPUInstructions (the latter added later, mind your minimum-deployment).
MXGPUMetric: cumulativeGPUTime. MetricKit exposes no dedicated energy metric; CPU time, GPU time, and display luminance (MXDisplayMetric) are the available proxies for energy use.
MXMemoryMetric: peakMemoryUsage, averageSuspendedMemory.
MXAnimationMetric: scroll-hitch ratio.
MXAppLaunchMetric: launch time histogram.
MXDiskIOMetric, MXNetworkTransferMetric, MXCellularConditionMetric (iOS).

Aggregation is 24-hour (very approximately — the OS may deliver more or less frequently in edge cases; don’t rely on schedule). Delivery is opportunistic — typically once on app foregrounding after midnight.

Why MetricKit is the right answer for “how does my app actually behave on a 16 GB Air at 78% battery in Tokyo”: you cannot get it from your laptop. Apple aggregates, anonymizes (the metrics are device-private; you only see your app), and respects user energy/data settings. The alternative (rolling your own power telemetry) is invasive and inaccurate.

Part 9 — The “legends say” synthesis

Cross-cutting principles, with the legend in parens:

(Dally) Communication is the power problem, not compute. A 32-bit operand moved from DRAM costs ~640 pJ; an integer add costs ~32 fJ. Optimize for locality first, parallelism second.
(Drepper) Cache misses are 10-100× more expensive than hits. The cache-friendly version of your kernel is the battery-friendly version.
(Hennessy & Patterson) Dennard scaling is over; the wins now come from heterogeneity (P/E cores, ANE, GPU) and domain-specific acceleration. Use the right accelerator for the workload — don’t run a matmul on the CPU when the GPU or ANE is idle.
(Apple’s WWDC throughline, 2013-2025) Respect the system signals. thermalState, isLowPowerModeEnabled, AC vs battery, beginActivity reasons — they exist so you don’t have to guess. Apps that try to “outsmart” the scheduler usually waste energy and degrade the OS’s overall behavior.
(Quinn “The Eskimo!”) Don’t override system behaviors unless you have a specific reason. App Nap is good for the user. Power assertions leak; balance every create with a release in atexit. If you reach for IOPMAssertion* first, you’ve probably skipped NSProcessInfo.beginActivity, which composes better with the rest of the energy stack.
(Mike Ash + Habouzit) GCD timers want leeway; the kernel coalesces. Async for ms-scale work is a tax, not a win. Don’t proliferate queues. Locks beat semaphores for QoS propagation.
(Jeff Johnson) Practitioner discipline: hold the activity token strong; choose the NSActivityOptions flag that matches what you actually need; one wrong default and you’ve given the OS permission to sleep mid-decode.
(Howard Oakley / Eclectic Light) Read the logs. RunningBoard (“Battlecruiser operational” / “Death sentinel fired!”), powerd, thermalmonitord. The OS already told you why it suspended you; you just have to grep for it.
(Frumusanu / AnandTech) Apple Silicon’s win is the energy curve, not the peak. P-cores at 80% frequency cost a fraction of P-cores at 100%; E-cores at any frequency cost less than a P-core at any frequency. Picking the right cluster (via QoS) is the single biggest lever you have.
(XNU source) When the documentation contradicts the source, the source is right. sched_clutch_edge.md is the truth about how AMP scheduling actually picks clusters; the WWDC summary is the brand-safe paraphrase.
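To put the Dally figures quoted above on one scale, a quick unit conversion (using only the two numbers from that bullet) gives the ratio between moving an operand from DRAM and adding it:

```latex
\frac{E_{\text{DRAM}}}{E_{\text{add}}}
  \approx \frac{640\ \text{pJ}}{32\ \text{fJ}}
  = \frac{640 \times 10^{-12}\ \text{J}}{32 \times 10^{-15}\ \text{J}}
  = 2 \times 10^{4}
```

At those figures, one DRAM fetch costs about as much energy as twenty thousand integer adds, which is why locality comes before parallelism in the list.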

Specific learnings for Locara

The runtime subscribes to thermal + power + memory pressure on behalf of every app. Apps run inside the runtime; if pressure rises (memory, thermal, or low-power-mode), the runtime can pause prefetch, drop max_tokens, disable speculative decoding, evict secondary models, and notify the app via a capability — instead of every app reinventing this. Per mac-app-store-sandbox.md’s entitlement model, the runtime owns system signals; the app receives normalized events.
Inference threads default to userInitiated, not userInteractive. Static lint at build-time. The single most expensive QoS mistake (3-5× energy on M-series).
Hold an NSProcessInfo.beginActivity token for the inference window, with options [.userInitiated, .idleSystemSleepDisabled]. Strong reference (Jeff Johnson’s warning). Release on completion.
No raw IOPMAssertion* calls in app code. Runtime offers a Locara.preventDisplaySleep() capability that wraps it with leak-proof teardown. Apps that try to leak assertions get caught at review time.
Default to no-poll patterns. Locara SDK exposes only event-based APIs (watch(_:) for files, subscribe(_:) for state, MemoryPressureObservable for memory). No Timer.scheduledTimer is exposed; if you want one, you have to import Foundation directly and the lint catches it.
AC-vs-battery is a first-class manifest decision. Apps declare their behavior matrix in locara.json (onBattery.maxTokens, onAC.maxTokens, etc.). The runtime enforces. User toggle to override.
isLowPowerModeEnabled is treated as “battery × 2”. Even more aggressive throttling. The Locara runtime communicates this state to apps through a normalized “performance budget” enum (full | reduced | minimal), not the raw boolean.
The thermal state machine is a runtime concern, not an app concern. The runtime subscribes to thermalStateDidChangeNotification, applies the defensive ladder per app’s declared profile, and surfaces a UI banner via a system component when state reaches .critical. Apps that need finer control can opt in to raw thermalState events.
Ship MetricKit telemetry from v1. Per-day aggregated CPU time, GPU time, memory, launch, and animation metrics (MetricKit has no direct energy field, so CPU and GPU time stand in for it). Locara’s runtime opts into MetricKit on behalf of consenting users; metrics ship to a Locara-owned endpoint (anonymized, aggregated). This is the only way to know how Locara apps actually behave in the field.
The diagnostic story uses powermetrics + pmset + RunningBoard logs. The Locara CLI’s locara diagnose should bundle relevant powermetrics -i 500 -n 60 output plus pmset -g assertions, pmset -g log | tail -200, and log show --predicate 'subsystem == "com.apple.runningboard"' --last 1h for support. This gives Locara reviewers the same data Apple DTS asks for.
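The activity-token item in the list above could be wrapped as follows. This is a sketch, not the Locara runtime API: InferenceActivity is a hypothetical name. The options are the ones named in that item; .userInitiated already includes idle-system-sleep prevention, so the second flag only makes the intent explicit.

```swift
import Foundation

/// Hypothetical wrapper that holds the activity token strongly for the
/// duration of an inference window and always balances begin with end.
final class InferenceActivity {
    private var token: NSObjectProtocol?

    var isActive: Bool { return token != nil }

    func begin(reason: String = "LLM inference") {
        guard token == nil else { return }   // already active; do not stack tokens
        token = ProcessInfo.processInfo.beginActivity(
            options: [.userInitiated, .idleSystemSleepDisabled],
            reason: reason)
    }

    func end() {
        guard let active = token else { return }
        ProcessInfo.processInfo.endActivity(active)
        token = nil
    }

    deinit { end() }   // last-resort release if the owner forgets
}

// Usage: bracket the decode loop.
let activity = InferenceActivity()
activity.begin(reason: "Generating a reply")
// ... run inference ...
activity.end()
```

For work that fits in a single synchronous call, ProcessInfo's performActivity(options:reason:using:) does the same bracketing without a stored token.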

References

Apple official:

Energy Efficiency Guide for Mac Apps — developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/
Respond to Thermal State Changes — same library, RespondToThermalStateChanges.html
Prioritize Work at the App Level — same library, PrioritizeWorkAtTheAppLevel.html
IOPMLib.h assertion types — developer.apple.com/documentation/iokit/iopmlib_h/iopmassertiontypes
WWDC 2013 #209 Improving Power Efficiency with App Nap
WWDC 2014 #710 Writing Energy Efficient Code, Part 1
WWDC 2017 #238 Writing Energy Efficient Apps (Daniel Schucker, Prajakta Karandikar)
WWDC 2017 #706 Modernizing Grand Central Dispatch Usage (Pierre Habouzit)
WWDC 2020 #10686 Explore the new system architecture of Apple silicon Macs
WWDC 2021 #10254 Tune CPU job scheduling for Apple silicon Macs (Tech Talk #110147 carries equivalent material for games)
WWDC 2022 Squeeze the most out of Apple GPUs with Metal performance counters
XNU source: apple-oss-distributions/xnu, especially osfmk/kern/sched_clutch.c, osfmk/kern/sched_clutch.h, doc/scheduler/sched_clutch.md, doc/scheduler/sched_clutch_edge.md

Practitioners (legends category):

Jeff Johnson — lapcatsoftware.com/articles/prevent-app-nap.html
Howard Oakley (Eclectic Light Co.) — relevant series:
- “RunningBoard” tag — eclecticlight.co/tag/runningboard/
- “How RunningBoard tracks every app, and manages some” (2019)
- “App Nap, undead and nascent apps in Ventura” (2023)
- “Power on Tap: Dynamic control of P cores in M1 chips” (2022)
- “Apple silicon: 1 Cores, clusters and performance” (2024)
- “Apple silicon: 2 Power and thermal glory” (2024)
- “What are CPU core frequencies in Apple silicon Macs?” (2025)
- “Making Apple silicon faster: 1 Threads and tasks” (2024)
Mike Ash — mikeash.com/pyblog/friday-qa-2009-09-11-intro-to-grand-central-dispatch-part-iii-dispatch-sources.html
Thomas Clement (capturing Habouzit) — gist.github.com/tclementdev/6af616354912b0347cdf6db159c37057
Nicholas Nethercote — blog.mozilla.org/nnethercote/2015/08/26/what-does-the-os-x-activity-monitors-energy-impact-actually-measure/
Dave MacLachlan — dmaclach.medium.com/thermals-and-macos-c0db81062889
Stanislas — stanislas.blog/2025/12/macos-thermal-throttling-app/ (Darwin notification recipe)
Quinn “The Eskimo!” — Apple Developer Forums; relevant threads: 679178 (background threads), 15736 (Power Nap detection), 71829 (debugging background tasks), 85474 & 118867 (memory pressure context that informs his power answers), 106855 (_NSActivityAssertion leaks)

Academic / industry foundations:

Ulrich Drepper — What Every Programmer Should Know About Memory (LWN 2007), full PDF: people.freebsd.org/~lstewart/articles/cpumemory.pdf
Bill Dally — Energy Efficiency and AI Hardware (Stanford AHA Retreat, 2023); High-Performance Hardware for Machine Learning (NIPS 2015 tutorial); Hot Chips 2023 keynote.
Hennessy & Patterson — Computer Architecture: A Quantitative Approach (6th ed.); A New Golden Age for Computer Architecture (CACM 62.2, 2019)
AnandTech archive (closed Aug 2024) — Andrei Frumusanu’s M1, M1 Pro/Max, M2 reviews; anandtech.com/show/16252/mac-mini-apple-m1-tested and /show/17024/apple-m1-max-performance-review

Caveats and contested numbers:

Activity Monitor Energy Impact coefficients are per-Mac-model and may have changed since Nethercote’s 2015 reverse-engineering. The shape (CPU + idle-wakes + GPU + I/O linear combination) is stable; the exact weights aren’t.
Per-cluster wattages for Apple Silicon are based on Oakley’s powermetrics measurements and AnandTech’s external measurements; powermetrics itself documents that “average power values reported are estimated and may be inaccurate — should not be used for comparison between devices.”
WWDC21 #10254 session number: The session Tune CPU job scheduling for Apple silicon Macs is referenced in several writeups, but Apple’s WWDC video catalogue can return the Swift Concurrency session for that ID in some queries — the Tech Talk #110147 (post-WWDC update for games) carries equivalent content with full transcript.
isLowPowerModeEnabled on macOS: shipped in macOS 12 (Monterey) for select MacBook Pros; broadened in Ventura/Sonoma to all portable Macs and added the AC-side toggle in Sonoma+. iOS shipped LPM much earlier (iOS 9). The notification name on macOS (NSProcessInfoPowerStateDidChange) is the same as iOS.
Thermal pressure level mapping between Darwin’s 5 levels and ProcessInfo’s 4 has been observed to drift across macOS versions; the “Heavy → fair vs serious” boundary specifically is not stable.
MetricKit cumulativeCPUInstructions was added later than the base CPU metric — check minimum-deployment.
Ulrich Drepper’s specific energy numbers are not in his 2007 paper directly; the cache-miss-vs-hit energy ratios cited here are extrapolated from his cycle-cost numbers + general DRAM-energy figures from the literature. Treat as “order-of-magnitude” rather than exact.