07 — Runtime

The runtime is the part that executes user-installed apps on the user’s Mac. In Locara’s architecture, the runtime is not a separate process. It’s the Locara framework code built into each app: the Rust crates are statically linked at build time, and the TS SDK is bundled with the frontend. Each Locara app runs as its own standalone Mac app, with the runtime inside it.

This document covers what that runtime does, how the app lifecycle works, and how capabilities are enforced at runtime.

Components inside each app

Transcribe.app                    (the user installs this)
  └── Tauri webview + frontend bundle
  └── Linked Rust backend (the "runtime"):
      ├── locara-runtime           # Tauri plugin orchestration, IPC
      ├── locara-core              # capability enforcement, inference
      ├── locara-storage           # SQLite + sqlite-vec
      ├── locara-models            # content-addressed fetch + cache
      └── locara-tools             # Wasmtime + WASI

When the user opens Transcribe.app, all of this lives inside one process. There’s no locara-shell external host; Tauri’s standard build embeds everything.

When the user opens DocVault.app at the same time, it’s a second independent process with its own copy of the linked framework code. The two processes don’t communicate except via the shared model cache directory and any explicitly declared shared-folder IPC (see 03-capabilities.md).

App lifecycle

Install

1. User downloads Transcribe-0.1.0.dmg from locara.app
2. Opens DMG, drags Transcribe.app to /Applications
3. macOS Gatekeeper verifies:
   - Locara CI's Apple Developer ID signature
   - Apple's notarization ticket (stapled to the bundle)
4. App is registered with Launch Services
5. Visible in Spotlight, Dock, Launchpad

No “install Locara client” pre-step. No package manager needed. Standard Mac app install.

First launch

1. User double-clicks Transcribe.app
2. macOS starts the app process
3. macOS allocates App Sandbox container at ~/Library/Containers/<bundle-id>/
   (entitlements were set at build time from the app's manifest)
4. App reads its embedded locara.json
5. Locara runtime initializes:
   - Registers Tauri commands for declared capabilities only
   - Sets up SQLite at ~/Library/Containers/<bundle-id>/Data/Documents/app.sqlite
   - Verifies all declared model hashes are present in shared cache (or queues fetches)
6. First-run consent UI appears (capability summary + first-run install)
7. User accepts; app navigates to its main UI
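Step 5 says the runtime registers Tauri commands for declared capabilities only. A minimal sketch of that filtering, assuming a static command table; the table contents, the capability strings, and the name `commands_to_register` are hypothetical and not taken from the Locara codebase:

```rust
use std::collections::BTreeSet;

/// Hypothetical table: (Tauri command name, capability it requires).
const COMMAND_TABLE: &[(&str, &str)] = &[
    ("locara_llm_chat", "llm.chat"),
    ("locara_fs_read", "fs.read"),
    ("locara_audio_transcribe", "audio.transcribe"),
];

/// Return only the commands whose capability the manifest declares.
/// Commands that are never registered cannot be invoked from the frontend.
fn commands_to_register(declared: &BTreeSet<&str>) -> Vec<&'static str> {
    COMMAND_TABLE
        .iter()
        .filter(|entry| declared.contains(entry.1))
        .map(|entry| entry.0)
        .collect()
}
```

Under this assumption, an undeclared command is absent from the IPC surface entirely, so the per-call capability check acts as a second line of defense rather than the only one.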

Normal launch

1. User opens Transcribe.app (Dock, Spotlight, etc.)
2. macOS starts the process
3. Tauri webview opens to the app's main route
4. Models load lazily on first capability use (or eagerly if manifest declares)
5. App is ready

Time from click to interactive: target < 1.5s on M2-class hardware (see 21-performance-budgets.md).

Update

1. App's Tauri updater plugin checks Locara registry's manifest API on launch (or daily)
2. If new version available:
   a. Compare manifest capabilities (old vs new)
   b. Determine update class:
      - Patch / minor without capability change → background-download, prompt to restart
      - Capability expansion → cool-down period (see 14-trust-safety.md)
3. New .app bundle downloaded, signature + provenance verified
4. On user-triggered restart:
   - Stop running app instance
   - Run any DB migrations (transactional, with backup)
   - Swap app bundle (new .app replaces old in /Applications)
   - Relaunch
5. If migration fails: roll back DB to backup, refuse update, surface error
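The update-class decision in step 2 amounts to a set difference between the old and new capability lists. A minimal sketch, assuming capabilities are plain strings; `UpdateClass` and `classify_update` are illustrative names, and scope widening within an existing capability is not modeled here:

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum UpdateClass {
    /// No new capabilities: download in the background, prompt to restart.
    Background,
    /// New capabilities requested: hold the update for the cool-down period.
    CoolDown { added: Vec<String> },
}

/// Classify an update by the capabilities the new manifest adds.
fn classify_update(old: &BTreeSet<String>, new: &BTreeSet<String>) -> UpdateClass {
    let added: Vec<String> = new.difference(old).cloned().collect();
    if added.is_empty() {
        UpdateClass::Background
    } else {
        UpdateClass::CoolDown { added }
    }
}
```

An update that only removes capabilities is treated as a background update in this sketch, since it does not expand what the app can do.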

Each app updates itself. There’s no central “update everything” client — Tauri’s per-app updater handles it.

A future optional Locara Manager menubar utility (phase 3+) could surface “all your Locara apps with available updates” in one place, but isn’t required.

Uninstall

1. User drags Transcribe.app from /Applications to Trash
2. macOS removes the bundle
3. App data in ~/Library/Containers/<bundle-id>/ persists by default (in case of reinstall)
4. User can manually clean up via the app's "Reset" feature, or by removing the Container directory
5. Models in shared cache (~/Library/Caches/Locara/models/) stay; refcount-zero models eligible for cleanup after 30 days

Standard Mac app uninstall behavior. No special tooling needed.
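The 30-day rule for refcount-zero models in step 5 can be expressed as a small predicate. This is a sketch under assumptions: the `CacheEntry` fields are hypothetical, and how the cache index actually records reference counts and timestamps is not specified in this document.

```rust
use std::time::{Duration, SystemTime};

/// Grace period before an unreferenced model may be cleaned up (30 days).
const CLEANUP_GRACE: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Hypothetical cache index entry.
struct CacheEntry {
    refcount: u32,
    /// When the refcount last dropped to zero, if it has.
    unreferenced_since: Option<SystemTime>,
}

/// A model is eligible only if nothing references it and the grace period has passed.
fn eligible_for_cleanup(entry: &CacheEntry, now: SystemTime) -> bool {
    if entry.refcount > 0 {
        return false;
    }
    match entry.unreferenced_since {
        // A clock that moved backwards yields Err; treat that as "not yet".
        Some(since) => now
            .duration_since(since)
            .map(|idle| idle >= CLEANUP_GRACE)
            .unwrap_or(false),
        None => false,
    }
}
```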

Capability enforcement at runtime

Every Locara primitive call goes through this check (inside the app’s own process):

// Pseudocode
fn handle_invoke(command: &str, args: Args) -> Result<Value> {
    let manifest = APP_MANIFEST.get();  // baked into the app at build time

    // 1. Map the command to the capability it requires
    let required_cap = capability_for_command(command)?;

    // 2. Check that the manifest declares it
    if !manifest.capabilities.has(&required_cap) {
        return Err(CapabilityDenied(required_cap));
    }

    // 3. Check that the arguments fall within the declared scope
    match command {
        "fs.read" => check_path_in_scope(&args.path, &manifest.fs_scope)?,
        "llm.chat" => check_model_declared(&args.model, &manifest.models)?,
        // ...
        _ => {}  // commands with no argument-level scope
    }

    // 4. Execute
    dispatch(command, args)
}

Plus macOS App Sandbox doing kernel-level enforcement for filesystem / network / device based on entitlements baked into the app at build time.

Two layers, both inside or below the app process. No network roundtrip, no daemon to coordinate with.
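The scope check in step 3 of the pseudocode has to resist path traversal such as `..` segments. A minimal sketch of one way `check_path_in_scope` could work, assuming the declared scope is a list of absolute root directories; the signature and error type here are illustrative. It normalizes lexically only, so a real implementation would also need to account for symlinks:

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `.` and `..` lexically, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // Popping at the root is a no-op, so `/..` stays `/`.
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Accept the path only if, after normalization, it lies under a declared root.
fn check_path_in_scope(path: &Path, scope_roots: &[PathBuf]) -> Result<(), String> {
    if !path.is_absolute() {
        return Err(format!("relative path rejected: {}", path.display()));
    }
    let normalized = normalize(path);
    // `Path::starts_with` compares whole components, so `/a/Documents-evil`
    // does not match a root of `/a/Documents`.
    if scope_roots.iter().any(|root| normalized.starts_with(root)) {
        Ok(())
    } else {
        Err(format!("path outside declared scope: {}", normalized.display()))
    }
}
```

Even if a check like this had a gap, the App Sandbox entitlements would still deny access outside the container at the kernel level.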

Model management

The runtime maintains a shared model cache at ~/Library/Caches/Locara/models/ (note: shared directory across Locara apps, not a shared process):

models/
├── manifest.json                              # cache index
├── whisper-large-v3-q4-abc123.../             # model dir, named by hash prefix
│   ├── model.gguf
│   └── config.json
├── qwen2.5-3b-instruct-q4-def456.../
│   ├── model.gguf
│   └── tokenizer.json
└── ...

Each app’s bundle has hardlinks pointing into the cache. Multiple apps using the same model → one disk copy.

Loading a model

Check if already mmap-loaded in current process (warm).
If not: open the cache file, mmap it, hand to llama.cpp / MLX.
Because file-backed mmap pages are served from the kernel’s unified buffer cache, two processes that map the same file without writing to it share the same physical pages, so RAM is partially shared at the OS level for free.

This means even without a shared-runtime daemon, two apps using the same model don’t fully duplicate RAM. The savings depend on what’s actually being read (mmap pages are loaded on demand).

Memory budget enforcement

Each app’s manifest declares a profile (low, mid, high), which maps to a soft RAM budget (see 32-resource-policy.md).
The runtime tracks loaded model size + active inference state.
Approaching budget → unload least-recently-used model first (in this app).
Exceeding budget → fail load with ResourceNotAvailableError.

Each app manages its own budget. macOS’s memory-pressure signals tell apps when to back off; apps respond independently.
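The budget rules above can be sketched as a small least-recently-used tracker. This is an illustration under assumptions: `ModelBudget` and `LoadError` are hypothetical names, sizes are in bytes, and active inference state is left out for brevity.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum LoadError {
    ResourceNotAvailable { needed: u64, budget: u64 },
}

struct ModelBudget {
    budget_bytes: u64,
    /// Loaded models as (id, size); least recently used at the front.
    loaded: VecDeque<(String, u64)>,
}

impl ModelBudget {
    fn new(budget_bytes: u64) -> Self {
        ModelBudget { budget_bytes, loaded: VecDeque::new() }
    }

    fn used(&self) -> u64 {
        self.loaded.iter().map(|(_, size)| *size).sum()
    }

    /// Mark a model as just used. Returns false if it is not loaded.
    fn touch(&mut self, id: &str) -> bool {
        match self.loaded.iter().position(|(m, _)| m == id) {
            Some(pos) => {
                let entry = self.loaded.remove(pos).expect("position is in range");
                self.loaded.push_back(entry);
                true
            }
            None => false,
        }
    }

    /// Load a model, unloading least-recently-used models until it fits.
    /// Returns the ids that were unloaded, or an error if it can never fit.
    fn load(&mut self, id: &str, size: u64) -> Result<Vec<String>, LoadError> {
        if self.touch(id) {
            return Ok(Vec::new()); // already warm
        }
        if size > self.budget_bytes {
            return Err(LoadError::ResourceNotAvailable { needed: size, budget: self.budget_bytes });
        }
        let mut evicted = Vec::new();
        while self.used() + size > self.budget_bytes {
            match self.loaded.pop_front() {
                Some((old, _)) => evicted.push(old),
                None => break,
            }
        }
        self.loaded.push_back((id.to_string(), size));
        Ok(evicted)
    }
}
```

The sketch fails a load only when the model alone exceeds the budget; whether the real runtime refuses earlier, for example to protect a model that is mid-inference, is a policy question for 32-resource-policy.md.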

Inference backends

Per ADR 0002, v1 uses llama.cpp via a Rust binding as the only inference backend. MLX joins in v2 as an Apple-Silicon acceleration option for models with MLX variants.

Routing logic (in locara-core):

v1: llama.cpp for everything.
v2+: if cpu_type == "Apple Silicon" && model has MLX variant → use MLX, else llama.cpp.
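The routing rule is small enough to write as a single match. A sketch, assuming the caller already knows the runtime generation and whether the model ships an MLX variant; all type and function names here are illustrative rather than taken from locara-core:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    LlamaCpp,
    Mlx,
}

#[derive(Clone, Copy)]
enum RuntimeGeneration {
    V1,
    V2Plus,
}

/// Hypothetical model metadata.
struct ModelInfo {
    has_mlx_variant: bool,
}

/// v1 always routes to llama.cpp; v2+ prefers MLX when both conditions hold.
fn select_backend(generation: RuntimeGeneration, apple_silicon: bool, model: &ModelInfo) -> Backend {
    match generation {
        RuntimeGeneration::V1 => Backend::LlamaCpp,
        RuntimeGeneration::V2Plus if apple_silicon && model.has_mlx_variant => Backend::Mlx,
        RuntimeGeneration::V2Plus => Backend::LlamaCpp,
    }
}
```

One way to derive `apple_silicon` at compile time is `cfg!(all(target_os = "macos", target_arch = "aarch64"))`, though a runtime probe would work equally well.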

Shared model daemon (locara-modeld)

Per ADR 0015, apps may optionally route inference through a per-user background daemon (locara-modeld) instead of loading models into their own address space. The daemon’s job is to make “many Locara apps on one Mac” physically possible by sharing the cost of a loaded model across every app that wants it.

Shape:

One process per user, managed by macOS launchd as a LaunchAgent.
One socket per user at $XDG_RUNTIME_DIR/locara-modeld.sock when XDG_RUNTIME_DIR is set (a Linux convention that the daemon also honors on macOS), otherwise at ~/Library/Caches/Locara/sockets/modeld.sock.
One process address space holds every loaded model. Sessions are reference-counted; a model is evicted only when its refcount reaches 0 AND an idle timer (default 300 s) elapses.
Capability check at the socket boundary. Each client handshake pins the caller’s signed manifest hash; the daemon refuses to load a model the manifest doesn’t declare.
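The eviction rule (refcount at zero and the idle timer elapsed) can be sketched as state kept per loaded model. The struct and method names are hypothetical; only the 300 s default comes from the text above.

```rust
use std::time::{Duration, Instant};

/// Default idle timeout from the daemon description (300 s).
const IDLE_TIMEOUT: Duration = Duration::from_secs(300);

/// Hypothetical bookkeeping for one model held by the daemon.
#[derive(Default)]
struct LoadedModel {
    refcount: usize,
    /// Set when the last session releases; cleared on the next acquire.
    idle_since: Option<Instant>,
}

impl LoadedModel {
    fn acquire(&mut self) {
        self.refcount += 1;
        self.idle_since = None;
    }

    fn release(&mut self, now: Instant) {
        self.refcount = self.refcount.saturating_sub(1);
        if self.refcount == 0 {
            self.idle_since = Some(now);
        }
    }

    /// Both conditions must hold: no sessions, and idle for the full timeout.
    fn should_evict(&self, now: Instant, idle_timeout: Duration) -> bool {
        match self.idle_since {
            Some(since) if self.refcount == 0 => {
                now.saturating_duration_since(since) >= idle_timeout
            }
            _ => false,
        }
    }
}
```

Restarting the timer on every release means an app that reconnects within the timeout finds its model still warm.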

Crates:

locara-modeld-protocol — wire types (msgpack-over-Unix-socket framing). Bumping the protocol version requires daemon restart; clients refuse a mismatch on connect.
locara-modeld — the daemon binary + library. serve(Config) -> (ShutdownHandle, JoinHandle) for in-process tests.
locara-modeld-client — connect, load_model, Session that releases on Drop. Surface mirrors InferenceBackend so apps can opt in via a single import swap.
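The "Session that releases on Drop" pattern is ordinary Rust RAII. The sketch below stands in a shared atomic counter for the daemon’s refcount; in the real client the drop would presumably send a release message over the socket, and none of these names are taken from locara-modeld-client.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical session handle; holding it keeps the model referenced.
struct Session {
    refs: Arc<AtomicUsize>,
}

impl Session {
    /// Opening a session increments the shared reference count.
    fn open(refs: &Arc<AtomicUsize>) -> Self {
        refs.fetch_add(1, Ordering::SeqCst);
        Session { refs: Arc::clone(refs) }
    }
}

impl Drop for Session {
    /// Release runs automatically on scope exit, early return, or panic unwind.
    fn drop(&mut self) {
        self.refs.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Tying release to `Drop` means an app cannot forget to release a model, which matters because a leaked reference would keep the model resident indefinitely.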

Opt-in (v0): the daemon ships disabled by default. Apps opt in with LOCARA_USE_MODELD=1 or a runtime.backend: "daemon" field in locara.json. The runtime falls back to in-process loading when the daemon is unavailable.
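The opt-in and fallback behavior can be summarized as a pure function. This is a sketch under one assumption not stated above: either opt-in signal alone is sufficient, with no precedence between them. `InferencePath` and `resolve_inference_path` are illustrative names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum InferencePath {
    InProcess,
    Daemon,
}

/// Decide where inference runs.
/// `env_flag` is the value of LOCARA_USE_MODELD, if set.
/// `manifest_backend` is the runtime.backend field from locara.json, if present.
fn resolve_inference_path(
    env_flag: Option<&str>,
    manifest_backend: Option<&str>,
    daemon_reachable: bool,
) -> InferencePath {
    let opted_in = env_flag == Some("1") || manifest_backend == Some("daemon");
    if opted_in && daemon_reachable {
        InferencePath::Daemon
    } else {
        // Not opted in, or the daemon is unavailable: load in-process.
        InferencePath::InProcess
    }
}
```

A caller could obtain the first argument with `std::env::var("LOCARA_USE_MODELD").ok()` and pass it as `.as_deref()`.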

Phase status (as of v0.0.4):

v0 — protocol + daemon shell + client + refcount/eviction semantics. Shipped. LoadModel validates the manifest hash and hands out a session token.
v1.0 — backend trait + StubBackend + Chat IPC + token streaming. Shipped. Daemon now serves real Chat requests through a pluggable Backend trait; the default stub backend echoes the user’s last message word-by-word so the full streaming round-trip is exercised in CI without depending on multi-GB model weights. Hosts inject LlamaBackend via the Config::backend_factory field (stub_backend_factory() is the default).
v1.1 — LlamaBackend adapter for real inference via locara-llama. Shipped. Gated behind --features llama so the default daemon build stays light. llama_backend_factory(cache_root) returns a factory hosts plug into Config::backend_factory; resolves model files from both content-addressed (blobs/<prefix>/<sha256>) and flat-cache (<sha256>.gguf, <model_id>.gguf) layouts. 8 unit tests for the adapter; 1 gated integration test (LOCARA_TEST_MODEL_PATH=/path/to/m.gguf) exercises a real chat round-trip when a local model is available.
v1.2 — wire transcribe’s first-run flow + runtime.backend: "daemon" opt-in. Adds runtime IPC commands models_resolve_or_request + model_download (foundation primitives locara-models::fetch_to_cache + ModelDownloader component both already exist). Needs real-Mac end-to-end validation.
v2 — tray UI: warm models, RAM consumed, current inferring app, capability log spanning all Locara apps.
v3 — launchd plist install + auto-start via locara modeld install.
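The stub backend’s behavior in v1.0, echoing the user’s last message word by word, is easy to picture with a sketch. The `Message` shape and the "user" role string are assumptions for illustration; the actual StubBackend and its wire types may differ.

```rust
/// Hypothetical chat message shape.
struct Message {
    role: String,
    content: String,
}

/// Produce the token stream a stub might emit: the words of the most
/// recent user message, in order. Returns nothing if no user message exists.
fn stub_tokens(messages: &[Message]) -> Vec<String> {
    messages
        .iter()
        .rev()
        .find(|m| m.role == "user")
        .map(|m| m.content.split_whitespace().map(String::from).collect())
        .unwrap_or_default()
}
```

Because the output is deterministic, a CI test can assert on the exact sequence of streamed tokens without any model weights on disk.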

ADR 0015 explains the trust model (the daemon is a Locara-signed system service, not a Locara app, and is therefore distinct from the app-to-app communication banned by ADR 0006), the RAM math that motivates the work, and the alternatives considered.

IPC protocol (frontend ↔ backend, within one app)

Tauri’s standard invoke mechanism. The app’s frontend calls:

import { invoke } from '@tauri-apps/api/core'

const result = await invoke('locara_llm_chat', { ... })

@locara/sdk wraps this; apps don’t see Tauri-specific APIs.

The Rust side (linked into the app) registers commands:

use tauri::State;

#[tauri::command]
async fn locara_llm_chat(state: State<'_, LocaraRuntime>, args: ChatArgs) -> Result<ChatResponse> {
    // Re-check the capability on the Rust side before touching the model.
    state.check_capability(Capability::LlmChat { model: args.model.clone() })?;
    state.llm.chat(args).await
}

Streaming uses Tauri’s event channel (see 33-streaming.md).

Sandboxing layers

┌────────────────────────────────────────────────┐
│  Frontend JS (untrusted)                       │
│  ↓ Tauri IPC (capability-gated by Locara)      │
├────────────────────────────────────────────────┤
│  Locara plugin commands (Rust, linked into app)│
│  ↓ Capability check + scope check              │
├────────────────────────────────────────────────┤
│  Inference / FS / Network primitives           │
│  ↓ macOS App Sandbox (kernel)                  │
├────────────────────────────────────────────────┤
│  System call                                   │
└────────────────────────────────────────────────┘

If Tauri IPC misses a check, the Locara plugin re-checks. If Locara misses, macOS denies at the kernel. Defense in depth, all within one app process.

For tool execution, an additional layer:

Locara tool invoke
  ↓
Wasmtime sandbox (no syscalls except declared WASI imports)
  ↓
Locara plugin returns result to frontend

See 10-tools.md.

Error handling

Errors that bubble to the frontend:

CapabilityDeniedError — manifest doesn’t declare it.
ResourceNotAvailableError — model can’t load (RAM, disk, hash mismatch).
ToolError — sandbox violation, timeout, trap.
StorageError — sqlite issue.
ModelLoadError — corrupted weights, unsupported format.

Crashes within an app are isolated to that app’s process; one app crashing doesn’t take down others.
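On the Rust side, the error list above maps naturally onto a single enum. A sketch, assuming each variant carries a human-readable detail string; the enum name and the `frontend_name` helper are hypothetical, and only the five error names come from this document.

```rust
use std::fmt;

#[derive(Debug)]
enum RuntimeError {
    CapabilityDenied(String),
    ResourceNotAvailable(String),
    Tool(String),
    Storage(String),
    ModelLoad(String),
}

impl RuntimeError {
    /// The error name the frontend sees.
    fn frontend_name(&self) -> &'static str {
        match self {
            RuntimeError::CapabilityDenied(_) => "CapabilityDeniedError",
            RuntimeError::ResourceNotAvailable(_) => "ResourceNotAvailableError",
            RuntimeError::Tool(_) => "ToolError",
            RuntimeError::Storage(_) => "StorageError",
            RuntimeError::ModelLoad(_) => "ModelLoadError",
        }
    }

    fn detail(&self) -> &str {
        match self {
            RuntimeError::CapabilityDenied(d)
            | RuntimeError::ResourceNotAvailable(d)
            | RuntimeError::Tool(d)
            | RuntimeError::Storage(d)
            | RuntimeError::ModelLoad(d) => d,
        }
    }
}

impl fmt::Display for RuntimeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.frontend_name(), self.detail())
    }
}

impl std::error::Error for RuntimeError {}
```

Keeping the frontend name separate from the detail lets the SDK branch on a stable identifier while the message stays free-form.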

Logging

Each app logs to its own location:

~/Library/Logs/Locara/<bundle-id>.log — per-app log.
Or the app’s own preferred location if it customizes.

Logs include: capability denials, model loads, errors. Rotated, capped at ~10 MB per app.

No telemetry is sent over the network; logs stay on the device. The user can open them in Console.app to inspect them, or share them with the developer if requested.

Background processing

Apps can declare background.mode in their manifest (see 36-notifications-background.md). This works at the OS level:

Apps that opt into background.mode: "active" keep their process alive when their window closes and show a status-bar item.
macOS handles process scheduling, throttling under low power, etc.

No Locara-level background coordinator needed. Each app is an independent process; macOS arbitrates.

Multi-app coordination

In v1, there’s intentionally minimal coordination between Locara apps:

Shared model cache (a directory on disk) lets apps reuse downloaded models.
Shared-folder IPC (declared in manifests) lets pairs of apps cooperate via the filesystem.
macOS handles everything else — process scheduling, memory pressure, focus, lifecycle.

In v2+, the optional shared model daemon (locara-modeld, described above) could additionally:

Hot-swap loaded models across apps.
Arbitrate RAM budgets globally.
Speed up cold-start when an app’s model is already loaded by another.

This is a transparent optimization, not a requirement. Apps would behave identically; the daemon would just make heavy multi-app usage more efficient.

Locara CLI for developers (NOT shipped to users)

The locara CLI is a developer tool:

locara init, locara dev, locara test, locara verify, locara build, locara publish.
Does not ship to end users.
End users only ever see the apps, not Locara framework infrastructure.

See 06-cli.md.

Why this architecture works

Apps are first-class Mac apps. They appear in /Applications, Spotlight, Dock. Familiar UX.
No coordinator process to install, manage, or update. Each app is independent.
Trust mechanics are baked in. The capability enforcer is inside the app; macOS sandbox is around the app. Both layers run regardless of any external state.
Apps stand alone. If the Locara project disappears tomorrow, your .app bundles still work. No “the runtime is gone, now nothing runs” dependency.
Distribution is conventional. Users download a .dmg from a website. They’ve done this thousands of times.
Single Apple Developer ID for all apps (Locara’s). Publishers don’t need their own Apple Developer Program membership.

Cross-references

Architecture overview: 01-architecture.md
Capability model: 03-capabilities.md
Modalities + tooling: 04-modalities.md
SDK calls the runtime exposes: 05-sdk.md
Storage: 08-storage.md
Models: 09-models.md
Tools: 10-tools.md
Trust mechanics: 13-security-privacy.md, 14-trust-safety.md
Distribution: 15-distribution.md
Build pipeline: 16-build.md
Resource policy: 32-resource-policy.md
Streaming: 33-streaming.md
Tauri architecture notes: ../notes/tauri.md
ADR 0002 (llama.cpp v1): ../docs/adr/0002-llamacpp-v1-mlx-v2.md