Build a Locara app.
Read the research. Browse the catalogue.
The specification is the canonical, normative source of truth. ADRs capture architecture decisions. Everything else — guides, research notes, the modality catalogue — lives here.
Recommended path for new developers.
Read in order. Each guide unblocks the next. By the end you'll have a manifest, a wired-up SDK, and a signed .locapp sitting on disk.
- 1
- 2 Locara for AI coding agents How to be effective when generating Locara apps as an LLM coding assistant.
- 3
Every input/output a foundation model can do.
One file per transformation. Each lists representative open-weight models, the infrastructure Locara needs around them (inference / input / output / storage / interaction / capabilities), and what's shipped vs. missing.
- Classical NLP tasks (covered by chat LLM) The pre-LLM era of NLP: BERT-fine-tuned classifiers, NER taggers, extractive QA, NLI-based zero-shot, table-grounded QA, masked LM. Each … Blocked
- depth-estimation Image → per-pixel depth map. Useful for AR effects, photo relighting, 3D reconstruction stage, focus / blur effects. Removed
- document-question-answering PDF / scanned page + question → answer. Distinct from generic image-text-to-text because layout, tables, and stamps matter. Highly releva… Removed
- feature-extraction HuggingFace's umbrella name for "encoder-only model produces a vector". For text it's text-to-embedding; for images it's image-feature-ex…
- image-classification Image → label (or label distribution) from a fixed set. Distinct from zero-shot-image-classification which takes labels at runtime. Removed
- image-feature-extraction Image → fixed-size float vector. Used for visual search, deduplication, classification head training, downstream tasks. Removed
- image-segmentation Covers: HF's image-segmentation AND mask-generation. SAM 2 does both in a single model. Removed
- image-text-to-image Image + text instruction → edited image. The "make this person smile" / "remove the watermark" / "add a hat" task. Distinct from image-to… Removed
- image-text-to-text HF aliases: image-to-text, visual-question-answering. HuggingFace lists these separately because of historical model specialization (OCR-… Partial
- image-text-to-video Image + text prompt → video. Animate a still with an instruction (e.g. "make this person wave"). Distinct from image-to-video (no text in… Removed
- image-to-3d Single image (+ optional text) → 3D mesh. Most "text-to-3D" pipelines today actually go text → image → 3D, so this is the heavy-lift stage. Removed
- image-to-image (no text instruction) Image → image without a text prompt. Super-resolution, restoration, denoising, style transfer based on a reference image. Distinct from i… Removed
- image-to-text HuggingFace lists image-to-text separately because of historical caption-only models (BLIP, GIT). In modern practice the same VLMs handle…
- image-to-video Still image → animated video, no text instruction. Useful for animating photos / artwork. Distinct from image-text-to-video which takes a… Removed
- keypoint-detection Image / video → joint locations (pose). Hands, body, face landmarks. Useful for AR, fitness apps, sign-language recognition. Removed
- mask-generation HuggingFace lists mask-generation and image-segmentation as separate tasks. They merged in 2024 when SAM 2 unified them in one model: SAM…
- object-detection Covers: HF's object-detection AND zero-shot-object-detection. The two share infrastructure; the difference is whether labels come from a … Removed
- Out-of-scope modalities Recorded for completeness against the HuggingFace task taxonomy. These tasks either don't fit the "foundation model as runtime modality" … Blocked
- sentence-similarity The HF "sentence similarity" task is just "embed two sentences and compute cosine distance" — pure usage of text-to-embedding. No separat…
- speech-to-text (ASR) HF aliases: automatic-speech-recognition. Shipping
- summarization Long text → short text. Abstractive summarization. Partial
- text-generation HuggingFace's umbrella name for autoregressive language modeling. In Locara's framing this is just text-to-text.
- text-ranking (cross-encoder reranker) Take a query + a list of candidate documents, re-score each pair jointly. Cross-encoders beat bi-encoders on top-K re-ranking quality and… Removed
- text-to-3d Text → 3D mesh / Gaussian splat. Note that most "text-to-3D" pipelines today actually go text → image → 3D, so the heavy lift is in image… Removed
- text-to-audio (sound effects) Text → ambient sound, sound effects, foley. Distinct from text-to-music (rhythmic) and text-to-speech (linguistic). Removed
- text-to-code LLM specialized on code — completion, refactor, explain. Architecturally identical to text-to-text but with code-specific fine-tuning tha… Partial
- text-to-embedding HF aliases: feature-extraction, sentence-similarity. Shipping
- text-to-image Text → image. The classic diffusion task. Removed
- text-to-music Text → music. Adjacent to text-to-audio but with a different model class. Removed
- text-to-speech Text → speech audio. Partial
- text-to-text HF aliases: text-generation. Shipping
- text-to-text-thinking Same shape as text-to-text but the model emits explicit reasoning tokens before the final answer. Apps can display the reasoning trace se… Partial
- text-to-video Text → video clip. Computationally heavy. Removed
- time-series-forecasting Numeric series → future values. Foundation models for time series have matured fast (2024-26) and are increasingly LLM-shaped: tokenize v… Removed
- translation Text in language A → text in language B. Specialized translation models meaningfully outperform chat LLMs on low-resource languages. Partial
- video-classification Video → action / activity label. Removed
- video-text-to-text (video Q&A / description) Video (+ optional text question) → text. "What happens in this clip?", "When does the person sit down?". Removed
- video-to-video Video → transformed video (style transfer, frame interpolation, super-resolution, slow-mo). Removed
- visual-document-retrieval Search a corpus of documents by screenshot similarity, not OCR'd text. The query (text or image) and each page (as a rendered image) get … Removed
- visual-question-answering VQA is a sub-task of image-text-to-text — same VLM models, same infrastructure. HuggingFace lists it separately for historical reasons; f…
- voice-activity-detection Audio → time spans where speech occurs. The first stage of any robust speech pipeline — segments mic input into speech vs silence so down… Removed
- voice-to-voice (full duplex) Live mic in, audio + text out, full-duplex, sub-second latency. Either as a true end-to-end audio language model (Moshi-class) or as a ST… Partial
- zero-shot-image-classification Image → label from a runtime-supplied list (no fine-tuning). CLIP-style: encode image and labels, pick the closest. Critical for app auth… Removed
- zero-shot-object-detection HuggingFace lists this separately, but the open-vocabulary detectors (Grounding DINO, OWLv2) are listed in the object-detection model tab…
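Two of the entries above describe the same small computation: sentence-similarity is "embed two sentences and compute cosine distance", and zero-shot-image-classification is "encode image and labels, pick the closest". A minimal sketch of that step follows. It assumes the embeddings already exist as plain lists of floats; the toy vectors and the helper names `cosine_similarity` and `closest_label` are made up for illustration and are not part of any Locara API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def closest_label(query, label_vectors):
    """Return the label whose vector is most similar to the query vector."""
    return max(
        label_vectors,
        key=lambda label: cosine_similarity(query, label_vectors[label]),
    )


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
image_vector = [0.9, 0.1, 0.0]
label_vectors = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
}

best = closest_label(image_vector, label_vectors)
print(best)
```

In a real app the vectors would come from an embedding model, and cosine distance is simply one minus the similarity computed here.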
Working knowledge base.
Deep dives on companies, projects, patterns, and trade-offs that informed each design decision. Mirrored from /notes in the repository.
- A Brief History of Chips and Computers — From Bell Labs to LLMs What this is: The arc from the first transistor (1947) to today's local-AI hardware (2026), focused on the inflection points and the peop…
- AI Agent Marketplaces — 2026 Survey What this is: A comparative read of the curated marketplaces for AI agents, skills, and tools as of mid-2026. Eight competing platforms, …
- Apple Acceleration Frameworks — Accelerate, Metal, MPS, ANE, AMX What this is: A catalog-with-judgment for Apple's hardware-accelerated frameworks (Accelerate / vDSP / BLAS / BNNS / AMX, Metal compute, …
- Apple App Store (iOS) — Review Process What it is: The reference example of a curated, gatekept app marketplace. Hybrid automated + human review, enforced sandbox via entitleme…
- Apple Foundation Models Framework What it is: The Swift/Python developer SDK for Apple's on-device language model — the ~3B-parameter foundation model that powers Apple In…
- Auditing a Tauri Plugin for Security What it is: A repeatable methodology for deciding whether a third-party Tauri plugin is safe to depend on, vendor, fork, or replace with …
- Chip Fundamentals — How Silicon Computes (and Why LLMs Run the Way They Do) What this is: A primer on integrated circuits — from transistors and CMOS through architecture, memory hierarchy, and the LLM-specific bo…
- Chrome Web Store What it is: Marketplace for browser extensions, themes, and Chrome apps. Manifest-based capability declarations, automated + human review…
- Claude Agent SDK + Agent Skills What it is: Anthropic's agent-building stack. The Claude Agent SDK wraps Claude with built-in tool execution, file/shell access, and an a…
- Deno's Permission Model What it is: Deno's "secure by default" permission system. Code has zero capabilities until granted via CLI flags or runtime prompts. Gran…
- Design Tokens (Style Dictionary, DTCG, Tokens Studio) What it is: A formalization of design decisions as named, structured values — colors, spacing, typography, radii, shadows, motion — distr…
- Google Play What it is: Android's primary app marketplace and the most-permissive of the major mobile stores. Hybrid automated + human review, peer-g…
- gpt-engineer What it is: "Specify what you want to build, the AI asks for clarification, and then builds it." A CLI tool that takes a natural-language…
- Homebrew What it is: macOS / Linux package manager. Community-curated, Git-PR-driven. Formulae are Ruby files in a Git repo; new packages are acce…
- Hugging Face Hub What it is: The de facto registry for open-source ML — 2M+ models, 500k+ datasets, 1M+ Spaces (demo apps). Git-LFS-style storage backed b…
- Itch.io What it is: Indie-first marketplace for games and creative software. Pay-what-you-want, creator-set revenue split (default 90/10 in creat…
- Jan.ai What it is: Open-source desktop AI assistant — "a ChatGPT alternative that runs on your computer." Apache 2.0, built by janhq (Vietnam-ba…
- LangChain What it is: Python (and TypeScript) framework for building LLM-powered applications. Started 2022 as an opinionated abstraction over LLM …
- llama.cpp What it is: A C/C++ inference engine for transformer LLMs, originally a port of Llama by Georgi Gerganov ("ggerganov"). Now the dominant …
- llamafile What it is: A single executable file that bundles a model + llama.cpp + a tiny web UI, and runs on macOS / Linux / Windows / FreeBSD with…
- LLM Inference Frameworks — Survey for Mac Local-AI Apps What this is: A survey of the credible LLM inference engines and optimization packages in the universe, with a Mac-first lens. Per-engine…
- LLM Memory Math — Parameters, KV Cache, Bandwidth, and What Actually Fits What this is: A first-principles reference for translating "this model has X billion parameters at context length C" into hard numbers: G…
- LM Studio What it is: A polished proprietary desktop app for browsing, downloading, and running open LLMs locally, with an OpenAI-compatible local …
- Local voice-to-voice and unified-modality SLMs (research note, 2026-Q1) What this is: State of the art for on-device voice agents, focused on what Locara should add next after its existing text-LLM + embedding…
- Locara Components Library — Design Decisions and Provenance What this is: The design log for @locara/components — what we built, where each pattern came from, and why we made the architectural choi…
- Mac Hardware Lineup — Every Variant, RAM, Bandwidth, Model-Size Fit What this is: A reference inventory of every Mac model variant from the late-Intel era (2015) through M5 (early 2026), keyed on the numbe…
- Mac LLM Optimization — The Practical Playbook What this is: The hands-on optimization guide for running LLMs locally on Apple Silicon Macs. Specific flags, specific numbers, specific …
- macOS App Sandbox (and Mac App Store) What it is: Apple's kernel-enforced sandboxing system for macOS apps. Required for Mac App Store distribution; optional but recommended f…
- macOS Memory Management — Architecture and Optimization for Native Apps What this is: A reference for how macOS actually manages memory (VM subsystem, unified memory, pressure, compressor, swap, jetsam, wired …
- macOS Notarization + Sparkle What it is: The two-part operational reality of shipping a Mac app outside the App Store. Notarization is Apple's mandatory automated mal…
- macOS Performance Profiling — Tools, Methodology, Recipes What this is: A reference for the engineer who has just been handed a complaint of the form "your app makes my Mac slow." Maps symptoms →…
- macOS Power, Energy, and Thermal Optimization What this is: A reference on how power, energy, and thermal behavior work on macOS — from CMOS physics through DVFS, the P-core/E-core sp…
- MLX What it is: Apple's open-source array framework for machine learning on Apple Silicon. NumPy-like API in Python, Swift, and C++; lazy eva…
- Modalities & Models — Locara survey A research note for designing Locara's first-class modality catalogue. The canonical (terse, normative) list lives in spec/04-modalities.…
- Model Context Protocol (MCP) What it is: An open protocol introduced by Anthropic in November 2024 for connecting AI applications to data sources and tools through a …
- Modern Chip Landscape (early 2026) What this is: A snapshot of the chips currently shipping in mobile, laptop, desktop, and data-center systems as of early 2026. Not exhaus…
- MongoDB What it is: Document-oriented NoSQL database. Started 2009 (10gen, later renamed MongoDB Inc.). The canonical example of "developer-led, …
- npm (Node Package Manager Registry) What it is: The world's largest software registry by volume. Public, free to publish, ~3M packages. Owned by GitHub (Microsoft). Status: …
- Obsidian What it is: A note-taking and personal knowledge base app built around plain markdown files in a local "vault." Founded 2020. Privately h…
- Ollama What it is: A local LLM runtime + model registry. Daemon that exposes a REST API for running open models, with a Modelfile packaging form…
- Open WebUI What it is: Self-hosted ChatGPT-style web frontend for local LLMs. Originally "Ollama WebUI," renamed Open WebUI as it added support for …
- Open-Source vs Closed LLMs — Trends, Capability, and the Size-to-Quality Curve What this is: The state of open-weights LLMs vs. closed/API-only LLMs as of early 2026 — capability gaps, training-data and licensing dif…
- Packaging scope — what exists, what's missing, what to do next We've shipped a lot of infrastructure that's invisible to end users:
- Pieter Levels (@levelsio) Who: Solo Dutch indie maker, founder of Nomad List, Remote OK, Photo AI, Interior AI, and ~10+ other products. Probably the best-document…
- Pinokio What it is: A "localhost platform" / browser-style app for installing and running open-source AI applications locally with one click. Sta…
- Radix UI + Headless Component Primitives What it is: A class of component libraries that ship behavior without styling — accessibility, keyboard handling, ARIA, focus management,…
- Raycast What it is: A Mac-first command bar / app launcher, originally a Spotlight replacement, now a productivity platform with a curated extens…
- Sandstorm.io What it is: A self-hosted personal-server platform from 2014–2017, launched by Kenton Varda (formerly Google, designer of Protocol Buffer…
- Security Legends: Engineering Practices for Locara What this is: A research note distilling principles from systems with legendary security reputations, with concrete application to Locara…
- shadcn/ui What it is: A radically different distribution model for UI components. Not a package — a CLI that copies component source code into your…
- Shopify App Store What it is: Vertical SaaS marketplace for apps that integrate with Shopify merchants. Curated, reviewed, with a built-in Billing API for …
- SQLite What it is: Embedded SQL database engine. Single-file storage, public domain, written in C, exhaustively tested. Most-deployed database i…
- Steam What it is: Valve's PC game distribution platform. Two-sided marketplace with developer-set prices, lightweight review (Steam Direct), st…
- Tailscale What it is: Mesh VPN-as-a-service built on WireGuard, plus a control plane for identity/auth/policies. Founded 2019 by ex-Google networki…
- Tailwind CSS What it is: Utility-first CSS framework. Adam Wathan, 2017. Replaced the "component classes" era (Bootstrap, Foundation) with class-per-s…
- Tauri What it is: Rust-based framework for building cross-platform desktop apps (and now mobile) with web frontends. Successor mindset to Elect…
- tauri-webdriver-automation (danielraffel) — Security Audit What it is: Community Tauri 2 plugin that exposes a WebDriver-shaped HTTP server inside a debug-built Tauri app on macOS, plus a CLI (tau…
- v0 + Community Design Systems (Vercel pattern) What it is: The pattern Vercel pioneered with shadcn/ui + v0 + community registries — open-source UI components distributed not as instal…
- Vercel + Next.js What it is: A framework (Next.js — React-based fullstack) tightly coupled with a deployment platform (Vercel). Together they define moder…
- VS Code Marketplace What it is: Microsoft-run marketplace for VS Code extensions. ~50k+ extensions, dominant editor in software dev, marketplace is the close…
- Wasmtime + WASI What it is: Wasmtime is a fast, secure, standards-compliant WebAssembly runtime by the Bytecode Alliance. WASI (WebAssembly System Interf…
- Whisper and the Local STT Landscape What it is: A survey of speech-to-text (STT) options for local execution on Apple Silicon, anchored on OpenAI Whisper (the open-weights m…
- Xcode + Swift Toolchain What it is: Apple's IDE and build system for iOS, macOS, watchOS, tvOS, visionOS apps. Combines code editor, build system (xcodebuild), s…