08 — Storage

Overview

Each Locara app has a per-app SQLite database for relational data and a per-app vector store (sqlite-vec or zvec) for embeddings. Storage is isolated by container — apps cannot read each other’s data.

Storage hierarchy

~/Library/Containers/<app-id>/Data/Documents/
├── app.sqlite                  # Main app database
├── app.sqlite-wal             # Write-ahead log
├── app.sqlite-shm             # Shared memory file
└── (additional DBs if app declares them)

The app sees this as $APP_DATA via the SDK. Apps don’t manage paths directly; they use db.run, db.all, etc.

SQLite as default (committed)

Locara uses SQLite for all relational storage:

Single file per app — trivial backups, exports, deletes.
Zero-ops — no daemon, no setup.
WAL mode by default — concurrent reads + single writer; good throughput.
FTS5 enabled — full-text search built in.
JSON1 enabled — structured data without separate JSON DB.
Loadable extensions enabled (for sqlite-vec, zvec adapter).

Why not Postgres / DuckDB / etc.: see ../notes/sqlite.md. Short answer: long-term stability + zero-ops + single-file > additional power for v1’s app shapes.

Vector storage

Two backends:

`sqlite-vec` (default)

Lives inside the same SQLite file as relational data.
Single source of truth, single backup, single export.
Capable up to ~1M vectors, hundreds of QPS.
Apache 2.0.
Good enough for ~95% of local-app use cases.

`zvec` (opt-in)

Separate in-process library; embedded but not inside SQLite.
Sparse + dense + hybrid search.
Better for >1M vectors or sophisticated hybrid search.
Apache 2.0.

Apps declare which engine they want:

"storage": {
  "vector": { "engine": "sqlite-vec" }
}

The SDK abstracts both behind db.vec.* and db.search.hybrid. Apps that switch engines change one config line; SDK calls don’t change. (Some advanced features may be engine-specific; surfaced via runtime warnings.)

Schema declaration

Each app’s schema lives in db/schema.sql:

-- db/schema.sql

CREATE TABLE transcripts (
  id          TEXT PRIMARY KEY,
  title       TEXT NOT NULL,
  text        TEXT NOT NULL,
  duration    INTEGER NOT NULL,
  created_at  INTEGER NOT NULL DEFAULT (unixepoch())
);

CREATE INDEX idx_transcripts_created ON transcripts (created_at DESC);

-- FTS5 virtual table (auto-synced via triggers)
CREATE VIRTUAL TABLE transcripts_fts USING fts5(
  title, text,
  content='transcripts',
  content_rowid='rowid'
);

-- Vector table (sqlite-vec)
CREATE VIRTUAL TABLE transcripts_vec USING vec0(
  embedding FLOAT[768]
);

-- Triggers to keep fts/vec in sync
CREATE TRIGGER transcripts_ai AFTER INSERT ON transcripts BEGIN
  INSERT INTO transcripts_fts(rowid, title, text) VALUES (new.rowid, new.title, new.text);
END;
-- (similar AU, AD triggers)

The CLI parses this schema and:

Generates TS types in node_modules/@locara/sdk/db.d.ts.
Computes diffs between published versions for migrations.

Migrations

Each app version’s schema is the source of truth. Migrations between versions are derived.

db/
├── schema.sql                     # Current schema (the truth)
└── migrations/
    ├── 0001_initial.sql           # Auto-generated by `locara db migrate`
    ├── 0002_add_user_table.sql    # Generated; can be hand-edited
    └── 0003_add_embedding_idx.sql

Workflow:

# Edit schema.sql, e.g., add a new column
# Then:
locara db migrate
# CLI computes diff: schema.sql vs last applied migration
# Outputs proposed migration SQL
# Developer reviews, optionally edits, commits

For non-trivial changes (column type changes, data backfills), the developer hand-writes the migration. CLI helps with the obvious cases (add column, add table, add index).

On user app update:

1. Runtime detects new app version's schema.sql differs from current DB schema.
2. Runtime applies pending migrations in order, in a transaction.
3. If migration fails: roll back, refuse update, surface error.
4. Backups: before each migration, snapshot DB to .backup-<version>.sqlite (kept for 7 days).

Migration safety is non-negotiable. A broken migration that destroys user data is the fastest way to kill trust.

Transactions

import { db } from '@locara/sdk'

await db.transaction(async (tx) => {
  await tx.run`INSERT INTO transcripts (id, title) VALUES (${id}, ${title})`
  await tx.run`INSERT INTO transcripts_fts (rowid, title) VALUES (last_insert_rowid(), ${title})`
})

Auto-commit on resolved promise; rollback on rejection.

Concurrency

WAL mode permits one writer + many readers. The SDK serializes writes through a single mutex per database. For most apps this is fine — local single-user apps have low write contention.

Heavy-write workloads: consider partitioning into multiple databases (e.g., per-project DBs) declared in manifest.

Per-app isolation

Apps cannot access each other’s databases. Each container has its own Data/Documents/ and SQLite files. The macOS App Sandbox enforces this at the kernel level.

If two apps need to share data, they declare an ipc capability with each other; the runtime mediates. (See 03-capabilities.md.)

Backups

Apps can opt into Locara-mediated backups:

"storage": {
  "backups": {
    "enabled": true,
    "frequency": "daily",
    "retention_days": 30
  }
}

Backups go to ~/Library/Application Support/Locara/Backups/<app-id>/<timestamp>.sqlite. Apps can prompt the user to restore from a backup via SDK (db.restore.list, db.restore.apply).

Cloud backup (iCloud Drive, etc.) is not v1. Apps can’t ask for it because the privacy story complicates.

Encryption

(open) SQLite databases are unencrypted by default. macOS FileVault encrypts the disk; container is per-app-isolated. Is per-app DB encryption (SQLCipher etc.) worth the complexity?

Leaning no for v1: rely on FileVault + container isolation. Apps with extreme privacy needs can opt into SQLCipher (declare in manifest, runtime supports). Cost: SQLCipher is GPL/proprietary, complicates build.

Common patterns

RAG / hybrid search

const queryEmbedding = await embed.one({ model: 'nomic-embed-text-v1.5', text: userQuery })

const results = await db.search.hybrid('transcripts', {
  query: userQuery,
  embedding: queryEmbedding,
  weights: { keyword: 0.3, vector: 0.7 },
  k: 10,
})

The runtime translates this into a query joining transcripts_fts (FTS5) and transcripts_vec (sqlite-vec) with rank fusion.

JSON columns

CREATE TABLE settings (
  id    INTEGER PRIMARY KEY,
  data  TEXT  -- JSON
);

const settings = await db.one<{ data: string }>`SELECT data FROM settings WHERE id = 1`
const parsed = JSON.parse(settings.data)

For typed JSON, use Zod or similar in app code; SDK doesn’t try to add a JSON-typing layer.

What’s not in v1

Distributed / multi-device sync
Cloud backups
Real-time replication
Multi-tenant within a single app
DuckDB or analytical engines

These are post-v1 considerations. The privacy thesis benefits from staying single-device.

Cross-references

SDK API: 05-sdk.md
Capability model (storage capabilities): 03-capabilities.md
SQLite design philosophy: ../notes/sqlite.md
Vector backend choices: ../notes/huggingface-hub.md (mentioned)