05 — SDK (`@locara/sdk`)

The TypeScript API surface developers code against. Designed to be:

Predictable. Same shape across modalities.
Strongly typed. Schema-driven types; LLMs author against known signatures.
Streaming-native. Anything that can stream does, by default.
Capability-aware. A call that needs a capability the manifest does not declare fails with a clear error (see Errors) instead of silently doing nothing.

Modules

import { llm }        from '@locara/sdk/llm'
import { embed }      from '@locara/sdk/embed'
import { transcribe } from '@locara/sdk/transcribe'
import { vlm }        from '@locara/sdk/vlm'
import { ocr }        from '@locara/sdk/ocr'
import { db }         from '@locara/sdk/db'
import { tools }      from '@locara/sdk/tools'
import { fs }         from '@locara/sdk/fs'
import { audio }      from '@locara/sdk/audio'

The package root, @locara/sdk, re-exports every module as a named export, so import { llm, db } from '@locara/sdk' also works.

`llm` — Chat / completion

import { llm } from '@locara/sdk'

// One-shot
const response = await llm.chat({
  model: 'qwen2.5-3b-instruct-q4',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user',   content: 'What is the capital of France?' }
  ],
  temperature: 0.7,
})
console.log(response.content) // string

// Streaming
const stream = llm.chatStream({
  model: 'qwen2.5-3b-instruct-q4',
  messages: [...],
})
for await (const delta of stream) {
  process.stdout.write(delta.content ?? '') // content is optional on stream chunks
}

// With tool calling
const toolResponse = await llm.chat({
  model: 'qwen2.5-3b-instruct-q4',
  messages: [...],
  tools: ['wasm.text-utils'], // must be declared in manifest
})

Returned response is typed:

interface ChatResponse {
  content: string
  usage: { input_tokens: number; output_tokens: number; ms_total: number }
  tool_calls?: ToolCall[]
}

`embed` — Embeddings

import { embed } from '@locara/sdk'

const vectors = await embed.create({
  model: 'nomic-embed-text-v1.5',
  texts: ['first text', 'second text'],
})
// vectors: Float32Array[]

const vector = await embed.one({
  model: 'nomic-embed-text-v1.5',
  text: 'just one',
})
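
Because embeddings come back as plain Float32Array values, comparing them needs no extra library. The following is a minimal sketch of cosine similarity; cosineSimilarity is a hypothetical helper for illustration and is not part of @locara/sdk.

```typescript
// Hypothetical helper, not part of @locara/sdk: cosine similarity between
// two embedding vectors of equal length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new RangeError(`dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction; report 0 instead of NaN.
  if (normA === 0 || normB === 0) return 0
  return dot / Math.sqrt(normA * normB)
}
```

In an app, the two arguments would typically be entries of the array returned by embed.create.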

`transcribe` — Speech-to-text

import { transcribe } from '@locara/sdk'

// From file (requires fs.user-selected)
const result = await transcribe.fromFile(filePath, {
  model: 'whisper-large-v3-q4',
  language: 'en', // or 'auto'
})
// result: { text, segments, language, duration_seconds }

// From mic stream (requires device.microphone)
const stream = transcribe.live({
  model: 'whisper-large-v3-q4',
})
for await (const segment of stream) {
  console.log(segment.text)
}

`vlm` — Vision-language models

import { vlm } from '@locara/sdk'

const description = await vlm.describe({
  model: 'qwen2-vl-2b',
  image: imageBlob,
  prompt: 'What is in this image?',
})

`ocr` — Document text extraction

import { ocr } from '@locara/sdk'

const result = await ocr.extract({
  model: 'glm-ocr-1.5',
  source: pdfBlob, // or image
})
// result: { text, blocks: [{ bbox, text, confidence }], pages }

`db` — Storage

The db module is typed by your schema.sql (see 08-storage.md). Run locara db generate-types to produce TS types from the schema; locara dev runs it automatically.

import { db } from '@locara/sdk'

// Tagged-template SQL (sqlite-style placeholders, but as TS template literals)
await db.run`INSERT INTO transcripts (id, title, text) VALUES (${id}, ${title}, ${text})`

const rows = await db.all<Transcript>`SELECT * FROM transcripts WHERE id = ${id}`

const row = await db.one<Transcript>`SELECT * FROM transcripts WHERE id = ${id}`

// Vector search
const results = await db.vec.search('transcripts_vec', queryEmbedding, {
  k: 5,
  filter: { project_id: 'abc' }, // optional metadata filter
})
// results: Array<{ id, score, ...metadata }>

// Hybrid search (FTS5 + vec)
const hybrid = await db.search.hybrid('transcripts', {
  query: 'meeting notes',
  embedding: queryEmbedding,
  weights: { keyword: 0.3, vector: 0.7 },
  k: 10,
})

// Transactions
await db.transaction(async (tx) => {
  await tx.run`INSERT ...`
  await tx.run`UPDATE ...`
})
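
The tagged-template form matters because a template tag receives the SQL text and the interpolated values separately, so values never have to be spliced into the query string. The sketch below shows how such a tag could build a parameterized statement. It is an illustration under assumptions, not the SDK's implementation: sqlTag, SqlValue, and PreparedQuery are hypothetical names, and binding through `?` placeholders is assumed from the "sqlite-style placeholders" comment above.

```typescript
// Hypothetical illustration, not the SDK's implementation: a template tag
// that turns each interpolation into a `?` placeholder plus a bound parameter.
type SqlValue = string | number | null | Uint8Array

interface PreparedQuery {
  text: string
  params: SqlValue[]
}

function sqlTag(strings: TemplateStringsArray, ...values: SqlValue[]): PreparedQuery {
  // strings always has exactly one more element than values.
  let text = strings[0]
  for (let i = 0; i < values.length; i++) {
    text += '?' + strings[i + 1]
  }
  return { text, params: values }
}

// A hostile value stays a parameter and never becomes part of the SQL text.
const id = "x'; DROP TABLE transcripts; --"
const query = sqlTag`SELECT * FROM transcripts WHERE id = ${id}`
// query.text is 'SELECT * FROM transcripts WHERE id = ?'
// query.params holds the single value id
```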

`tools` — Sandboxed tool execution

import { tools } from '@locara/sdk'

// Invoke a declared tool
const result = await tools.invoke('wasm.text-utils', {
  op: 'word-count',
  text: '...'
})
// result type depends on tool schema; auto-generated from registry

LLMs can be given tools to call:

const response = await llm.chat({
  model: 'qwen2.5-7b-instruct-q4',
  messages: [...],
  tools: ['wasm.text-utils', 'wasm.summarizer'], // must be declared in manifest
})
// If the model wants to call a tool, the response includes tool_calls

`fs` — User-selected file access

import { fs } from '@locara/sdk'

// Open file picker (Powerbox-mediated)
const file = await fs.pick({
  types: ['audio/*'],
  multiple: false,
})
// file: { path, name, size, type, handle }

// Read user-selected file (requires fs.user-selected)
const content = await fs.read(file.handle)

// Save (Powerbox-mediated)
const saveHandle = await fs.save({
  defaultName: 'transcript.md',
  content: '...',
})

`audio` — Audio capture / playback

import { audio } from '@locara/sdk'

// Record from mic (requires device.microphone)
const recording = await audio.record({ maxSeconds: 600 })
// recording: { blob, duration, format }

// Play back
await audio.play(recording.blob)

Errors

Non-streaming SDK calls return promises; streaming calls return async iterables. Both fail with one of these error classes:

class CapabilityDeniedError extends Error {
  capability: string
  details: string
}

class ResourceNotAvailableError extends Error {
  resource: 'model' | 'memory' | 'disk'
  required: string
  available: string
}

class ToolError extends Error {
  tool: string
  cause: 'not-declared' | 'wasm-trap' | 'timeout' | 'invalid-args'
}

class StorageError extends Error {
  cause: 'sql-error' | 'migration-failed' | 'schema-mismatch'
}

Most error UX is “the SDK call rejected; show the user something sensible.” The SDK doesn’t try to map errors to user-facing strings — that’s app-author territory.
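
Since the mapping is left to the app, here is a sketch of what app-side handling could look like. The two classes are re-declared locally so the block runs on its own; their fields follow the declarations above, but the constructors are assumptions, and toUserMessage is a hypothetical helper.

```typescript
// Local stand-ins so the sketch runs without the SDK. Fields mirror the
// declarations above; the constructors are assumptions.
class CapabilityDeniedError extends Error {
  constructor(public capability: string, public details: string) {
    super(`capability denied: ${capability}`)
    this.name = 'CapabilityDeniedError'
    // Keeps instanceof working when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

class ResourceNotAvailableError extends Error {
  constructor(
    public resource: 'model' | 'memory' | 'disk',
    public required: string,
    public available: string,
  ) {
    super(`${resource} not available`)
    this.name = 'ResourceNotAvailableError'
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

// Hypothetical app-side helper: turn a rejected SDK call into UI copy.
function toUserMessage(err: unknown): string {
  if (err instanceof CapabilityDeniedError) {
    return `This app is not allowed to use "${err.capability}".`
  }
  if (err instanceof ResourceNotAvailableError) {
    return `Required ${err.resource} is unavailable (need ${err.required}, have ${err.available}).`
  }
  // Unknown errors get a generic message; log the original for debugging.
  return 'Something went wrong. Please try again.'
}
```

A caller would wrap an SDK call in try/catch and pass the caught value to toUserMessage.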

Streaming convention

Anything that can stream uses async generators:

for await (const chunk of llm.chatStream({...})) {
  // chunk: { content?: string, tool_calls?: ToolCall[], done?: boolean }
}
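
As an illustration of consuming this chunk shape, the sketch below concatenates streamed deltas into the full reply. fakeChatStream is a stand-in for llm.chatStream so the block runs without the SDK, and collectText is a hypothetical helper; tool_calls is left out of the chunk type for brevity.

```typescript
// Chunk shape from the convention above (tool_calls omitted for brevity).
interface ChatChunk {
  content?: string
  done?: boolean
}

// Stand-in for llm.chatStream so the sketch runs without the SDK.
async function* fakeChatStream(): AsyncGenerator<ChatChunk> {
  yield { content: 'Par' }
  yield { content: 'is' }
  yield { done: true }
}

// Hypothetical helper: concatenate streamed deltas into the full reply.
async function collectText(stream: AsyncIterable<ChatChunk>): Promise<string> {
  let text = ''
  for await (const chunk of stream) {
    // content is optional, so skip chunks that carry no text.
    if (chunk.content !== undefined) text += chunk.content
    if (chunk.done) break
  }
  return text
}
```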

Cancellation is via standard AbortController:

const controller = new AbortController()
setTimeout(() => controller.abort(), 5000)

const stream = llm.chatStream({
  ...,
  signal: controller.signal,
})
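
To show what honoring the signal involves on the producing side, here is a sketch of an async generator that stops once its AbortSignal fires. countingStream and takeUntilAborted are hypothetical names, not SDK code. Whether the SDK ends an aborted stream quietly or throws is not specified in this document; the sketch ends quietly.

```typescript
// Hypothetical producer, not SDK code: yields numbers until the signal aborts.
async function* countingStream(signal: AbortSignal, limit: number): AsyncGenerator<number> {
  for (let i = 0; i < limit; i++) {
    if (signal.aborted) return // stop producing once the consumer cancels
    yield i
    // Give up the event loop, as a real stream would while awaiting I/O.
    await new Promise<void>((resolve) => setTimeout(resolve, 0))
  }
}

// Consume the stream and cancel it after a fixed number of items.
async function takeUntilAborted(limit: number, abortAfter: number): Promise<number[]> {
  const controller = new AbortController()
  const seen: number[] = []
  for await (const n of countingStream(controller.signal, limit)) {
    seen.push(n)
    if (seen.length === abortAfter) controller.abort()
  }
  return seen
}
```

Aborting after a fixed item count rather than a timer keeps the example deterministic; the timer-based pattern above works the same way from the producer's point of view.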

Type generation from schema

locara dev runs a watcher that:

Watches db/schema.sql for changes.
On change, regenerates node_modules/@locara/sdk/db.d.ts with table types.
App code gets immediate TS feedback in the editor.

-- schema.sql
CREATE TABLE transcripts (
  id TEXT PRIMARY KEY,
  title TEXT NOT NULL,
  text TEXT NOT NULL,
  created_at INTEGER NOT NULL
);

Generates:

// generated
export interface Transcript {
  id: string
  title: string
  text: string
  created_at: number
}

App code:

const t = await db.one<Transcript>`SELECT * FROM transcripts WHERE id = ${id}`
//      ^? Transcript | undefined
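
The mapping step behind that generated interface can be sketched as follows. This is a simplified illustration, not the actual generator, which this document does not describe and which would need a real SQL parser. TS_TYPES and columnToField are hypothetical names, and treating columns without NOT NULL as nullable is an assumption.

```typescript
// Hypothetical sketch of the column-to-field mapping; not the real generator.
const TS_TYPES: Record<string, string | undefined> = {
  TEXT: 'string',
  INTEGER: 'number',
  REAL: 'number',
  BLOB: 'Uint8Array',
}

// Convert one column definition, e.g. "title TEXT NOT NULL", to a TS field.
function columnToField(definition: string): string {
  const [columnName, sqlType = ''] = definition.trim().split(/\s+/)
  const tsType = TS_TYPES[sqlType.toUpperCase()] ?? 'unknown'
  // Assumption: NOT NULL and PRIMARY KEY columns are non-nullable, which
  // matches the generated Transcript interface above.
  const required = /\bNOT\s+NULL\b|\bPRIMARY\s+KEY\b/i.test(definition)
  return `${columnName}: ${required ? tsType : `${tsType} | null`}`
}
```

For the schema above, columnToField('created_at INTEGER NOT NULL') produces the field created_at: number.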

Versioning

@locara/sdk follows semver. Apps declare a minimum SDK version in their manifest (requires_sdk); the runtime checks compatibility at install time.

Breaking changes should be rare: the preferred way to evolve the SDK is to extend the capability surface rather than change existing API signatures.
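
A sketch of what a requires_sdk check could look like follows. The rule used here (same major version, installed version at least the required one) is an assumption based on ordinary semver practice; the runtime's actual rule is not specified in this document. parseVersion and satisfiesMinimum are hypothetical names, and only plain MAJOR.MINOR.PATCH strings are handled.

```typescript
type Version = [number, number, number]

// Parse a plain MAJOR.MINOR.PATCH string; pre-release tags are not handled.
function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v)
  if (!m) throw new Error(`not a MAJOR.MINOR.PATCH version: ${v}`)
  return [Number(m[1]), Number(m[2]), Number(m[3])]
}

// Assumed rule: same major version, and installed >= required.
function satisfiesMinimum(installed: string, requiresSdk: string): boolean {
  const [iMaj, iMin, iPat] = parseVersion(installed)
  const [rMaj, rMin, rPat] = parseVersion(requiresSdk)
  if (iMaj !== rMaj) return false // a different major may break the app
  if (iMin !== rMin) return iMin > rMin
  return iPat >= rPat
}
```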

What the SDK does NOT include

No agent loops. If a developer needs an agent, they write a while loop calling llm.chat. (Avoid LangChain’s mistake.)
No prompt templates / chains. Use plain TS string templates.
No memory abstraction. Storage is db. If an app wants a “memory” module, it builds one on top of db.
No retrievers / pipelines. Use db.search.hybrid directly.
No model fine-tuning APIs. Out of scope for v1.

The SDK is primitives, not architecture. Apps assemble what they need.
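
To make the "write a loop" advice concrete, here is a sketch of a minimal agent loop. It runs against local stubs rather than the SDK: fakeChat and fakeInvoke stand in for llm.chat and tools.invoke, and the ToolCall and ChatMessage shapes (including the 'tool' role) are assumptions, since this document does not define them. The loop is bounded by a hypothetical maxSteps parameter so a confused model cannot spin forever.

```typescript
// Assumed shapes; the document does not define ToolCall or the message type.
interface ToolCall { name: string; args: Record<string, unknown> }
interface ChatMessage { role: 'system' | 'user' | 'assistant' | 'tool'; content: string }
interface ChatResponse { content: string; tool_calls?: ToolCall[] }

type Chat = (req: { messages: ChatMessage[] }) => Promise<ChatResponse>
type Invoke = (name: string, args: Record<string, unknown>) => Promise<unknown>

// The whole "agent": call the model, run any requested tools, repeat.
async function runAgent(
  chat: Chat,
  invoke: Invoke,
  messages: ChatMessage[],
  maxSteps = 8,
): Promise<string> {
  const history = [...messages]
  for (let step = 0; step < maxSteps; step++) {
    const response = await chat({ messages: history })
    const calls = response.tool_calls ?? []
    if (calls.length === 0) return response.content // no tool requested: done
    history.push({ role: 'assistant', content: response.content })
    for (const call of calls) {
      const result = await invoke(call.name, call.args)
      history.push({ role: 'tool', content: JSON.stringify(result) })
    }
  }
  throw new Error(`no final answer after ${maxSteps} steps`)
}

// Stubs standing in for llm.chat and tools.invoke.
const fakeChat: Chat = async ({ messages }) =>
  messages.some((m) => m.role === 'tool')
    ? { content: `Done: ${messages[messages.length - 1].content}` }
    : { content: '', tool_calls: [{ name: 'wasm.text-utils', args: { op: 'word-count', text: 'a b c' } }] }

const fakeInvoke: Invoke = async (_name, args) => String(args.text).split(' ').length
```

Calling runAgent(fakeChat, fakeInvoke, [{ role: 'user', content: 'Count the words.' }]) makes one tool round trip and then returns the model's final text.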

Cross-references

Modality concept (high-level): 04-modalities.md
Capability declarations the SDK respects: 03-capabilities.md
Storage / schema details: 08-storage.md
Model loading and routing: 09-models.md
Tool runtime details: 10-tools.md
LangChain over-abstraction lesson: ../notes/langchain.md
