Locara

image-to-image (no text instruction)

HF group: Computer Vision · Status: ❌ not built

What it is

Image → image without a text prompt: super-resolution, restoration, denoising, and style transfer based on a reference image. Distinct from image-text-to-image, which takes a natural-language instruction.

Open-weight models

Model | Params | Released | License | Quality | Notes
Real-ESRGAN | ~17 M | 2021 | BSD-3 | Lightweight super-res | The default upscaler for many tools.
SwinIR | ~12 M | 2021 | Apache-2.0 | Strong restoration | Older, but still high quality.
Stable Diffusion + img2img | 2.5-8 B | 2024 | Stability community | Style transfer with strength param | Reference-based.
SUPIR | ~13 B | 2024 | Apache-2.0 | Photorealistic restoration | Heavy but striking results.
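
The "strength param" listed for Stable Diffusion img2img controls how much of the denoising schedule is applied to the reference image. A minimal sketch of that arithmetic follows, assuming the convention used by common img2img implementations such as Hugging Face diffusers, where the number of steps actually run is the scheduled step count scaled by strength. The function name is hypothetical and not part of Locara.

```typescript
// Hypothetical helper: how many denoising steps an img2img run performs.
// Assumes steps = min(floor(numInferenceSteps * strength), numInferenceSteps).
function img2imgDenoiseSteps(numInferenceSteps: number, strength: number): number {
  if (!Number.isInteger(numInferenceSteps) || numInferenceSteps < 1) {
    throw new RangeError("numInferenceSteps must be a positive integer");
  }
  if (strength < 0 || strength > 1) {
    throw new RangeError("strength must be in [0, 1]");
  }
  return Math.min(Math.floor(numInferenceSteps * strength), numInferenceSteps);
}
```

Under this convention, a strength of 0.5 with 50 scheduled steps runs 25 denoising steps, so the output stays closer to the reference image than it would at strength 1.0.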

Infrastructure required

Inference

  • ❌ Lightweight ONNX path (Real-ESRGAN, SwinIR: small enough not to need a diffusion runtime).
  • ❌ Diffusion runtime for diffusion-based variants.
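
The split between the two inference paths could be sketched as a simple lookup from model to runtime. The identifiers below (`pickRuntime`, the model id strings) are hypothetical; only the grouping of models into a lightweight ONNX path and a diffusion path comes from this page.

```typescript
// Hypothetical runtime selector; none of these names are an existing Locara API.
type Runtime = "onnx" | "diffusion";

// Models small enough to run through a plain ONNX session.
const LIGHTWEIGHT_MODELS = new Set(["real-esrgan", "swinir"]);

// Diffusion-based variants that need the diffusion runtime.
const DIFFUSION_MODELS = new Set(["sd-img2img", "supir"]);

function pickRuntime(modelId: string): Runtime {
  const id = modelId.toLowerCase();
  if (LIGHTWEIGHT_MODELS.has(id)) return "onnx";
  if (DIFFUSION_MODELS.has(id)) return "diffusion";
  throw new Error(`Unknown image-to-image model: ${modelId}`);
}
```

Keeping the mapping explicit means the ONNX path can ship before the diffusion runtime exists: unknown or not-yet-supported models fail loudly instead of falling through to a missing backend.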

Input

  • ❌ Image input pipeline.

Output

  • ❌ Edited image bytes; save to user-folder.

Storage

  • ❌ Weights cache.
  • Output: fs.user-folder.

Interaction (IPC + SDK)

  • image.transform({ image, op }) IPC, where op selects the operation (super-res, denoise, etc.).
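
One way the image.transform call might be typed and dispatched is sketched below. Only the `image` and `op` fields come from this page; the op names beyond super-res and denoise, the result shape, and `createImageTransform` are assumptions made for illustration.

```typescript
// Hypothetical request/response shapes; the page only names `image` and `op`.
type TransformOp = "super-res" | "denoise" | "restore" | "style-transfer";

interface TransformRequest {
  image: Uint8Array; // encoded input image bytes
  op: TransformOp;   // which transformation to run
}

interface TransformResult {
  image: Uint8Array; // edited image bytes
}

type Handler = (image: Uint8Array) => Promise<Uint8Array>;

// Builds a transform function from whichever op handlers are available.
function createImageTransform(handlers: Partial<Record<TransformOp, Handler>>) {
  return async function transform(req: TransformRequest): Promise<TransformResult> {
    if (req.image.byteLength === 0) throw new Error("image is empty");
    const handler = handlers[req.op];
    if (!handler) throw new Error(`op not available: ${req.op}`);
    return { image: await handler(req.image) };
  };
}
```

A caller would register only the ops that are built, for example `createImageTransform({ "super-res": runRealEsrgan })` with a hypothetical `runRealEsrgan` handler, and any other op is rejected with a clear error.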

Capabilities (manifest)

  • capabilities.fs.user-selected for input.
  • capabilities.fs.user-folder for output.
  • capabilities.models[] for the model.
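
As an illustration of how these three capabilities might be declared and checked, the sketch below models the manifest as a TypeScript object. The manifest shape, the `missingCapabilities` helper, and the model id are hypothetical; only the three capability keys come from the list above.

```typescript
// Hypothetical manifest shape; the real manifest format may differ.
interface Manifest {
  capabilities: {
    fs?: { "user-selected"?: boolean; "user-folder"?: boolean };
    models?: string[];
  };
}

const exampleManifest: Manifest = {
  capabilities: {
    fs: { "user-selected": true, "user-folder": true },
    models: ["real-esrgan"], // placeholder model id
  },
};

// Lists the capabilities this modality needs that the manifest does not grant.
function missingCapabilities(m: Manifest, model: string): string[] {
  const missing: string[] = [];
  if (!m.capabilities.fs?.["user-selected"]) missing.push("fs.user-selected");
  if (!m.capabilities.fs?.["user-folder"]) missing.push("fs.user-folder");
  if (!m.capabilities.models?.includes(model)) missing.push(`models[${model}]`);
  return missing;
}
```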

Gaps

The main gap is the image input pipeline, which is shared with several other CV modalities. Cleanest first deliverable: Real-ESRGAN as ONNX for super-res. It is small and fast, and needs no diffusion runtime.
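
Part of the image input pipeline for an ONNX super-resolution model is converting decoded pixels into the tensor layout the model expects. The sketch below assumes the exported model takes planar RGB float32 values in [0, 1] (CHW layout), which is common for Real-ESRGAN exports but should be verified against the actual model file. Both function names are hypothetical.

```typescript
// Convert interleaved RGBA bytes (HWC) into planar RGB floats (CHW) in [0, 1].
// Assumption: the model ignores alpha and expects values scaled by 1/255.
function rgbaToChwFloat(rgba: Uint8Array, width: number, height: number): Float32Array {
  const pixels = width * height;
  if (rgba.length !== pixels * 4) throw new Error("unexpected RGBA buffer size");
  const out = new Float32Array(pixels * 3);
  for (let i = 0; i < pixels; i++) {
    out[i] = rgba[i * 4] / 255;                  // R plane
    out[pixels + i] = rgba[i * 4 + 1] / 255;     // G plane
    out[2 * pixels + i] = rgba[i * 4 + 2] / 255; // B plane
  }
  return out;
}

// Inverse: planar RGB floats back to opaque RGBA bytes, clamped to [0, 255].
function chwFloatToRgba(chw: Float32Array, width: number, height: number): Uint8Array {
  const pixels = width * height;
  if (chw.length !== pixels * 3) throw new Error("unexpected CHW buffer size");
  const out = new Uint8Array(pixels * 4);
  for (let i = 0; i < pixels; i++) {
    for (let c = 0; c < 3; c++) {
      const v = Math.round(chw[c * pixels + i] * 255);
      out[i * 4 + c] = Math.min(255, Math.max(0, v));
    }
    out[i * 4 + 3] = 255; // alpha is not modeled, so write fully opaque
  }
  return out;
}
```

Because these helpers are independent of any runtime, they can be shared by the other CV modalities that need the same input pipeline.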

See also