`image-to-image` (no text instruction)

HF group: Computer Vision · Status: ❌ not built

What it is

Image → image without a text prompt: super-resolution, restoration, denoising, and style transfer from a reference image. This is distinct from image-text-to-image, which takes a natural-language instruction.

Open-weight models

Model	Params	Released	License	Quality	Notes
Real-ESRGAN	~17 M	2021	BSD-3	Lightweight super-res	The default upscaler for many tools.
SwinIR	~12 M	2021	Apache-2.0	Strong restoration	Older, but still high quality.
Stable Diffusion + img2img	2.5–8 B	2024	Stability AI Community License	Style transfer controlled by a strength parameter	Reference-based.
SUPIR	~13 B	2024	Apache-2.0	Photorealistic restoration	Heavy, but striking results.
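The img2img strength parameter decides how far into the noise schedule the reference image is pushed before denoising starts. A minimal sketch of that mapping, modeled on the convention used by the Hugging Face diffusers img2img pipelines (the function and field names here are hypothetical): strength 0 runs no steps and returns the input unchanged, while strength 1 runs the full schedule and largely ignores the input.

```typescript
// Hypothetical helper: how many denoising steps an img2img run performs for a
// given strength, following the diffusers-style convention.
interface Img2ImgSchedule {
  startStep: number; // index into the scheduler's timestep list where denoising begins
  stepsToRun: number; // denoising steps actually executed
}

function img2imgSchedule(numInferenceSteps: number, strength: number): Img2ImgSchedule {
  if (!Number.isInteger(numInferenceSteps) || numInferenceSteps <= 0) {
    throw new RangeError("numInferenceSteps must be a positive integer");
  }
  if (!(strength >= 0 && strength <= 1)) {
    throw new RangeError("strength must be in [0, 1]");
  }
  // Low strength skips most of the noisy end of the schedule, so the output
  // stays close to the reference image.
  const initSteps = Math.min(Math.floor(numInferenceSteps * strength), numInferenceSteps);
  const startStep = Math.max(numInferenceSteps - initSteps, 0);
  return { startStep, stepsToRun: numInferenceSteps - startStep };
}

console.log(img2imgSchedule(40, 0.75)); // { startStep: 10, stepsToRun: 30 }
```

Exposing strength directly (rather than a raw step count) keeps the IPC independent of which scheduler or step count a given model uses.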

Infrastructure required

Inference

❌ Lightweight ONNX path (Real-ESRGAN, SwinIR: small feed-forward networks that need no diffusion runtime).
❌ Diffusion runtime for the diffusion-based models (Stable Diffusion img2img, SUPIR).
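On the ONNX path, most of the glue code is tensor conversion. A minimal sketch, assuming the exported Real-ESRGAN or SwinIR model takes a float32 `[1, 3, H, W]` tensor of RGB values in [0, 1] (common for these exports, but check the actual model's input metadata) and that image decoding yields interleaved RGBA bytes. The function name is hypothetical.

```typescript
// Hypothetical pre-processing step: interleaved RGBA bytes (HWC) to a planar
// float32 RGB tensor (CHW) scaled to [0, 1]. Alpha is dropped.
function rgbaToNchwFloat32(
  rgba: Uint8Array | Uint8ClampedArray,
  width: number,
  height: number,
): Float32Array {
  const plane = width * height;
  if (rgba.length !== plane * 4) {
    throw new RangeError(`expected ${plane * 4} bytes, got ${rgba.length}`);
  }
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i] = rgba[i * 4] / 255; // R plane
    out[plane + i] = rgba[i * 4 + 1] / 255; // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255; // B plane
  }
  return out;
}

// One red pixel and one blue pixel.
console.log(Array.from(rgbaToNchwFloat32(new Uint8Array([255, 0, 0, 255, 0, 0, 255, 255]), 2, 1)));
// [ 1, 0, 0, 0, 0, 1 ]
```

With onnxruntime-node or onnxruntime-web, the result would be wrapped as `new ort.Tensor("float32", data, [1, 3, height, width])` and passed to `session.run` under the model's input name.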

Input

❌ Image input pipeline.
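Large inputs are usually processed in tiles so a small model does not run out of memory; Real-ESRGAN's own inference script exposes tile and tile-padding options for this reason. A hypothetical tile planner for the input pipeline, assuming square tiles, each fed to the model with `pad` pixels of surrounding context that is cropped away afterwards:

```typescript
// Hypothetical tiling plan. Each tile owns a core region of the image and is
// fed to the model with up to `pad` pixels of context on every side
// (clamped to the image bounds) to avoid visible seams.
interface Tile {
  x: number; y: number; w: number; h: number; // core region
  px: number; py: number; pw: number; ph: number; // padded region sent to the model
}

function planTiles(width: number, height: number, tileSize: number, pad: number): Tile[] {
  if (tileSize <= 0 || pad < 0) throw new RangeError("tileSize must be > 0 and pad >= 0");
  const tiles: Tile[] = [];
  for (let y = 0; y < height; y += tileSize) {
    for (let x = 0; x < width; x += tileSize) {
      const w = Math.min(tileSize, width - x);
      const h = Math.min(tileSize, height - y);
      const px = Math.max(x - pad, 0);
      const py = Math.max(y - pad, 0);
      const pw = Math.min(x + w + pad, width) - px;
      const ph = Math.min(y + h + pad, height) - py;
      tiles.push({ x, y, w, h, px, py, pw, ph });
    }
  }
  return tiles;
}

console.log(planTiles(1000, 600, 512, 10).length); // 4
```

After a model with scale factor `s` runs on a padded tile, the tile's core is cropped from the output at offset `((x - px) * s, (y - py) * s)` with size `(w * s, h * s)` and pasted at `(x * s, y * s)`.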

Output

❌ Edited image bytes; save to user-folder.
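The reverse conversion on the output side, under the same layout assumption (float32 `[1, 3, H, W]`, nominally in [0, 1]). Output width and height are the input's multiplied by the model's scale factor, e.g. 4 for an x4 model. The function name is hypothetical; the resulting RGBA buffer would then go to an image encoder before being written to the user folder.

```typescript
// Hypothetical post-processing step: planar float32 RGB (CHW) back to
// interleaved RGBA bytes with an opaque alpha channel.
function nchwFloat32ToRgba(chw: Float32Array, width: number, height: number): Uint8ClampedArray {
  const plane = width * height;
  if (chw.length !== 3 * plane) {
    throw new RangeError(`expected ${3 * plane} floats, got ${chw.length}`);
  }
  // Uint8ClampedArray clamps assigned values to 0..255 and rounds to nearest,
  // so slight model overshoot outside [0, 1] is handled automatically.
  const out = new Uint8ClampedArray(plane * 4);
  for (let i = 0; i < plane; i++) {
    out[i * 4] = chw[i] * 255;
    out[i * 4 + 1] = chw[plane + i] * 255;
    out[i * 4 + 2] = chw[2 * plane + i] * 255;
    out[i * 4 + 3] = 255;
  }
  return out;
}

console.log(Array.from(nchwFloat32ToRgba(new Float32Array([1, -0.2, 0.2, 1.5, 0, 0.2]), 2, 1)));
// [ 255, 51, 0, 255, 0, 255, 51, 255 ]
```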

Storage

❌ Weights cache.
Output: fs.user-folder.
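One way the weights cache could be laid out, sketched under assumptions: files are stored under their SHA-256 digest, and a cached file is only trusted if its bytes still hash to the expected digest (for example, one pinned next to the model entry). The layout and function names are hypothetical; only Node's `crypto`, `fs`, and `path` calls are real APIs.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical content-addressed layout: <cacheDir>/<sha256 hex>.
function cachedWeightsPath(cacheDir: string, sha256: string): string {
  return join(cacheDir, sha256.toLowerCase());
}

// Returns the cached file path on a verified hit, or null on a miss
// (missing file or digest mismatch), in which case the caller re-downloads.
function lookupWeights(cacheDir: string, sha256: string): string | null {
  const path = cachedWeightsPath(cacheDir, sha256);
  if (!existsSync(path)) return null;
  const actual = createHash("sha256").update(readFileSync(path)).digest("hex");
  return actual === sha256.toLowerCase() ? path : null;
}
```

Reading the whole file is fine for Real-ESRGAN-sized weights; multi-gigabyte diffusion checkpoints would call for a streaming hash instead.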

Interaction (IPC + SDK)

❌ image.transform({ image, op }) IPC where op selects super-res / denoise / etc.
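A sketch of how the `op` field might be typed and routed to one of the two inference paths. Only the `{ image, op }` request shape comes from the plan; the op names, model ids, and routing table are hypothetical.

```typescript
// Hypothetical typing for image.transform({ image, op }).
type TransformOp = "super-res" | "denoise" | "restore" | "style-transfer";
type Runtime = "onnx" | "diffusion";

interface TransformRequest {
  image: Uint8Array; // encoded input image bytes (e.g. PNG or JPEG)
  op: TransformOp;
}

interface Route {
  runtime: Runtime;
  model: string; // hypothetical model id, resolved against capabilities.models[]
}

// Small feed-forward models use the ONNX path; diffusion models need the diffusion runtime.
const ROUTES: Record<TransformOp, Route> = {
  "super-res": { runtime: "onnx", model: "real-esrgan" },
  denoise: { runtime: "onnx", model: "swinir" },
  restore: { runtime: "diffusion", model: "supir" },
  "style-transfer": { runtime: "diffusion", model: "sd-img2img" },
};

function routeTransform(req: TransformRequest): Route {
  if (req.image.length === 0) throw new Error("image is empty");
  // IPC payloads arrive untyped, so validate the op at runtime too
  // (hasOwnProperty avoids matching inherited keys like "toString").
  if (!Object.prototype.hasOwnProperty.call(ROUTES, req.op)) {
    throw new Error(`unsupported op: ${String(req.op)}`);
  }
  return ROUTES[req.op];
}

console.log(routeTransform({ image: new Uint8Array([1]), op: "super-res" }).runtime); // onnx
```

Keeping `op` as a closed union means a new operation is added in one table entry, and the ONNX-only first deliverable can simply omit the diffusion routes.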

Capabilities (manifest)

capabilities.fs.user-selected for input.
capabilities.fs.user-folder for output.
capabilities.models[] for the model.
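A sketch of the permission check these three capabilities imply before an `image.transform` call is allowed. The capability paths come from the list above; the manifest's TypeScript shape and the check function are assumptions.

```typescript
// Hypothetical manifest shape mirroring the capability paths listed above.
interface Manifest {
  capabilities: {
    fs?: { "user-selected"?: boolean; "user-folder"?: boolean };
    models?: string[];
  };
}

// Lists the capabilities a manifest is missing for a transform using `model`.
function missingCapabilities(manifest: Manifest, model: string): string[] {
  const caps = manifest.capabilities;
  const missing: string[] = [];
  if (!caps.fs?.["user-selected"]) missing.push("capabilities.fs.user-selected");
  if (!caps.fs?.["user-folder"]) missing.push("capabilities.fs.user-folder");
  if (!(caps.models ?? []).includes(model)) missing.push(`capabilities.models[] (${model})`);
  return missing;
}

console.log(missingCapabilities({ capabilities: {} }, "real-esrgan").length); // 3
```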

Gaps

Image input pipeline (shared with several other CV modalities). The cleanest first deliverable is Real-ESRGAN exported to ONNX for super-resolution: it is small and fast and needs no diffusion runtime.