Locara

keypoint-detection

HF group: Computer Vision · Status: ❌ not built

What it is

Image / video → joint locations (pose). Hands, body, face landmarks. Useful for AR, fitness apps, sign-language recognition.

Open-weight models

| Model | Params | Released | License | Quality | Notes |
|---|---|---|---|---|---|
| RTMPose | 2-90 M | 2023 | Apache-2.0 | Real-time SOTA | OpenMMLab. |
| ViTPose | 90-660 M | 2022 | Apache-2.0 | High quality, slower | Transformer. |
| MediaPipe Pose / Hands / Face | n/a | n/a | Apache-2.0 | Real-time on CPU/mobile | Battle-tested at Google. |
| Apple Vision (VNDetectHumanBodyPoseRequest) | n/a | macOS | Apple | Native, fast | Free. |

Infrastructure required

Inference

  • ❌ Pose-detection runtime.
  • ✅ Apple Vision integration would be the cheapest first deliverable.

Input

  • ❌ Image / video input pipeline.

Output

  • ❌ Joint coordinates + confidences.
  • Skeleton overlay UI in @locara/components.
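
To make the output items above concrete, here is a minimal sketch of what a keypoint record and the input to a skeleton overlay might look like. Every name in it (`Keypoint`, `Segment`, `EDGES`, `skeletonSegments`) and the choice of normalized coordinates are assumptions for illustration; none of them come from the Locara SDK or from any of the listed models, and the edge list is a small subset rather than a full skeleton definition.

```typescript
// Hypothetical shapes for illustration only; not part of any existing SDK.
interface Keypoint {
  name: string;       // joint label, e.g. "left_shoulder"
  x: number;          // assumed normalized to [0, 1] across the image width
  y: number;          // assumed normalized to [0, 1] across the image height
  confidence: number; // detector score, assumed to lie in [0, 1]
}

interface Segment {
  from: Keypoint;
  to: Keypoint;
}

// A small illustrative subset of body edges, not a complete skeleton.
const EDGES: ReadonlyArray<readonly [string, string]> = [
  ["left_shoulder", "left_elbow"],
  ["left_elbow", "left_wrist"],
  ["left_shoulder", "right_shoulder"],
];

// Return the edges whose two endpoints were both detected and both meet
// the confidence threshold, so an overlay draws only trustworthy bones.
function skeletonSegments(keypoints: Keypoint[], minConfidence = 0.5): Segment[] {
  const byName = new Map<string, Keypoint>();
  for (const kp of keypoints) {
    byName.set(kp.name, kp);
  }
  const segments: Segment[] = [];
  for (const [a, b] of EDGES) {
    const from = byName.get(a);
    const to = byName.get(b);
    if (from && to && from.confidence >= minConfidence && to.confidence >= minConfidence) {
      segments.push({ from, to });
    }
  }
  return segments;
}
```

Filtering by confidence before drawing keeps low-scoring joints (for example an occluded wrist) from producing bones that jump around the frame. The 0.5 default is an arbitrary placeholder and would need tuning per model.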

Storage

  • ❌ Weights cache (or none, for native API).

Interaction (IPC + SDK)

  • vision.pose({ image }) IPC.
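
As a rough sketch of how the `vision.pose({ image })` call above could be typed, the following wraps a pluggable backend behind a `pose` method. Only the call name and the `image` field come from this page; the `PoseRequest` and `PoseResponse` shapes, the `Uint8Array` payload, `createVision`, and the stub backend are all hypothetical and stand in for a design that has not been built yet.

```typescript
// Hypothetical request/response shapes; the real IPC contract is not defined yet.
interface PoseRequest {
  image: Uint8Array; // assumed: encoded image bytes
}

interface PoseKeypoint {
  name: string;
  x: number;
  y: number;
  confidence: number;
}

interface PoseResponse {
  poses: PoseKeypoint[][]; // assumed: one keypoint list per detected person
}

// A backend is anything that can turn a request into a response, e.g. a
// native API bridge or a model runtime.
type PoseBackend = (req: PoseRequest) => Promise<PoseResponse>;

// Build a `vision` object around a backend, validating input before dispatch.
function createVision(backend: PoseBackend) {
  return {
    async pose(req: PoseRequest): Promise<PoseResponse> {
      if (req.image.length === 0) {
        throw new Error("vision.pose: image is empty");
      }
      return backend(req);
    },
  };
}

// Stub backend that returns a fixed result, so the sketch runs without a model.
const stubBackend: PoseBackend = async () => ({
  poses: [[{ name: "nose", x: 0.5, y: 0.2, confidence: 0.97 }]],
});

const vision = createVision(stubBackend);
```

Keeping the backend behind a single function type would let a native implementation and a weights-based runtime share the same caller-facing surface, though whether Locara structures it this way is an open design question.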

Capabilities (manifest)

  • capabilities.fs.user-selected or device.camera.
  • capabilities.models[].

Gaps

Same as other CV modalities: image input, overlay UI, and an encoder-only inference path.

See also