Most RGB-based hand-object reconstruction methods rely on object templates, while template-free approaches typically assume full object visibility.
This assumption, however, often fails in real-world scenarios where the camera is fixed and the object is held in a static grip, leaving parts of the object unobserved and producing unrealistic reconstructions.
To address this challenge, we introduce MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited view variation.
Our key insight is that, although paired 3D hand-object data is extremely scarce, large-scale diffusion models, such as image-to-3D models, provide abundant object supervision.
This additional supervision serves as a prior that regularizes unseen object regions during hand interaction.
Leveraging this insight, MagicHOI integrates an image-to-3D diffusion model into its reconstruction framework.
We further refine hand poses by incorporating hand-object interaction constraints.
Our results demonstrate that MagicHOI significantly outperforms state-of-the-art template-free reconstruction methods.
We also show that image-to-3D diffusion priors effectively regularize unseen object regions, enhancing 3D hand-object reconstruction.
Moreover, the improved object geometry yields more accurate hand poses.