MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips

1The Hong Kong University of Science and Technology (Guangzhou),
2The Hong Kong University of Science and Technology
3ETH Zürich, Switzerland
4Max Planck Institute for Intelligent Systems, Tübingen, Germany

🌴ICCV 2025🥥



Abstract

Most RGB-based hand-object reconstruction methods assume full object visibility. However, in casual videos of everyday hand-object interactions, objects are typically only partially observed.

To address this, we introduce MagicHOI, a method for reconstructing hands and objects from short monocular videos, even with limited viewpoint variation.

Our key insight is that, despite the scarcity of paired 3D hand–object data, large‑scale novel view synthesis models provide rich object priors to help regularize unseen object regions during hand-object interactions.
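To make the idea of prior-driven regularization concrete, below is a minimal, hypothetical sketch (not the paper's actual implementation) of a score-distillation-style update: a frozen novel-view-synthesis diffusion prior scores a rendered novel view of the partially observed object, and the difference between its predicted noise and the injected noise yields a gradient that pulls unseen regions toward the prior. All names and the toy prior are illustrative assumptions.

```python
import numpy as np

def sds_gradient(rendered, noise, pred_noise, weight):
    """Score-distillation-style gradient w.r.t. a rendered novel view.

    rendered:   rendered novel view of the partially observed object
    noise:      Gaussian noise injected at the sampled diffusion timestep
    pred_noise: noise predicted by the frozen novel-view-synthesis prior
    weight:     timestep-dependent weighting w(t)
    """
    # Standard SDS form: the residual between predicted and injected noise
    # is treated as the (stop-gradient) descent direction for the renderer.
    return weight * (pred_noise - noise)

rng = np.random.default_rng(0)
rendered = rng.standard_normal((4, 4, 3))
noise = rng.standard_normal((4, 4, 3))

# Toy stand-in for the frozen prior: it treats any deviation of the render
# from its preferred image (here, zeros) as extra "noise" to be removed.
target = np.zeros_like(rendered)
pred_noise = noise + 0.1 * (rendered - target)

grad = sds_gradient(rendered, noise, pred_noise, weight=1.0)
rendered_updated = rendered - 0.5 * grad  # one gradient step toward the prior
```

Descending this gradient shrinks the residual between the prior's prediction and the injected noise, so the optimized object representation drifts toward views the prior considers plausible in regions the video never observes.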


Video

Comparison to SOTA Methods

We compare our method, which integrates geometry-driven and prior-driven approaches, against the purely geometry-driven method HOLD and the purely prior-driven method EasyHOI, on videos with limited viewpoints and without full object visibility.

Scene: MC1, SM4, ABF12, GPMF12

Method: Ours, HOLD, EasyHOI


In-the-wild results of our method

Scene: Controller, Glue Gun, Osmo Pocket, Toy Plane

BibTeX

@inproceedings{wang2024Magichoi,
  title     = {MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips},
  author    = {Shibo Wang and Haonan He and Maria Parelli and Christoph Gebhardt and Zicong Fan and Jie Song},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025},
}