Most RGB‑based hand–object reconstruction methods assume full object visibility. However, everyday hand–object interactions, often captured in casual videos, typically yield only partial object observations.
To address this, we introduce MagicHOI, a method for reconstructing hands and objects from short monocular videos, even with limited viewpoint variation.
Our key insight is that, despite the scarcity of paired 3D hand–object data, large‑scale novel view synthesis models provide rich object priors that help regularize unseen object regions during hand–object interactions.
We compare our method, which integrates geometry-driven and prior-driven cues, against the purely geometry-driven HOLD and the purely prior-driven EasyHOI on sequences with limited viewpoints and partial object visibility.
Qualitative comparison (interactive viewer) — Methods: Ours, HOLD, EasyHOI; Scenes: MC1, SM4, ABF12, GPMF12, Controller, Glue Gun, Osmo Pocket, Toy Plane.

@inproceedings{wang2024Magichoi,
  title = {MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips},
  author = {Shibo Wang and Haonan He and Maria Parelli and Christoph Gebhardt and Zicong Fan and Jie Song},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year = {2025},
}