Abstract

We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras. Our approach is robust to severe and long-term occlusions and tracks human bodies even when they go outside the camera's field of view. To achieve this, we first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions. Additionally, in contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras. Since the joint reconstruction of human motions and camera poses is underconstrained, we propose a global trajectory predictor that generates global human trajectories based on local body movements. Using the predicted trajectories as anchors, we present a global optimization framework that refines the predicted trajectories and optimizes the camera poses to match the video evidence such as 2D keypoints. Experiments on challenging indoor and in-the-wild datasets with dynamic cameras demonstrate that the proposed approach outperforms prior methods significantly in terms of motion infilling and global mesh recovery.
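
To make the phrase "match the video evidence such as 2D keypoints" concrete, the following is a minimal sketch of a per-frame reprojection term of the kind such an optimization might minimize. It is not the paper's objective: the pinhole camera model, the confidence weighting, and every name here (`reprojection_error`, `joints_world`, `R`, `t`, `K`, `conf`) are assumptions made for illustration only.

```python
import numpy as np

def reprojection_error(joints_world, R, t, K, keypoints_2d, conf):
    """Confidence-weighted mean squared 2D reprojection error for one frame.

    Hypothetical helper, not taken from the paper.
    joints_world: (J, 3) joints in global coordinates
    R, t:         world-to-camera rotation (3, 3) and translation (3,)
    K:            (3, 3) pinhole intrinsics (assumed camera model)
    keypoints_2d: (J, 2) detected keypoints in pixels
    conf:         (J,) detection confidences; 0 for occluded joints
    """
    cam = joints_world @ R.T + t           # world frame -> camera frame
    uvw = cam @ K.T                        # camera frame -> homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]          # perspective divide
    sq = np.sum((uv - keypoints_2d) ** 2, axis=1)
    # Normalize by total confidence; the floor avoids division by zero
    # when every joint is occluded.
    return float(np.sum(conf * sq) / max(np.sum(conf), 1e-8))
```

Setting a joint's confidence to zero removes it from the error, so a term like this gives no constraint for fully occluded frames. That is consistent with the abstract's motivation for infilling occluded motion and anchoring the optimization with predicted trajectories.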