OnlineHMR: Video-based Online World-Grounded Human Mesh Recovery

Yiwen Zhao1 Ce Zheng1,†,‡ Yufu Wang2 Hsueh-Han Daniel Yang1
Liting Wen1 László A. Jeni1,†
1 Carnegie Mellon University   2 University of Pennsylvania
† Corresponding authors   ‡ Project lead


Given a streaming video as input, OnlineHMR recovers the world-coordinate human mesh and camera trajectory in an online manner.

[Arxiv]      [Code]    

Interactive Gallery

Explore the 4D reconstruction results with varying numbers of individuals across diverse scenes.

Left Click Drag with left click to rotate view
Scroll Wheel Scroll to zoom in/out
Right Click Drag with right click to move view
W S Moving forward and backward
A D Moving left and right
Q E Moving upward and downward

(Point clouds are downsampled for efficient online rendering)

Abstract

Human mesh recovery (HMR) reconstructs the 3D human body from monocular videos, with recent works extending it to world-coordinate human trajectory and motion reconstruction. However, most existing methods remain offline, relying on future frames or global optimization, which limits their applicability in interactive feedback and perception-action loop scenarios such as AR/VR and telepresence. To address this, we propose OnlineHMR, a fully online framework that jointly satisfies four essential criteria of online processing: system-level causality, faithfulness, temporal consistency, and efficiency. Built upon a two-branch architecture, OnlineHMR enables streaming inference via a causal key–value cache design and a curated sliding-window learning strategy. Meanwhile, a human-centric incremental SLAM provides online world-grounded alignment under physically plausible trajectory correction. Experimental results show that our method achieves performance comparable to existing chunk-based approaches on the standard EMDB benchmark and on highly dynamic custom videos, while uniquely supporting online processing.

Causality

Leveraging information only from previous frames through a cache memory.
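The exact cache design is detailed in the paper; as a minimal sketch under stated assumptions, causal streaming attention with a key–value cache might look like the following (the `max_len` FIFO eviction policy is a hypothetical choice for illustration, not necessarily the paper's policy):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CausalKVCache:
    """Minimal sketch of streaming attention with a key-value cache.

    Each new frame attends only to itself and previously cached frames,
    so inference never touches future frames. `max_len` bounds memory
    (hypothetical parameter; the paper's actual cache policy may differ).
    """
    def __init__(self, dim, max_len=64):
        self.dim = dim
        self.max_len = max_len
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def step(self, q, k, v):
        # Append the current frame's key/value, evicting the oldest entry
        # once the cache exceeds max_len (simple FIFO policy).
        self.keys = np.vstack([self.keys, k])[-self.max_len:]
        self.values = np.vstack([self.values, v])[-self.max_len:]
        attn = softmax(q @ self.keys.T / np.sqrt(self.dim))
        return attn @ self.values

# Usage: per-frame streaming inference
cache = CausalKVCache(dim=4)
for t in range(3):
    feat = np.random.randn(1, 4)
    out = cache.step(feat, feat, feat)  # depends only on frames <= t
```

Because the cache holds only past frames, the output at each step is causal by construction: no future frame can influence it.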


Input: a streaming video. Output: world-coordinate streaming SMPL.
1) Camera coordinates: OnlineHMR estimates the parametric human model in camera space.
2) Human-Centric Incremental SLAM estimates the camera extrinsics.
3) Metric depth, together with SLAM depth, scales the SLAM outputs to the world frame.
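Step 3 can be sketched as a robust scale alignment between a metric depth prediction and the up-to-scale SLAM depth. This is a minimal illustration, not the paper's implementation; the median-ratio estimator and the `valid_mask` argument (e.g. to exclude dynamic human pixels) are assumptions:

```python
import numpy as np

def align_slam_scale(metric_depth, slam_depth, valid_mask=None):
    """Estimate a global scale aligning up-to-scale SLAM depth to metric depth.

    Sketch of step 3: the median per-pixel ratio between a metric depth map
    and the SLAM depth map gives a robust global scale factor. `valid_mask`
    (hypothetical) can exclude dynamic human pixels from the estimate.
    """
    if valid_mask is None:
        valid_mask = np.ones_like(slam_depth, dtype=bool)
    valid = valid_mask & (slam_depth > 1e-6)  # ignore invalid/zero depth
    return float(np.median(metric_depth[valid] / slam_depth[valid]))

# Usage: world-frame camera translation = scale * SLAM translation
```

The median is preferred over the mean here because it is robust to outlier pixels (e.g. depth errors on object boundaries).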

Faithfulness

Quantitatively, we compare our results with previous and concurrent offline/online methods on the camera-coordinate benchmarks 3DPW and EMDB1, and the world-coordinate benchmark EMDB2. Qualitatively, we test our method on custom art and sports videos.

0:00-1:20, custom videos. 1:20-2:58, EMDB2.

Temporal Consistency

To alleviate camera-coordinate human jittering caused by online adaptation, we curate a sliding-window learning pipeline integrated with velocity regularization. The large proportion of dynamic humans in SLAM is handled by a soft human mask, and camera extrinsics are enforced to be smooth through physical constraints. Together, these yield a more natural world-coordinate human motion.
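The exact form of the velocity regularizer is given in the paper; as one plausible sketch, a smoothness loss over a sliding window of 3D joints could penalize frame-to-frame velocity changes (i.e. accelerations). The `(T, J, 3)` layout and the squared-acceleration form are assumptions for illustration:

```python
import numpy as np

def velocity_regularization(joints):
    """Penalize frame-to-frame joint velocity changes inside a sliding window.

    `joints` is a (T, J, 3) array of 3D joints over a window of T frames.
    The loss is the mean squared acceleration (finite difference of
    velocities), which is zero for constant-velocity motion and grows
    with jitter, encouraging temporally smooth trajectories.
    """
    vel = joints[1:] - joints[:-1]   # (T-1, J, 3) per-frame velocities
    acc = vel[1:] - vel[:-1]         # (T-2, J, 3) per-frame accelerations
    return float(np.mean(acc ** 2))
```

Note that this penalizes acceleration rather than velocity itself, so smooth motion is not discouraged, only abrupt changes in it.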

Efficiency

We report two metrics: frames per second (FPS) and average delay time in seconds per frame (Avg. Delay). Results show that online methods have significantly lower delay than offline methods. Although our OnlineHMR has the highest Avg. Delay among the online methods, it achieves better accuracy in both camera-coordinate and world-coordinate results; a detailed analysis is provided in our paper.

Acknowledgements

We borrow this template from MonST3R. We thank Shubham Tulsiani and Michael Kaess for their wonderful lectures and insightful feedback, Aniket Agarwal for early-stage brainstorming, Taru Rustagi, Ananya Bal, Joel Julin, Kallol Saha, Hongwen Zhang, Zihan Wang, Jiatong Shi for helpful discussion, Nathalie Chang, Rena Ju for early-stage survey, and the anonymous reviewers for suggestions.