Abstract
Human motion generation from sparse observations is an ill-posed problem in AR/VR, where head-mounted devices often capture only head and wrist trajectories. Prior methods usually reconstruct full-body motion in a single stage, forcing inference over a vast solution space and producing inaccurate lower-body motion, weak temporal coherence, and implausible sequences that degrade avatar embodiment. We present MAGE, a Multi-stage Avatar GEnerator based on hierarchical diffusion. Instead of predicting 22-joint motion at once, MAGE progressively refines motion from a coarse 6-part representation to full joints. Each stage injects stage-specific motion priors and uses intermediate predictions to constrain subsequent refinement, reducing ambiguity and stabilizing dynamics. Experiments on large-scale motion datasets show that MAGE improves reconstruction accuracy, temporal smoothness, and perceptual realism over state-of-the-art baselines, enabling more reliable full-body animation from minimal AR/VR sensing while preserving real-time interaction.
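The coarse-to-fine data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The 6-part grouping, the SMPL-style joint indexing, and the names `PARTS`, `pool_to_parts`, `expand_to_joints`, and `run_stages` are all assumptions made for illustration, and the stage functions in the usage note are trivial placeholders rather than diffusion models. The sketch only shows how a 22-joint sequence could map to a 6-part representation and how each stage could receive the previous stage's output as a condition.

```python
import numpy as np

# Hypothetical grouping of 22 body joints into 6 parts. The indices assume an
# SMPL-style joint order; the paper's actual partition is not stated in the abstract.
PARTS = {
    "torso":     [0, 3, 6, 9],
    "head":      [12, 15],
    "left_arm":  [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
    "left_leg":  [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
}

def pool_to_parts(joints):
    """Average joint positions of shape (T, 22, 3) into part centroids (T, 6, 3)."""
    return np.stack([joints[:, idx].mean(axis=1) for idx in PARTS.values()], axis=1)

def expand_to_joints(parts):
    """Copy each part centroid (T, 6, 3) to its joints, giving a (T, 22, 3) initial guess."""
    joints = np.zeros((parts.shape[0], 22, 3))
    for p, idx in enumerate(PARTS.values()):
        joints[:, idx] = parts[:, p:p + 1]  # broadcast one centroid over the part's joints
    return joints

def run_stages(sparse, stages):
    """Run stages in order; each stage sees the sparse input and the previous prediction."""
    prev = None
    for stage in stages:
        prev = stage(sparse, prev)
    return prev
```

As a usage example, with placeholder stages standing in for learned denoisers, `run_stages(sparse, [coarse, fine])` takes a `(T, 3, 3)` array of head and wrist positions, where `coarse` returns a `(T, 6, 3)` part-level estimate and `fine` calls `expand_to_joints` on it to obtain a `(T, 22, 3)` result.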
| Original language | English |
|---|---|
| Journal | IEEE Signal Processing Letters |
| State | Accepted/In press - 2026 |
Keywords
- Animation
- Diffusion models
- Machine learning
- Motion analysis
- Virtual reality
Title
Hierarchical Diffusion for Sparse-to-Full Human Motion Reconstruction