TY - GEN
T1 - Deep conditional variational estimation for depth-based hand poses
AU - Xu, Lu
AU - Hu, Chen
AU - Li, Yinqi
AU - Tao, Ji'an
AU - Xue, Jianru
AU - Mei, Kuizhi
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - We propose a novel and effective approach for 3D hand pose estimation from a single depth image. Instead of performing deterministic regression from depth images, our model learns a latent distribution that models the high-dimensional space of pose joints, which can also be interpreted as a kinematics model for human hands. Specifically, the proposed network combines the framework of the conditional variational autoencoder, which learns an encoder and a decoder, with a standard convolutional network. The encoder models the latent variable as a prior or a regularization for the pose joints. Probabilistic inference is then performed by the decoder to generate the output prediction conditioned on the input depth images. In addition, we introduce a pool-convolution module to improve the localization regression of the network. The architecture can be trained end-to-end. In experiments, we demonstrate the effectiveness of the proposed approach in comparison with various state-of-the-art holistic regression approaches.
AB - We propose a novel and effective approach for 3D hand pose estimation from a single depth image. Instead of performing deterministic regression from depth images, our model learns a latent distribution that models the high-dimensional space of pose joints, which can also be interpreted as a kinematics model for human hands. Specifically, the proposed network combines the framework of the conditional variational autoencoder, which learns an encoder and a decoder, with a standard convolutional network. The encoder models the latent variable as a prior or a regularization for the pose joints. Probabilistic inference is then performed by the decoder to generate the output prediction conditioned on the input depth images. In addition, we introduce a pool-convolution module to improve the localization regression of the network. The architecture can be trained end-to-end. In experiments, we demonstrate the effectiveness of the proposed approach in comparison with various state-of-the-art holistic regression approaches.
UR - https://www.scopus.com/pages/publications/85070466803
U2 - 10.1109/FG.2019.8756559
DO - 10.1109/FG.2019.8756559
M3 - Conference contribution
AN - SCOPUS:85070466803
T3 - Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
BT - Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
Y2 - 14 May 2019 through 18 May 2019
ER -