TY - GEN
T1 - Discriminative learning of visual words for 3D human pose estimation
AU - Ning, Huazhong
AU - Xu, Wei
AU - Gong, Yihong
AU - Huang, Thomas
PY - 2008
Y1 - 2008
AB - This paper addresses the problem of recovering 3D human pose from a single monocular image using a discriminative bag-of-words approach. In previous work, the visual words are learned by unsupervised clustering algorithms. They capture the most common patterns and are good features for coarse-grained recognition tasks such as object classification, but for tasks that hinge on subtle differences, such as pose estimation, this representation may lack the needed discriminative power. In this paper, we propose to jointly learn the visual words and the pose regressors in a supervised manner. More specifically, we learn an individual distance metric for each visual word to optimize pose estimation performance. The learned metrics rescale the visual words to suppress unimportant dimensions, such as those corresponding to background. Another contribution is an Appearance and Position Context (APC) local descriptor that achieves both selectivity and invariance while requiring no background subtraction. We test our approach on both a quasi-synthetic dataset and a real dataset (HumanEva) to verify its effectiveness. Our approach also achieves fast computation thanks to the integral histograms used in APC descriptor extraction and the fast inference of the pose regressors.
UR - https://www.scopus.com/pages/publications/51949105335
U2 - 10.1109/CVPR.2008.4587534
DO - 10.1109/CVPR.2008.4587534
M3 - Conference contribution
AN - SCOPUS:51949105335
SN - 9781424422432
T3 - 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR
BT - 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR
T2 - 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR
Y2 - 23 June 2008 through 28 June 2008
ER -