TY - GEN
T1 - Multi-virtual View Scoring Network for 3D Hand Pose Estimation from a Single Depth Image
AU - Tian, Yimeng
AU - Li, Chen
AU - Tian, Lihua
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2024.
PY - 2024
Y1 - 2024
AB - 3D hand pose estimation is a crucial subject in computer vision. Recent methods transform a single depth image into multiple virtual view depth images. By projecting the single depth image through a point cloud transformation and using the depth images of the virtual views jointly for hand pose estimation, these methods improve estimation accuracy. However, current methods suffer from distorted generated depth images, insufficient use of the depth image of each view, and high computational overhead. To overcome these problems, we introduce a multi-virtual view scoring network (MVSN). The proposed MVSN consists of a single virtual view estimation module, a virtual view feature encoding module, and a virtual view scoring module. To generate an intermediate feature map suitable for virtual view scoring, the single virtual view estimation module uses a feature map offset loss function and enhances information interaction between channels in the backbone network. The virtual view feature encoding module adopts a two-branch structure in which one branch captures information about all joints and the other captures information about single joints from the intermediate feature map. This structure improves the model's sensitivity to each view, better integrates information from each virtual view, and yields a more appropriate scoring feature for each virtual view. The virtual view scoring module scores each view based on its scoring feature and assigns higher scores to more accurately estimated virtual views. We also propose a dynamic virtual view removal strategy that removes poor-quality views during training. Tested on the NYU and ICVL datasets, our model achieves mean joint errors of 6.21 mm and 4.53 mm, respectively, exhibiting better estimation accuracy than existing methods.
KW - 3-D hand pose estimation
KW - Computer vision
KW - Depth image
KW - Hand pose estimation
UR - https://www.scopus.com/pages/publications/85181981385
U2 - 10.1007/978-981-99-9109-9_15
DO - 10.1007/978-981-99-9109-9_15
M3 - Conference contribution
AN - SCOPUS:85181981385
SN - 9789819991082
T3 - Communications in Computer and Information Science
SP - 147
EP - 164
BT - Artificial Intelligence and Robotics - 8th International Symposium, ISAIR 2023, Revised Selected Papers
A2 - Lu, Huimin
A2 - Cai, Jintong
PB - Springer Science and Business Media Deutschland GmbH
T2 - 8th International Symposium on Artificial Intelligence and Robotics, ISAIR 2023
Y2 - 21 October 2023 through 23 October 2023
ER -