Multi-virtual View Scoring Network for 3D Hand Pose Estimation from a Single Depth Image

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

3D hand pose estimation is a crucial subject in computer vision. Recent methods transform a single depth image into depth images of multiple virtual views. By projecting the single depth image through a point cloud transformation and using the depth images of the virtual views together for hand pose estimation, these methods can effectively improve estimation accuracy. However, current methods suffer from distorted generated depth images, insufficient use of the depth image of each view, and high computational overhead. To overcome these problems, we introduce a multi-virtual view scoring network (MVSN). The proposed MVSN consists of a single virtual view estimation module, a virtual view feature encoding module, and a virtual view scoring module. To generate an intermediate feature map suitable for virtual view scoring, the single virtual view estimation module uses a feature map offset loss function and enhances information interaction between channels in the backbone network. The virtual view feature encoding module adopts a two-branch structure in which one branch captures information about all joints and the other captures information about single joints from the intermediate feature map. This structure effectively improves model sensitivity to each view, better integrates information from each virtual view, and yields a more appropriate scoring feature for each virtual view. The virtual view scoring module scores each view based on its scoring feature and assigns higher scores to more accurately estimated virtual views. We also propose a dynamic virtual view removal strategy that removes poor-quality views during training. Evaluated on the NYU and ICVL datasets, our model achieves mean joint errors of 6.21 mm and 4.53 mm, respectively, exhibiting better estimation accuracy than existing methods.
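As an illustration of the idea of scoring views and of the reported metric, the following is a minimal sketch and not the authors' implementation. It assumes that per-view joint estimates are fused by a softmax-weighted average of the view scores (the abstract does not state the fusion rule), and it computes mean joint error as the average Euclidean distance between predicted and ground-truth joints. The names `fuse_views` and `mean_joint_error` are hypothetical.

```python
import numpy as np


def fuse_views(view_joints, view_scores):
    """Fuse per-view 3D joint estimates with softmax-normalized view scores.

    ASSUMPTION: softmax-weighted averaging is an illustrative fusion rule;
    the abstract does not specify how MVSN combines the scored views.

    view_joints: array-like of shape (V, J, 3), one pose per virtual view
    view_scores: array-like of shape (V,), higher means more trusted
    returns: array of shape (J, 3)
    """
    joints = np.asarray(view_joints, dtype=float)
    scores = np.asarray(view_scores, dtype=float)
    # Numerically stable softmax over the view axis.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum over views: (V,) x (V, J, 3) -> (J, 3).
    return np.tensordot(weights, joints, axes=(0, 0))


def mean_joint_error(pred, gt):
    """Average Euclidean distance between predicted and true joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


# Two virtual views, one joint each, with equal scores.
views = [[[0.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]]
fused = fuse_views(views, [0.0, 0.0])
error_mm = mean_joint_error(fused, [[0.0, 0.0, 0.0]])
```

With equal scores the fused pose is the plain average of the views; raising one view's score pulls the fused pose toward that view's estimate.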

Original language: English
Title of host publication: Artificial Intelligence and Robotics - 8th International Symposium, ISAIR 2023, Revised Selected Papers
Editors: Huimin Lu, Jintong Cai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 147-164
Number of pages: 18
ISBN (Print): 9789819991082
State: Published - 2024
Event: 8th International Symposium on Artificial Intelligence and Robotics, ISAIR 2023 - Beijing, China
Duration: 21 Oct 2023 to 23 Oct 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 1998
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 8th International Symposium on Artificial Intelligence and Robotics, ISAIR 2023
Country/Territory: China
City: Beijing
Period: 21/10/23 to 23/10/23

Keywords

  • 3-D hand pose estimation
  • Computer vision
  • Depth image
  • Hand pose estimation
