Temporal Hierarchical Dictionary Guided Decoding for Online Gesture Segmentation and Recognition

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Online segmentation and recognition of skeleton-based gestures are challenging. Unlike the offline case, online inference can rely only on the most recent few frames and must always complete before the whole movement has been performed. However, incompletely performed gestures are ambiguous, and their early recognition easily falls into a local optimum. In this work, we address the problem with a temporal hierarchical dictionary that guides the hidden Markov model (HMM) decoding procedure. The intuition is that gestures are ambiguous, with high uncertainty, in their early phases and become discriminative only after certain phases. This uncertainty can naturally be measured by entropy. Thus, we propose a measure called the 'relative entropy map' (REM) to encode this temporal context and guide HMM decoding. Furthermore, we introduce a progressive learning strategy with which neural networks can learn robust recognition of HMM states in an iterative manner. Our method is extensively evaluated on three challenging databases and achieves state-of-the-art results. It both extracts the discriminative connotations and reduces the large redundancy in the HMM transition process. The results verify that our framework can recognize continuous gesture streams online even when the gestures are only halfway performed.
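
The intuition that early gesture phases carry more uncertainty than later ones can be illustrated with plain Shannon entropy. The sketch below is not the paper's relative entropy map, whose definition is not given in this abstract; it only shows how entropy separates an ambiguous class posterior from a confident one. The function names and the per-frame posteriors are hypothetical values chosen for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_entropy(probs):
    """Entropy divided by its maximum, log(K), giving a value in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    return shannon_entropy(probs) / math.log(k)


# Hypothetical posteriors over three gesture classes (assumed values,
# not taken from the paper): one early in the movement, one late.
early_phase = [0.36, 0.33, 0.31]   # classes are nearly indistinguishable
late_phase = [0.90, 0.06, 0.04]    # one class clearly dominates

early_uncertainty = normalized_entropy(early_phase)
late_uncertainty = normalized_entropy(late_phase)

# The early-phase posterior has the higher normalized entropy.
print(early_uncertainty > late_uncertainty)
```

A decoder could, under this reading, postpone a class decision while the normalized entropy stays close to 1 and commit once it drops; how the paper actually uses entropy to guide HMM decoding is described in the full article.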

Original language: English
Article number: 9224159
Pages (from-to): 9689-9702
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 29
State: Published - 2020

Keywords

  • Temporal context
  • deep neural network
  • hidden Markov model
  • hierarchical structure
  • relative entropy
  • skeleton-based recognition
