Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition

  • Xi'an Jiaotong University
  • Tencent

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

Graph convolution networks (GCNs) have been widely used and have achieved fruitful progress in the skeleton-based action recognition task. In GCNs, node interaction modeling dominates the context aggregation and, therefore, is crucial for a graph-based convolution kernel to extract representative features. In this article, we take a closer look at a powerful graph convolution formulation to capture rich movement patterns from these skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered a middle ground between the extremes of (2 + 1)-D and 3-D graph convolution. The core observation of HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Treating HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (i.e., intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we propose a strong human action recognition system that outperforms state-of-the-art methods with accuracies of 93.1% on the NTU-60 cross-subject (X-Sub) benchmark, 88.9% on the NTU-120 X-Sub benchmark, and 38.4% on Kinetics-Skeleton.
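
The three information flows named in the abstract can be made concrete with a small sketch. This is not the authors' implementation: the function name `neighbor_types`, the three-joint toy skeleton, and the window size are all hypothetical, chosen only to show which neighbors of a node count as spatial, temporal, or spatial-temporal. Under this reading, a (2 + 1)-D kernel only sees the spatial and temporal groups, while a 3-D kernel mixes all three into a single neighborhood.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (frame index, joint index)


def neighbor_types(
    node: Node,
    edges: List[Tuple[int, int]],
    num_frames: int,
    window: int = 1,
) -> Dict[str, List[Node]]:
    """Group the neighbors of `node` by the kind of information flow.

    Illustrative assumption: a neighbor is "spatial" if it is an adjacent
    joint in the same frame, "temporal" if it is the same joint in a nearby
    frame, and "spatial-temporal" if it is an adjacent joint in a nearby frame.
    """
    t, v = node
    # Joints connected to v in the (undirected) skeleton graph.
    adjacent = sorted(
        {b for a, b in edges if a == v} | {a for a, b in edges if b == v}
    )
    groups: Dict[str, List[Node]] = {
        "spatial": [],
        "temporal": [],
        "spatial-temporal": [],
    }
    for dt in range(-window, window + 1):
        frame = t + dt
        if not 0 <= frame < num_frames:
            continue  # skip frames outside the sequence
        if dt == 0:
            groups["spatial"].extend((frame, u) for u in adjacent)
        else:
            groups["temporal"].append((frame, v))
            groups["spatial-temporal"].extend((frame, u) for u in adjacent)
    return groups


# Toy skeleton: a chain of three joints 0 - 1 - 2, observed over three frames.
toy_edges = [(0, 1), (1, 2)]
groups = neighbor_types((1, 1), toy_edges, num_frames=3)

# Neighborhood seen by a (2 + 1)-D kernel versus a 3-D kernel.
neighbors_2plus1d = groups["spatial"] + groups["temporal"]
neighbors_3d = neighbors_2plus1d + groups["spatial-temporal"]
```

For the middle joint in the middle frame, the (2 + 1)-D neighborhood holds 4 nodes and the 3-D neighborhood holds 8. A heterogeneous scheme in the spirit of the abstract would keep the three groups separate so that each can be weighted differently, rather than merging them as the 3-D case does.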

Original language: English
Pages (from-to): 12130-12141
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 35
Issue number: 9
State: Published - 2024

Keywords

  • Heterogeneous context learning
  • multiscale graph
  • skeleton-based action recognition
  • spatial-temporal feature representation
