
Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition

  • Xi'an Jiaotong University

Research output: Contribution to journal › Article › peer-review

39 Scopus citations

Abstract

3D human action recognition has received increasing attention due to its potential applications in video surveillance. To guarantee satisfactory performance, previous studies have relied mainly on supervised methods, which incur substantial manual annotation costs. In addition, general deep networks for video sequences suffer from heavy computational costs and thus cannot satisfy the basic requirements of embedded systems. In this paper, a novel Motion Guided Attention Learning (MG-AL) framework is proposed, which formulates action representation learning as a self-supervised motion-attention prediction problem. Specifically, MG-AL is a lightweight network. A set of simple motion priors (e.g., intra-joint variance, inter-frame deviation, and cross-joint covariance), which minimizes additional parameters and computational overhead, is used as a supervisory signal to guide attention generation. The encoder is trained to predict multiple self-attention tasks so that it captures action-specific feature representations. Extensive evaluations are performed on three challenging benchmark datasets (NTU-RGB+D 60, NTU-RGB+D 120, and NW-UCLA). The proposed method achieves superior performance compared to state-of-the-art methods while having a very low computational cost.
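To illustrate the kind of motion priors the abstract mentions, the sketch below computes three simple statistics from a skeleton sequence with NumPy. This is an illustrative assumption, not the authors' implementation: the function name, the exact normalization, and the use of flattened coordinates for the covariance are all choices made here for clarity.

```python
import numpy as np

def motion_priors(seq):
    """Compute simple motion statistics from a skeleton sequence.

    seq: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Returns statistics of the kind that could serve as self-supervisory
    attention targets (a sketch, not the paper's exact formulation).
    """
    T, J, _ = seq.shape
    # Intra-joint variance: how much each joint's position varies over time,
    # summed over the three coordinate axes -> one value per joint.
    intra_joint_var = seq.var(axis=0).sum(axis=-1)                 # shape (J,)
    # Inter-frame deviation: mean per-joint displacement magnitude
    # between consecutive frames.
    inter_frame_dev = np.linalg.norm(
        np.diff(seq, axis=0), axis=-1).mean(axis=0)                # shape (J,)
    # Cross-joint covariance: covariance across flattened joint coordinates,
    # capturing how joints move together.
    flat = seq.reshape(T, J * 3)
    cross_joint_cov = np.cov(flat, rowvar=False)                   # (3J, 3J)
    return {
        "intra_joint_var": intra_joint_var,
        "inter_frame_dev": inter_frame_dev,
        "cross_joint_cov": cross_joint_cov,
    }

# Example: a random 50-frame sequence with 25 joints
# (NTU-RGB+D skeletons have 25 joints).
rng = np.random.default_rng(0)
priors = motion_priors(rng.normal(size=(50, 25, 3)))
```

Because these statistics are cheap elementwise and covariance operations, they add little computational overhead, which is consistent with the paper's stated goal of suitability for embedded systems.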

Original language: English
Pages (from-to): 8623-8634
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 32
Issue number: 12
State: Published - 1 Dec 2022

Keywords

  • 3D human action recognition
  • motion attention
  • prior knowledge
  • self-supervised learning
