TY - JOUR
T1 - Learning Composite Latent Structures for 3D Human Action Representation and Recognition
AU - Wei, Ping
AU - Sun, Hongbin
AU - Zheng, Nanning
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - 3D human action representation and recognition are important issues in many multimedia applications. While latent state approaches have been widely used for action modeling, previous work assumes that the latent states of actions have a single attribute. This assumption is inaccurate for representing the structures of complex actions. In this paper, we propose that latent states have composite attributes and introduce a novel composite latent structure (CLS) model to represent and recognize 3D human actions from skeleton sequences. A human action is modeled with a hierarchical graph, which represents the action sequence as a series of sequential atomic actions. An atomic action is represented as a composite latent state, which is composed of a latent semantic attribute and a latent geometric attribute. A discriminative EM-like algorithm is proposed to learn the model parameters and the composite latent structures of human actions. Given a 3D skeleton sequence, a composite attribute iterative programming algorithm is proposed to recognize the action and infer the action's latent temporal structure. We evaluate the proposed method on three challenging 3D action datasets: MSR 3D Action Dataset, Multiview 3D Event Dataset, and UTKinect-Action 3D Dataset. Extensive experimental results on these datasets demonstrate the effectiveness and advantages of the proposed method.
KW - 3D human action
KW - action recognition
KW - action representation
KW - composite latent structure
UR - https://www.scopus.com/pages/publications/85071538243
U2 - 10.1109/TMM.2019.2897902
DO - 10.1109/TMM.2019.2897902
M3 - Article
AN - SCOPUS:85071538243
SN - 1520-9210
VL - 21
SP - 2195
EP - 2208
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 9
M1 - 8636161
ER -