TY - GEN
T1 - Time-Dependent Body Gesture Representation for Video Emotion Recognition
AU - Wei, Jie
AU - Yang, Xinyu
AU - Dong, Yizhuo
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Video emotion recognition has recently become a research hotspot in the field of affective computing. Although most studies focus on facial cues, body gestures are the only available cues in some scenes, such as video monitoring systems. In this paper, we propose a body gesture representation method based on body joint movements. To reduce model complexity and improve the understanding of video emotion, the method uses body joint information to represent body gestures and captures the time-dependent relationships among body joints. Furthermore, we propose an attention-based channelwise convolutional neural network (ACCNN) to retain the independent characteristics of each body joint and learn key body gesture features. Experimental results on the multimodal database of Emotional Speech, Video and Gestures (ESVG) demonstrate the effectiveness of the proposed method, and the accuracy achieved with body gesture features is comparable to that achieved with facial features.
KW - Body joints
KW - Channelwise convolution
KW - Gesture representation
KW - Video emotion recognition
UR - https://www.scopus.com/pages/publications/85101769720
U2 - 10.1007/978-3-030-67832-6_33
DO - 10.1007/978-3-030-67832-6_33
M3 - Conference contribution
AN - SCOPUS:85101769720
SN - 9783030678319
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 403
EP - 416
BT - MultiMedia Modeling - 27th International Conference, MMM 2021, Proceedings
A2 - Lokoč, Jakub
A2 - Skopal, Tomáš
A2 - Schoeffmann, Klaus
A2 - Mezaris, Vasileios
A2 - Li, Xirong
A2 - Vrochidis, Stefanos
A2 - Patras, Ioannis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on MultiMedia Modeling, MMM 2021
Y2 - 22 June 2021 through 24 June 2021
ER -