TY - GEN
T1 - Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor
AU - Zhu, Guangyu
AU - Yang, Ming
AU - Yu, Kai
AU - Xu, Wei
AU - Gong, Yihong
PY - 2009
Y1 - 2009
N2 - Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, the study of detecting human-related video events in complex scenes with both a crowd of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g., making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel spatio-temporal descriptor based approach. A new spatio-temporal descriptor, which temporally integrates the statistics of a set of response maps of low-level features, e.g., image gradients and optical flows, in a space-time cube, is proposed to capture the characteristics of actions in terms of their appearance and motion patterns. Based on these descriptors, the bag-of-words method is utilized to describe a human figure as a concise feature vector. Then, these features are employed to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, a Gaussian kernel based temporal filtering is conducted to segment the sequences of events from a video stream, taking into account the temporal consistency of actions. The proposed approach is capable of tolerating spatial layout variations and local deformations of human actions due to diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms well-known SIFT descriptor based methods and effectively detects video events in challenging real-world conditions.
KW - Action recognition
KW - Event detection
KW - Motion representation
KW - Semantic analysis
UR - https://www.scopus.com/pages/publications/72449171990
U2 - 10.1145/1631272.1631297
DO - 10.1145/1631272.1631297
M3 - Conference contribution
AN - SCOPUS:72449171990
SN - 9781605586083
T3 - MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums
SP - 165
EP - 174
BT - MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums
T2 - 17th ACM International Conference on Multimedia, MM'09, with Co-located Workshops and Symposiums
Y2 - 19 October 2009 through 24 October 2009
ER -