Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

44 Scopus citations

Abstract

Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes with both a crowd of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g., making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel approach based on a spatio-temporal descriptor. The new descriptor temporally integrates the statistics of a set of response maps of low-level features, e.g., image gradients and optical flows, in a space-time cube, and is proposed to capture the characteristics of actions in terms of their appearance and motion patterns. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then employed to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian-kernel-based temporal filtering is applied to segment the sequences of events from a video stream, taking into account the temporal consistency of actions. The proposed approach is capable of tolerating spatial layout variations and local deformations of human actions caused by diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms the well-known SIFT descriptor based methods and effectively detects video events in challenging real-world conditions.
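The final step of the abstract, Gaussian temporal filtering of classifier output followed by segmentation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the kernel width, the decision threshold, the minimum event length, or the form of the per-frame scores, so `sigma`, `threshold`, `min_length`, and all function names below are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel as a list of weights."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # assumed truncation at 3 sigma
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_scores(scores, sigma=2.0):
    """Smooth per-frame action scores along time with a Gaussian kernel.

    Near the sequence boundaries only the in-range kernel weights are used
    and the result is renormalized by their sum.
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    smoothed = []
    for t in range(len(scores)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - radius
            if 0 <= idx < len(scores):
                acc += w * scores[idx]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def segment_events(smoothed, threshold=0.5, min_length=3):
    """Return (start, end) frame ranges, end exclusive, with score >= threshold."""
    segments = []
    start = None
    for t, s in enumerate(smoothed):
        if s >= threshold and start is None:
            start = t  # a candidate event begins
        elif s < threshold and start is not None:
            if t - start >= min_length:
                segments.append((start, t))
            start = None
    # Close an event that runs to the end of the sequence.
    if start is not None and len(smoothed) - start >= min_length:
        segments.append((start, len(smoothed)))
    return segments


# Hypothetical per-frame scores: one isolated false positive at frame 3
# and a sustained action over frames 10..19.
scores = [0.0] * 30
scores[3] = 1.0
for t in range(10, 20):
    scores[t] = 1.0

events = segment_events(smooth_scores(scores, sigma=2.0))
print(events)
```

In this toy input the isolated spike is averaged away by the smoothing while the sustained run survives as one segment, which is the temporal-consistency effect the abstract describes. In a real system the scores would presumably come from the SVM classifiers rather than hand-written values.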

Original language: English
Title of host publication: MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums
Pages: 165-174
Number of pages: 10
State: Published - 2009
Externally published: Yes
Event: 17th ACM International Conference on Multimedia, MM'09, with Co-located Workshops and Symposiums - Beijing, China
Duration: 19 Oct 2009 to 24 Oct 2009

Publication series

Name: MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums

Conference

Conference: 17th ACM International Conference on Multimedia, MM'09, with Co-located Workshops and Symposiums
Country/Territory: China
City: Beijing
Period: 19/10/09 to 24/10/09

Keywords

  • Action recognition
  • Event detection
  • Motion representation
  • Semantic analysis
