Graph-based temporal action co-localization from an untrimmed video

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

We present an efficient approach for temporal action co-localization (TACL), which aims to simultaneously localize all action instances in an untrimmed video. Compared with conventional instance-by-instance action localization, TACL can exploit the contextual and temporal relationships among action instances to reduce localization ambiguities. Motivated by the strong relational modeling capability of graph neural networks, we propose a Graph-based Temporal Action Co-Localization (G-TACL) method. By considering each action proposal as a node, G-TACL effectively aggregates contextual and temporal features from related action proposals to jointly recognize and localize all action instances in a single shot. Moreover, we introduce a novel multi-level consistency evaluator to measure the relatedness between any two action proposals, which considers their high-level contextual similarities, low-level temporal coincidences, and feature correlations. We exploit Gated Recurrent Units (GRUs) to iteratively update the features of each node, which are then used to regress the temporal boundaries of action proposals and finally achieve action co-localization. Experimental results on three datasets, i.e., THUMOS14, MEXaction2 and ActivityNet v1.3, demonstrate that our G-TACL is superior or comparable to state-of-the-art methods.
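The abstract's core mechanism can be sketched in a few lines: treat each action proposal as a graph node, score pairwise relatedness with a consistency measure, then iteratively refine node features with a GRU-style update over aggregated neighbour messages. The sketch below is a minimal, illustrative NumPy version under assumed simplifications: random (untrained) weights, a toy consistency score combining only temporal IoU and feature correlation, and a small feature dimension. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # feature dimension (assumed, for illustration)

# Toy action proposals: (start, end) times plus a D-dim feature each.
segments = np.array([[1.0, 3.0], [1.5, 3.5], [10.0, 12.0]])
feats = rng.standard_normal((len(segments), D))

def temporal_iou(a, b):
    """Temporal overlap ratio of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Feature correlation between two proposal features."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

# Simplified consistency evaluator: average of temporal coincidence
# and (clipped) feature correlation. The paper's multi-level evaluator
# also uses high-level contextual similarity.
n = len(segments)
A = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            A[i, j] = 0.5 * temporal_iou(segments[i], segments[j]) \
                    + 0.5 * max(0.0, cosine(feats[i], feats[j]))

# Row-normalize so each node receives a weighted neighbour average.
row = A.sum(axis=1, keepdims=True)
A = np.divide(A, row, out=np.zeros_like(A), where=row > 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Random GRU weights (learned in the real model; random here).
Wz, Uz = 0.1 * rng.standard_normal((D, D)), 0.1 * rng.standard_normal((D, D))
Wr, Ur = 0.1 * rng.standard_normal((D, D)), 0.1 * rng.standard_normal((D, D))
Wh, Uh = 0.1 * rng.standard_normal((D, D)), 0.1 * rng.standard_normal((D, D))

h = feats.copy()
for _ in range(3):  # a few rounds of iterative message passing
    m = A @ h                        # aggregated neighbour message
    z = sigmoid(m @ Wz + h @ Uz)     # update gate
    r = sigmoid(m @ Wr + h @ Ur)     # reset gate
    h_cand = np.tanh(m @ Wh + (r * h) @ Uh)
    h = (1 - z) * h + z * h_cand     # gated node-feature update

print(h.shape)  # refined features, then fed to boundary regression
```

In this toy graph, the first two proposals overlap heavily and thus exchange strong messages, while the third (temporally distant) proposal is influenced mainly through feature correlation; the refined features `h` would then drive the per-proposal boundary regression described in the abstract.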

Original language: English
Pages (from-to): 211-223
Number of pages: 13
Journal: Neurocomputing
Volume: 434
DOIs
State: Published - 28 Apr 2021

Keywords

  • Multi-level consistency evaluator
  • Temporal action co-localization

