Dual relation network for temporal action localization

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Temporal action localization is a challenging task in video understanding. Most previous methods process each proposal independently and neglect reasoning about proposal-proposal and proposal-context relations. We argue that the supplementary information obtained by exploiting these relations can enhance the proposal representation and further improve action localization. To this end, we propose a dual relation network that models both proposal-proposal and proposal-context relations. Concretely, a proposal-proposal relation module learns discriminative supplementary information from relevant proposals, which allows the network to model their interactions based on appearance and geometric similarities. Meanwhile, a proposal-context relation module mines contextual clues by adaptively learning from the global context outside of region-based proposals. Together, the two modules leverage the inherent correlation between actions and the long-term dependencies within videos for high-quality proposal refinement. As a result, the proposed framework enables the model to distinguish similar action instances and to locate temporal boundaries more precisely. Extensive experiments on the THUMOS14 and ActivityNet v1.3 datasets demonstrate that the proposed method significantly outperforms recent state-of-the-art methods.
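
The proposal-proposal relation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`temporal_iou`, `refine_proposals`), the use of temporal IoU as the geometric similarity, the scaled dot product as the appearance similarity, and the rule for combining the two are all assumptions chosen for illustration, and the learned projections of a real network are omitted. The proposal-context module is not shown.

```python
import math


def temporal_iou(a, b):
    """Temporal IoU of two (start, end) segments (assumed geometric similarity)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_proposals(segments, features, eps=1e-6):
    """Add to each proposal feature a weighted sum of all proposal features.

    The weight between proposals i and j combines an appearance term
    (scaled dot product of their features) with a geometric term
    (log temporal IoU). This combination rule is an assumption.
    """
    dim = len(features[0])
    refined = []
    for seg_i, f_i in zip(segments, features):
        scores = []
        for seg_j, f_j in zip(segments, features):
            appearance = sum(x * y for x, y in zip(f_i, f_j)) / math.sqrt(dim)
            geometry = math.log(temporal_iou(seg_i, seg_j) + eps)
            scores.append(appearance + geometry)
        weights = softmax(scores)
        # Supplementary information gathered from the related proposals.
        message = [
            sum(w * f_j[d] for w, f_j in zip(weights, features))
            for d in range(dim)
        ]
        # Residual update keeps the original proposal representation.
        refined.append([x + m for x, m in zip(f_i, message)])
    return refined


if __name__ == "__main__":
    # Hypothetical toy input: three proposals with 2-D features.
    segments = [(0.0, 10.0), (5.0, 15.0), (40.0, 50.0)]
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for row in refine_proposals(segments, features):
        print(row)
```

In this sketch, proposals that overlap in time and have similar features contribute most to each other's refined representation, while a distant, dissimilar proposal receives a weight close to zero.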

Original language: English
Article number: 108725
Journal: Pattern Recognition
Volume: 129
State: Published - Sep 2022

Keywords

  • Relation reasoning
  • Temporal action localization
