Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization

  • Yuanhao Zhai
  • , Le Wang
  • , Wei Tang
  • , Qilin Zhang
  • , Junsong Yuan
  • , Gang Hua

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

129 Scopus citations

Abstract

Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and localize all action instances in an untrimmed video under only video-level supervision. However, without frame-level annotations, it is challenging for W-TAL methods to identify false positive action proposals and generate action proposals with precise temporal boundaries. In this paper, we present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges. The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated, and used to provide frame-level supervision for improved model training and false positive action proposal elimination. Furthermore, we propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries. Experiments conducted on the THUMOS14 and ActivityNet datasets show that the proposed TSCN outperforms current state-of-the-art methods, and even achieves comparable results with some recent fully-supervised methods.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings
EditorsAndrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
PublisherSpringer Science and Business Media Deutschland GmbH
Pages37-54
Number of pages18
ISBN (Print)9783030585389
DOIs
StatePublished - 2020
Event16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
Duration: 23 Aug 202028 Aug 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12351 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference16th European Conference on Computer Vision, ECCV 2020
Country/TerritoryUnited Kingdom
CityGlasgow
Period23/08/2028/08/20

Keywords

  • Temporal action localization
  • Weakly-supervised learning

Fingerprint

Dive into the research topics of 'Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization'. Together they form a unique fingerprint.

Cite this