Action Coherence Network for Weakly Supervised Temporal Action Localization

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

Most prominent temporal action localization (TAL) methods are fully supervised, relying heavily on frame-level labels that can be prohibitively expensive to annotate. Weakly-supervised Temporal Action Localization (W-TAL) offers an alternative paradigm that requires only video-level labels during training, greatly reducing annotation effort. We present the Action Coherence Network (ACN) for W-TAL, which features a new coherence loss that better supervises action-boundary learning and facilitates proposal regression. In addition, a purpose-built fusion module performs localization inference based on features extracted by a two-stream convolutional neural network. Overall, the proposed ACN achieves state-of-the-art W-TAL performance on two challenging datasets, THUMOS14 and ActivityNet1.2; in particular, ACN attains 24.2% mAP on THUMOS14 under IoU threshold 0.5, approaching some recent fully supervised TAL methods.
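The mAP figure quoted above is computed with temporal IoU matching, the standard TAL evaluation protocol. The following is a minimal sketch (function name illustrative, not from the paper's code) of the 1-D IoU behind "mAP under IoU threshold 0.5":

```python
# Hedged sketch: temporal IoU between 1-D action segments, the overlap
# measure used when scoring localization at an IoU threshold (e.g. 0.5).
# The function name and segment format are illustrative assumptions.
def temporal_iou(pred, gt):
    """IoU of two segments given as (start, end) times."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A predicted proposal counts as a true positive at threshold 0.5 if its
# IoU with an unmatched ground-truth segment of the same class is >= 0.5.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```

At threshold 0.5, this example proposal would just barely match its ground-truth segment.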

Original language: English
Title of host publication: 2019 IEEE International Conference on Image Processing, ICIP 2019 - Proceedings
Publisher: IEEE Computer Society
Pages: 3696-3700
Number of pages: 5
ISBN (Electronic): 9781538662496
DOIs
State: Published - Sep 2019
Event: 26th IEEE International Conference on Image Processing, ICIP 2019 - Taipei, Taiwan, Province of China
Duration: 22 Sep 2019 – 25 Sep 2019

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
Volume: 2019-September
ISSN (Print): 1522-4880

Conference

Conference: 26th IEEE International Conference on Image Processing, ICIP 2019
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 22/09/19 – 25/09/19

Keywords

  • coherence loss
  • temporal action localization
  • weakly-supervised
