跳到主要导航 跳到搜索 跳到主要内容

Revisiting Unsupervised Temporal Action Localization: The Primacy of High-Quality Actionness and Pseudolabels

  • Han Jiang
  • , Haoyu Tang
  • , Ming Yan
  • , Ji Zhang
  • , Mingzhu Xu
  • , Yupeng Hu
  • , Jihua Zhu
  • , Liqiang Nie
  • Shandong University
  • Xi'an Jiaotong University
  • Alibaba Group Holding Ltd.
  • Harbin Institute of Technology Shenzhen

科研成果: 书/报告/会议事项章节会议稿件同行评审

6 引用 (Scopus)

摘要

Recently, temporal action localization (TAL) methods, especially the weakly-supervised and unsupervised ones, have become a hot research topic. Existing unsupervised methods follow an iterative ''clustering and training'' strategy with diverse model designs during training stage, while they often overlook maintaining consistency between these stages, which is crucial: more accurate clustering results can reduce the noises of pseudolabels and thus enhance model training, while more robust training can in turn enrich clustering feature representation. We identify two critical challenges in unsupervised scenarios: 1. What features should the model generate for clustering? 2. Which pseudolabeled instances from clustering should be chosen for model training? After extensive explorations, we proposed a novel yet simple framework called Consistency-Oriented Progressive high actionness Learning to address these issues. For feature generation, our framework adopts a High Actionness snippet Selection (HAS) module to generate more discriminative global video features for clustering from the enhanced actionness features obtained from a designed Inner-Outer Consistency Network (IOCNet). For pseudolabel selection, we introduces a Progressive Learning With Representative Instances (PLRI) strategy to identify the most reliable and informative instances within each cluster for model training. These three modules, HAS, IOCNet, and PLRI, synergistically improve consistency in model training and clustering performance. Extensive experiments on THUMOS'14 and ActivityNet v1.2 datasets under both unsupervised and weakly-supervised settings demonstrate that our framework achieves the state-of-the-art results.

源语言英语
主期刊名MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
出版商Association for Computing Machinery, Inc
5643-5652
页数10
ISBN(电子版)9798400706868
DOI
出版状态已出版 - 28 10月 2024
活动32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, 澳大利亚
期限: 28 10月 20241 11月 2024

出版系列

姓名MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

会议

会议32nd ACM International Conference on Multimedia, MM 2024
国家/地区澳大利亚
Melbourne
时期28/10/241/11/24

学术指纹

探究 'Revisiting Unsupervised Temporal Action Localization: The Primacy of High-Quality Actionness and Pseudolabels' 的科研主题。它们共同构成独一无二的指纹。

引用此