Temporal Deformable Transformer for Action Localization

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Temporal action localization (TAL) is a challenging task that has received significant attention in video understanding. Recently, Transformer-based models have demonstrated their effectiveness in capturing contextual information and achieved outstanding performance on various TAL benchmarks. However, these methods still face challenges in computational efficiency and contextual modeling rigidity. In this paper, we propose a method to address those problems in Transformer-based models. Our model introduces a temporal deformable Transformer module and the corresponding time normalization, enabling flexible aggregation of temporal context information in videos, leading to enhanced video representations. To demonstrate the effectiveness of the proposed method, we construct a Transformer-based anchor-free model with a simple prediction head, which yields superior performance on widely used benchmarks. Specifically, it achieves an average mAP of 67.4% on THUMOS14 and an average mAP of 36.8% on ActivityNet-v1.3.
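The abstract's idea of flexibly aggregating temporal context can be illustrated with a minimal sketch of generic 1D deformable sampling. This is not the authors' module: the function names are hypothetical, the offsets and attention scores are passed in by hand instead of being predicted by learned layers, and the paper's time normalization is omitted because the abstract does not define it.

```python
import math


def linear_sample(features, t):
    """Linearly interpolate a sequence of feature vectors at fractional time t.

    Positions outside [0, len(features) - 1] are clamped to the nearest frame.
    """
    last = len(features) - 1
    t = min(max(t, 0.0), float(last))
    lo = int(math.floor(t))
    hi = min(lo + 1, last)
    frac = t - lo
    return [(1.0 - frac) * a + frac * b for a, b in zip(features[lo], features[hi])]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_aggregate(features, ref, offsets, scores):
    """Aggregate context for one query located at reference time `ref`.

    offsets: K fractional displacements from `ref` (assumed given here;
             a real model would predict them from the query feature).
    scores:  K unnormalized attention logits (likewise assumed given).
    """
    weights = softmax(scores)
    out = [0.0] * len(features[0])
    for off, w in zip(offsets, weights):
        sampled = linear_sample(features, ref + off)
        out = [o + w * s for o, s in zip(out, sampled)]
    return out


# Four time steps with one feature channel each.
features = [[0.0], [1.0], [2.0], [3.0]]
# Two sampling points around t = 1 with equal attention weight.
result = deformable_aggregate(features, ref=1.0, offsets=[-0.5, 0.5], scores=[0.0, 0.0])
print(result)
```

Each query attends to only K sampled positions instead of every time step, which is the usual source of the efficiency gain attributed to deformable attention; the fractional offsets are what make the temporal context flexible.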

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2023 - 32nd International Conference on Artificial Neural Networks, Proceedings
Editors: Lazaros Iliadis, Antonios Papaleonidas, Plamen Angelov, Chrisina Jayne
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 563-575
Number of pages: 13
ISBN (Print): 9783031442223
DOIs
State: Published - 2023
Event: 32nd International Conference on Artificial Neural Networks, ICANN 2023 - Heraklion, Greece
Duration: 26 Sep 2023 to 29 Sep 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14259 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 32nd International Conference on Artificial Neural Networks, ICANN 2023
Country/Territory: Greece
City: Heraklion
Period: 26/09/23 to 29/09/23

Keywords

  • Deformable Attention
  • Temporal Action Localization
  • Transformer
  • Video Understanding
