Cross-Graph Transformer Network for Temporal Sentence Grounding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Temporal sentence grounding aims to retrieve the moments associated with a given sentence in an untrimmed video. It is a multi-modal problem that requires an adequate understanding of both the sentence and the video structure, as well as accurate interaction between the two modalities. In this paper, we propose a cross-graph Transformer network (CGTN) to address this problem, in which the sentence is modeled as a dependency tree and the video as a graph, reflecting their non-linear structures. Based on these graph structures, we design self-graph attention and cross-graph attention to model the relationships between nodes within each graph and across the two graphs. We evaluate the proposed model on two challenging datasets, and extensive experiments demonstrate the strength of our method.
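The paper's exact formulation is not reproduced in this record, but the two mechanisms named in the abstract can be sketched in a minimal form: self-graph attention restricts scaled dot-product attention to a graph's own edges (e.g. the dependency tree over words), while cross-graph attention lets the nodes of one graph attend to all nodes of the other (e.g. words attending to video clips). The feature dimensions, adjacency, and projection matrices below are hypothetical placeholders, not the model's actual parameters.

```python
import numpy as np

def masked_softmax(scores, mask=None):
    # Row-wise softmax; positions where mask == 0 receive ~zero weight.
    if mask is not None:
        scores = np.where(mask > 0, scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

def self_graph_attention(X, adj, Wq, Wk, Wv):
    # Attention within one graph: each node attends only to its neighbors
    # (and itself), as given by the adjacency matrix.
    d = Wq.shape[-1]
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    return masked_softmax(scores, adj) @ (X @ Wv)

def cross_graph_attention(Xa, Xb, Wq, Wk, Wv):
    # Attention across graphs: queries come from graph A's nodes,
    # keys and values from graph B's nodes (fully connected across modalities).
    d = Wq.shape[-1]
    scores = (Xa @ Wq) @ (Xb @ Wk).T / np.sqrt(d)
    return masked_softmax(scores) @ (Xb @ Wv)

rng = np.random.default_rng(0)
d = 8
words = rng.normal(size=(5, d))   # sentence nodes (dependency-tree words)
clips = rng.normal(size=(7, d))   # video nodes (clip features)
# Toy chain adjacency with self-loops standing in for a dependency tree.
adj = np.eye(5) + np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

self_out = self_graph_attention(words, adj, Wq, Wk, Wv)    # intra-graph update
cross_out = cross_graph_attention(words, clips, Wq, Wk, Wv)  # word -> clip attention
print(self_out.shape, cross_out.shape)
```

Each word node thus receives one update from its syntactic neighborhood and one from the video graph; the model presumably fuses such updates across layers, but that fusion is beyond this sketch.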

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2023 - 32nd International Conference on Artificial Neural Networks, Proceedings
Editors: Lazaros Iliadis, Antonios Papaleonidas, Plamen Angelov, Chrisina Jayne
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 345-356
Number of pages: 12
ISBN (Print): 9783031442223
DOIs
State: Published - 2023
Event: 32nd International Conference on Artificial Neural Networks, ICANN 2023 - Heraklion, Greece
Duration: 26 Sep 2023 – 29 Sep 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14259 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 32nd International Conference on Artificial Neural Networks, ICANN 2023
Country/Territory: Greece
City: Heraklion
Period: 26/09/23 – 29/09/23

Keywords

  • Cross-modal
  • Graph attention
  • Temporal grounding
