Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension

  • Yaxian Wang
  • Henghui Ding
  • Shuting He
  • Xudong Jiang
  • Bifan Wei
  • Jun Liu
  • Xi'an Jiaotong University
  • Fudan University
  • Shanghai University of Finance and Economics
  • Nanyang Technological University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

In this work, we address the challenging task of Generalized Referring Expression Comprehension (GREC). Compared to the classic Referring Expression Comprehension (REC) that focuses on single-target expressions, GREC extends the scope to a more practical setting by further encompassing no-target and multi-target expressions. Existing REC methods face challenges in handling the complex cases encountered in GREC, primarily due to their fixed output and limitations in multi-modal representations. To address these issues, we propose a Hierarchical Alignment-enhanced Adaptive Grounding Network (HieA2G) for GREC, which can flexibly deal with various types of referring expressions. First, a Hierarchical Multi-modal Semantic Alignment (HMSA) module is proposed to incorporate three levels of alignments, including word-object, phrase-object, and text-image alignment. It enables hierarchical cross-modal interactions across multiple levels to achieve comprehensive and robust multi-modal understanding, greatly enhancing grounding ability for complex cases. Then, to address the varying number of target objects in GREC, we introduce an Adaptive Grounding Counter (AGC) to dynamically determine the number of output targets. Additionally, an auxiliary contrastive loss is employed in AGC to enhance object-counting ability by pulling in multi-modal features with the same counting and pushing away those with different counting. Extensive experimental results show that HieA2G achieves new state-of-the-art performance on the challenging GREC task and also the other 4 tasks, including REC, Phrase Grounding, Referring Expression Segmentation (RES), and Generalized Referring Expression Segmentation (GRES), demonstrating the remarkable superiority and generalizability of the proposed HieA2G.
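The abstract describes the auxiliary contrastive loss only in words: features with the same target count are pulled together, and those with different counts are pushed apart. The abstract gives no formula, so the following is a minimal sketch using a generic supervised-contrastive-style formulation, not the authors' implementation. The function name `count_contrastive_loss`, the `temperature` value, and the toy features are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def count_contrastive_loss(features, counts, temperature=0.1):
    """Supervised-contrastive-style loss where the label is the target count.

    Assumption: `features` are pooled multi-modal feature vectors, one per
    expression-image pair, and `counts` are the ground-truth numbers of targets.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples that share the anchor's target count.
        positives = [j for j in range(n) if j != i and counts[j] == counts[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability of the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy usage: two feature clusters, labelled consistently and inconsistently.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = count_contrastive_loss(toy_features, [0, 0, 2, 2])
loss_misaligned = count_contrastive_loss(toy_features, [0, 2, 0, 2])
```

In this toy setting the loss is lower when samples with the same count already lie close together, which is the behavior the abstract attributes to the auxiliary loss.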

Original language: English
Title of host publication: Special Track on AI Alignment
Editors: Toby Walsh, Julie Shah, Zico Kolter
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 8042-8050
Number of pages: 9
Edition: 8
ISBN (electronic): 157735897X, 9781577358978
DOI
Publication status: Published - 11 Apr 2025
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 → 4 Mar 2025

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 8
Volume: 39
ISSN (print): 2159-5399
ISSN (electronic): 2374-3468

Conference

Conference: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Country/Territory: United States
City: Philadelphia
Period: 25/02/25 → 4/03/25
