AugDETR: Improving Multi-scale Learning for Detection Transformer

  • Xi'an Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

Current end-to-end detectors typically exploit transformers to detect objects and show promising performance. Among them, Deformable DETR is a representative paradigm that effectively exploits multi-scale features. However, small local receptive fields and limited query-encoder interactions weaken multi-scale learning. In this paper, we analyze local feature enhancement and multi-level encoder exploitation for improved multi-scale learning and construct a novel detection transformer named Augmented DETR (AugDETR) to realize them. Specifically, AugDETR consists of two components: a Hybrid Attention Encoder and Encoder-Mixing Cross-Attention. The Hybrid Attention Encoder enlarges the receptive field of the deformable encoder and introduces global context features to enhance feature representation. Encoder-Mixing Cross-Attention adaptively leverages multi-level encoders based on query features, yielding more discriminative object features and faster convergence. By combining AugDETR with DETR-based detectors such as DINO, AlignDETR, and DDQ, our models achieve improvements of 1.2, 1.1, and 1.0 AP, respectively, on COCO under the ResNet-50-4scale, 12-epoch setting.

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 238-255
Number of pages: 18
ISBN (Print): 9783031726903
DOI
Publication status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 2024 to 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15082 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 to 4/10/24
