Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

72 Scopus citations

Abstract

DETR-like methods have significantly improved detection performance in an end-to-end manner. Mainstream two-stage variants perform dense self-attention and then select a fraction of queries for sparse cross-attention. This has proven effective for improving performance, but it also introduces a heavy computational burden and a strong dependence on stable query selection. This paper demonstrates that suboptimal two-stage selection strategies cause scale bias and redundancy because of the mismatch between the selected queries and objects in two-stage initialization. To address these issues, we propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries for a better trade-off between computational efficiency and precision. The filtering process overcomes scale bias through a novel scale-independent salience supervision. To compensate for the semantic misalignment among queries, we introduce elaborate query refinement modules for stable two-stage initialization. Based on the above improvements, the proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, and +4.4% AP on three challenging task-specific detection datasets, as well as 49.2% AP on COCO 2017 with fewer FLOPs. The code is available at https://github.com/xiuqhou/Salience-DETR.
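The abstract names two mechanisms without showing how they work: running encoder attention only on a filtered subset of salient queries, and supervising salience with a scale-independent target. The following is a minimal sketch of both ideas under stated assumptions, not the authors' implementation (that lives in the linked repository). The helper names `salience_target`, `top_k_indices`, `dense_self_attention`, and `filtered_encoder_layer`, the keep `ratio`, and the particular target formula (1 minus the distance to the box center, with offsets normalized by half-width and half-height) are all hypothetical. A single-head dot-product attention with no learned projections stands in for a real encoder layer.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def salience_target(px: float, py: float, box: Box) -> float:
    """Hypothetical scale-independent salience target for a point and a box.

    Offsets from the box center are divided by the half-width and
    half-height, so a point at the same relative position inside a
    small box or a large box receives the same target.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0 or not (x1 <= px <= x2 and y1 <= py <= y2):
        return 0.0  # degenerate box, or point outside the box: background
    dx = (px - (x1 + x2) / 2) / (w / 2)
    dy = (py - (y1 + y2) / 2) / (h / 2)
    return max(0.0, 1.0 - math.hypot(dx, dy))


def top_k_indices(scores: Sequence[float], ratio: float) -> List[int]:
    """Indices of the highest-scoring tokens, keeping roughly `ratio` of them."""
    if not scores:
        return []
    k = max(1, round(len(scores) * ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])  # restore the original token order


def dense_self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity projections."""
    if not tokens:
        return []
    d = len(tokens[0])
    out = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(logits)  # subtract the max before exp for numerical stability
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z for j in range(d)])
    return out


def filtered_encoder_layer(
    tokens: List[Vector], scores: Sequence[float], ratio: float
) -> Tuple[List[Vector], List[int]]:
    """Update only the most salient tokens; all others pass through unchanged."""
    keep = top_k_indices(scores, ratio)
    updated = dense_self_attention([tokens[i] for i in keep])
    out = [list(t) for t in tokens]
    for i, delta in zip(keep, updated):
        out[i] = [a + b for a, b in zip(tokens[i], delta)]  # residual connection
    return out, keep


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    scores = [0.9, 0.1, 0.8, 0.3]  # e.g. predicted salience per token
    out, keep = filtered_encoder_layer(tokens, scores, ratio=0.5)
    print(keep)  # [0, 2]
```

Because only the k kept tokens attend to one another, the filtered layer computes k² attention scores instead of N²; with `ratio=0.5` that is a quarter of the pairwise cost. Leaving unselected tokens untouched is one simple way to skip work on likely background; the real model's filtering and refinement details may differ.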

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE Computer Society
Pages: 17574-17583
Number of pages: 10
ISBN (Electronic): 9798350353006
ISBN (Print): 9798350353006
State: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 2024 to 22 Jun 2024

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 to 22/06/24

Keywords

  • Detection transformer
  • Object detection
  • Query refinement
  • Query salience
  • Self-attention

