
Adaptive token selection for efficient detection transformer with dual teacher supervision

  • Muyao Yuan
  • Weizhan Zhang
  • Caixia Yan
  • Tieliang Gong
  • Yuanhong Zhang
  • Jiangyong Ying
  • Xi'an Jiaotong University
  • Ltd.

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Recently, DEtection TRansformer (DETR)-based models have achieved remarkable performance in object detection and various foundational vision tasks. However, their performance is impeded by high computational demands, since the cost scales quadratically with the number of feature tokens. To reduce redundant computation in regions such as the background, existing works propose static token selection methods, which forward only a predefined portion of the tokens. Intuitively, however, the complexity of inference for detection tasks varies with the input image. Because static token selection methods rely on a fixed keeping ratio, they degrade performance in complex scenes and waste computation in simple ones. To address this issue, we propose an Adaptive Token Selection method for DETR (ATS-DETR) that dynamically chooses the token keeping ratio based on the complexity of the input, so as to retain the most salient tokens. To explicitly control the sparsity and improve the performance of ATS-DETR, we put forward a novel training approach called Dual Teacher Supervision. Specifically, we use a weak teacher to help the model distinguish input complexity and a strong teacher to enhance overall model performance through feature distillation. We further introduce Global Distillation to reduce the disparities between the feature patterns extracted by ATS-DETR and by the strong teacher model. Extensive experiments demonstrate that ATS-DETR attains better performance than Deformable DETR while achieving an 83% reduction of GFLOPs in the encoder, and that it outperforms all the static token selection methods.
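
The contrast between a fixed keeping ratio and an input-dependent one can be illustrated with a small sketch. This is not the ATS-DETR selection mechanism, which the abstract does not specify. It assumes each token already has a saliency score in [0, 1] and uses a simple score threshold as a stand-in for adaptive selection. The function names, the threshold value, and the example scores are all hypothetical.

```python
from typing import List


def static_select(scores: List[float], keep_ratio: float) -> List[int]:
    """Keep a fixed fraction of tokens (top-k by score), whatever the input."""
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def adaptive_select(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Keep every token scoring above a threshold (an assumed stand-in rule).

    The number of kept tokens, and hence the keeping ratio, now depends
    on how many salient tokens the input actually contains.
    """
    kept = [i for i, s in enumerate(scores) if s > threshold]
    if not kept:
        # Always keep at least the single most salient token.
        kept = [max(range(len(scores)), key=lambda i: scores[i])]
    return kept


# Made-up saliency scores: one salient token vs. many salient tokens.
simple_scene = [0.9, 0.1, 0.05, 0.1, 0.2, 0.1, 0.05, 0.1]
complex_scene = [0.9, 0.8, 0.7, 0.6, 0.9, 0.2, 0.8, 0.7]

for name, scores in [("simple", simple_scene), ("complex", complex_scene)]:
    n_static = len(static_select(scores, keep_ratio=0.5))
    n_adaptive = len(adaptive_select(scores))
    print(f"{name}: static keeps {n_static}, adaptive keeps {n_adaptive}")
```

With these scores, the static rule keeps four of eight tokens in both scenes, while the threshold rule keeps one token in the simple scene and seven in the complex one. That is the behaviour the abstract argues for: fewer tokens where the input is easy, more where it is hard.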

Original language: English
Article number: 112036
Journal: Knowledge-Based Systems
Volume: 300
State: Published - 27 Sep 2024

Keywords

  • Adaptive inference
  • Detection transformer
  • Inference acceleration
  • Token selection
