Abstract
Object detection in drone aerial imagery faces critical challenges including extreme scale variance, clustered small objects, and complex backgrounds, leading to notable performance gaps in general detectors. The most effective solution is to increase the input resolution, but this substantially increases computational load. Existing methods are unable to achieve a satisfactory balance between accuracy and speed due to architectural inadequacies in preserving fine-grained features essential for small objects. Thus, we present an optimized model architecture based on the RT-DETR framework. By proposing the Bipartite Attentive Processing Block, which employs a channel-splitting strategy that allows parallel convolution and attention refinement, we improve the model’s ability to extract discriminative features from complex aerial images. A novel dual-fusion encoder with a Frequency-Aware Fusion Module further improves the model’s performance by retaining critical low-level features while effectively merging them with high-level semantic information. Additionally, we optimize the loss function by combining the Reciprocal Normalized Wasserstein Distance with CIoU. Extensive experiments on the VisDrone, UAVDT and AI-TOD datasets demonstrate the efficiency and effectiveness of our method. In particular, our method achieves a 6.9% higher AP than the baseline, requires 17.5% less computational load and provides superior accuracy compared to state-of-the-art methods.
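As a rough illustration of the loss term mentioned in the abstract, the sketch below computes the standard Normalized Wasserstein Distance (NWD) between two boxes and blends it with a CIoU loss value. The abstract does not define the reciprocal variant or say how the two terms are weighted, so the function names, the constant `c`, the weight `alpha`, and the plain weighted sum are all assumptions made for illustration, not the authors' formulation.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Standard NWD between two boxes given as (cx, cy, w, h).

    Each box is modeled as a 2D Gaussian with mean (cx, cy) and
    standard deviations (w/2, h/2). The squared 2-Wasserstein distance
    between two such Gaussians is a squared Euclidean distance.
    `c` is a dataset-dependent normalizer; 12.8 is only a placeholder.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    # Map the distance into (0, 1]; 1 means the boxes are identical.
    return math.exp(-math.sqrt(w2_squared) / c)


def combined_box_loss(nwd_value, ciou_loss, alpha=0.5):
    """Hypothetical blend of an NWD-based loss and a CIoU loss.

    `ciou_loss` is assumed to be computed elsewhere. `alpha` is an
    assumed weight, not a value taken from the paper.
    """
    return alpha * (1.0 - nwd_value) + (1.0 - alpha) * ciou_loss


if __name__ == "__main__":
    pred = (10.0, 10.0, 4.0, 4.0)    # small predicted box
    target = (12.0, 11.0, 4.0, 6.0)  # small ground-truth box
    similarity = nwd(pred, target)
    print(similarity, combined_box_loss(similarity, ciou_loss=0.6))
```

Unlike IoU, this similarity stays non-zero and changes smoothly even when two small boxes do not overlap, which is the usual motivation for pairing a Wasserstein-based term with an IoU-based one for tiny objects.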
| Field | Value |
|---|---|
| Original language | English |
| Article number | 104565 |
| Journal | Computer Vision and Image Understanding |
| Volume | 262 |
| State | Published - Dec 2025 |
Keywords
- Channel-splitting
- Clustered small objects
- Drone object detection
- Feature fusion
Title
BAP-DETR: Efficient drone object detection network based on bipartite attentive processing and dual fusion encoder