Cross stage partial connections based weighted Bi-directional feature pyramid and enhanced spatial transformation network for robust object detection

  • Yan Feng Lu
  • Qian Yu
  • Jing Wen Gao
  • Yi Li
  • Jun Cheng Zou
  • Hong Qiao

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

Structural information is an essential component for efficient object detection. In many visual detection tasks, objects with large structural deformation make up a large proportion of the targets. The shape, contour, and internal structure of these objects can change dramatically, which makes efficient detection difficult. Detecting them robustly and accurately is therefore a significant challenge. To address this issue, we first introduce a Cross Stage Partial connections-based weighted Bi-directional Feature Pyramid Network (CSP-BiFPN), which allows easy and efficient multi-scale feature fusion through cross-stage partial connections. Second, to enhance the model's spatial transformation capacity, the multi-scale feature maps extracted from the YOLO backbone network are processed by an enhanced spatial transformation network (ESTN) to handle spatial deformations. Based on these architectural modifications and optimizations, we further develop a novel real-time robust object detection model called Bi-STN-YOLO. We evaluate the proposed method on four image datasets. The experimental results demonstrate that the proposed approach achieves significant improvements over typical YOLO-family models and performance competitive with state-of-the-art methods in detection tasks.
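
The abstract does not spell out how the weighted bi-directional pyramid combines its inputs. As a rough illustration only, the sketch below assumes the fusion resembles the "fast normalized fusion" commonly associated with BiFPN-style networks: each input feature map gets a non-negative scalar weight, and the output is the weight-normalized sum. The function name `fuse_weighted`, the nested-list representation of a feature map, and the `eps` default are all assumptions made for this example, not details taken from the paper.

```python
def fuse_weighted(features, weights, eps=1e-4):
    """Fuse same-sized 2D feature maps with normalized non-negative weights.

    Illustrative only: computes sum_i(w_i * f_i) / (eps + sum_i(w_i)),
    with each w_i clamped to be >= 0. This is an assumed fusion rule,
    not the paper's confirmed method.
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature map")

    # Clamp weights at zero (the role ReLU plays on learnable weights).
    clamped = [max(0.0, w) for w in weights]
    total = eps + sum(clamped)

    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]

    for w, fmap in zip(clamped, features):
        if len(fmap) != height or any(len(row) != width for row in fmap):
            raise ValueError("all feature maps must share one shape")
        scale = w / total
        for r in range(height):
            for c in range(width):
                fused[r][c] += scale * fmap[r][c]
    return fused


if __name__ == "__main__":
    # Two hypothetical 2x2 feature maps already resized to a common scale.
    top_down = [[1.0, 2.0], [3.0, 4.0]]
    lateral = [[3.0, 2.0], [1.0, 0.0]]
    print(fuse_weighted([top_down, lateral], [1.0, 1.0]))
```

In a real detector the weights would be learned parameters and the maps would be multi-channel tensors resized to a shared resolution before fusion; the scalar, single-channel form here only shows the normalization step.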

Original language: English
Pages (from-to): 70-82
Number of pages: 13
Journal: Neurocomputing
Volume: 513
DOIs
State: Published - 7 Nov 2022
Externally published: Yes

Keywords

  • Image detection
  • Robust object detection
  • Spatial transformation
  • Structural deformation
