TY - JOUR
T1 - Group Sampling for Scale Invariant Face Detection
AU - Ming, Xiang
AU - Wei, Fangyun
AU - Zhang, Ting
AU - Chen, Dong
AU - Zheng, Nanning
AU - Wen, Fang
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2022/2/1
Y1 - 2022/2/1
N2 - Detectors based on deep learning tend to detect multi-scale objects on a single input image for efficiency. Recent works, such as FPN and SSD, generally use feature maps from multiple layers with different spatial resolutions to detect objects at different scales, e.g., high-resolution feature maps for small objects. However, we find that objects at all scales can also be well detected with features from a single layer of the network. In this paper, we carefully examine the factors affecting detection performance across a large range of scales, and conclude that the balance of training samples, including both positive and negative ones, at different scales is the key. We propose a group sampling method which divides the anchors into several groups according to scale, and ensures that the number of samples for each group is the same during training. Our approach, using only a single layer of FPN as features, is able to advance the state of the art. Comprehensive analysis and extensive experiments have been conducted to show the effectiveness of the proposed method. Moreover, we show that our approach is favorably applicable to other tasks, such as object detection on the COCO dataset, and to other detection pipelines, such as YOLOv3, SSD and R-FCN. Our approach, evaluated on face detection benchmarks including the FDDB and WIDER FACE datasets, achieves state-of-the-art results without bells and whistles.
AB - Detectors based on deep learning tend to detect multi-scale objects on a single input image for efficiency. Recent works, such as FPN and SSD, generally use feature maps from multiple layers with different spatial resolutions to detect objects at different scales, e.g., high-resolution feature maps for small objects. However, we find that objects at all scales can also be well detected with features from a single layer of the network. In this paper, we carefully examine the factors affecting detection performance across a large range of scales, and conclude that the balance of training samples, including both positive and negative ones, at different scales is the key. We propose a group sampling method which divides the anchors into several groups according to scale, and ensures that the number of samples for each group is the same during training. Our approach, using only a single layer of FPN as features, is able to advance the state of the art. Comprehensive analysis and extensive experiments have been conducted to show the effectiveness of the proposed method. Moreover, we show that our approach is favorably applicable to other tasks, such as object detection on the COCO dataset, and to other detection pipelines, such as YOLOv3, SSD and R-FCN. Our approach, evaluated on face detection benchmarks including the FDDB and WIDER FACE datasets, achieves state-of-the-art results without bells and whistles.
KW - Object detection
KW - convolutional neural network
KW - sampling
UR - https://www.scopus.com/pages/publications/85122782558
U2 - 10.1109/TPAMI.2020.3012414
DO - 10.1109/TPAMI.2020.3012414
M3 - Article
C2 - 32750835
AN - SCOPUS:85122782558
SN - 0162-8828
VL - 44
SP - 985
EP - 1001
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 2
ER -