TY - JOUR
T1 - Local to Global Feature Learning for Salient Object Detection
AU - Feng, Xuelu
AU - Zhou, Sanping
AU - Zhu, Zixin
AU - Wang, Le
AU - Hua, Gang
N1 - Publisher Copyright:
© 2022
PY - 2022/10
Y1 - 2022/10
N2 - Existing works mainly focus on how to aggregate multi-level features for salient object detection, which may generate sub-optimal results due to interference from redundant details. To handle this problem, we aim to learn a local-to-global feature representation, so as to segment detailed structures from a local perspective and locate salient objects from a global perspective. In particular, we design a novel L2GF network which mainly consists of three modules, i.e., L-Net, G-Net, and F-Net. L-Net employs our enhanced auto-encoder structure to extract local contexts that provide rich boundary information about objects, and is able to learn rich local features within a certain receptive field. G-Net takes tokenized feature patches as an input sequence and leverages the well-known Transformer structure to extract global contexts, which help derive the relationships between multiple salient regions and produce more complete saliency results. F-Net is a coarse-to-fine process that takes the features and maps of both the local and global branches as inputs and calculates the final high-quality saliency map. Extensive experiments on five benchmark datasets demonstrate that our L2GF network performs favorably against state-of-the-art approaches.
AB - Existing works mainly focus on how to aggregate multi-level features for salient object detection, which may generate sub-optimal results due to interference from redundant details. To handle this problem, we aim to learn a local-to-global feature representation, so as to segment detailed structures from a local perspective and locate salient objects from a global perspective. In particular, we design a novel L2GF network which mainly consists of three modules, i.e., L-Net, G-Net, and F-Net. L-Net employs our enhanced auto-encoder structure to extract local contexts that provide rich boundary information about objects, and is able to learn rich local features within a certain receptive field. G-Net takes tokenized feature patches as an input sequence and leverages the well-known Transformer structure to extract global contexts, which help derive the relationships between multiple salient regions and produce more complete saliency results. F-Net is a coarse-to-fine process that takes the features and maps of both the local and global branches as inputs and calculates the final high-quality saliency map. Extensive experiments on five benchmark datasets demonstrate that our L2GF network performs favorably against state-of-the-art approaches.
KW - Deep neural network
KW - Local to global feature representation
KW - Salient object detection
UR - https://www.scopus.com/pages/publications/85138451660
U2 - 10.1016/j.patrec.2022.09.004
DO - 10.1016/j.patrec.2022.09.004
M3 - Article
AN - SCOPUS:85138451660
SN - 0167-8655
VL - 162
SP - 81
EP - 88
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -