TY - JOUR
T1 - Enhancing Efficient Global Understanding Network with CSWin Transformer for Urban Scene Images Segmentation
AU - Zhang, Jie
AU - Shao, Mingwen
AU - Qiao, Yuanjian
AU - Cao, Xiangyong
N1 - Publisher Copyright:
© 2008-2012 IEEE.
PY - 2023
Y1 - 2023
AB - Global context is crucial for the semantic segmentation of remote sensing (RS) urban scene imagery, since objects exhibit large size variations, high similarity, and mutual occlusion. However, existing methods for extracting global context information have limitations when applied directly to very high-resolution RS images, mainly their high computational complexity and memory consumption. To alleviate this limitation, we propose a novel Efficient Global Understanding semantic segmentation Network (EGUNet) that extracts global context information efficiently enough to be applicable to RS images. Specifically, EGUNet is a hybrid U-shaped architecture combining convolutional neural networks (CNNs) and a Transformer: the encoder uses the CSWin Transformer to capture global semantic information, and the decoder uses a CNN structure to recover local detail information. As a result, EGUNet has both a strong global information extraction capability and a strong local positional information recovery capability. In addition, three effective modules are proposed to improve segmentation accuracy and make EGUNet better suited to urban scene image segmentation tasks. First, a feature adaptive fusion module is introduced in the decoder to improve the fusion of deep semantic features and location detail features. Second, an adaptive atrous spatial pyramid pooling module is designed at the skip connections to enhance the multiscale understanding of high-level semantic context. Finally, we introduce a lightweight enhanced segmentation head that uses the information from each decoder stage for segmentation. Extensive experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the exceptional segmentation accuracy of EGUNet, which outperforms state-of-the-art methods.
KW - CSWin Transformer
KW - global information extraction
KW - remote sensing (RS) urban scene imagery
KW - semantic segmentation
UR - https://www.scopus.com/pages/publications/85178994942
U2 - 10.1109/JSTARS.2023.3328559
DO - 10.1109/JSTARS.2023.3328559
M3 - Article
AN - SCOPUS:85178994942
SN - 1939-1404
VL - 16
SP - 10230
EP - 10245
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -