TY - JOUR
T1 - RGB-D Domain adaptive semantic segmentation with cross-modality feature recalibration
AU - Fan, Qizhe
AU - Shen, Xiaoqin
AU - Ying, Shihui
AU - Wang, Juan
AU - Du, Shaoyi
N1 - Publisher Copyright:
© 2025
PY - 2025/8
Y1 - 2025/8
N2 - Unsupervised domain adaptive (UDA) semantic segmentation aims to train models that effectively transfer knowledge from synthetic to real-world images, thereby reducing the reliance on manual annotation. Currently, most existing UDA methods primarily focus on RGB image processing, largely overlooking depth information as a valuable geometric cue that complements RGB representations. Additionally, while some approaches attempt to incorporate depth information by inferring it from RGB images as an auxiliary task, inaccuracies in depth estimation can still result in localized blurring or distortion in segmentation outcomes. To comprehensively address these limitations, we propose an innovative RGB-D UDA framework, CMFRDA, which seamlessly integrates both RGB and depth images as inputs, fully leveraging their distinct yet complementary properties to improve segmentation performance. Specifically, to mitigate the prevalent object boundary noise in depth information, we propose a Depth Feature Rectification Module (DFRM), which effectively suppresses noise while enhancing the representation of fine structural details. Nevertheless, despite the effectiveness of DFRM, challenges remain due to the presence of noisy signals arising from incomplete surface data beyond the operational range of depth sensors, as well as potential mismatches between modalities. To overcome these challenges, we further introduce a Cross-Modality Feature Recalibration (CMFR) block. CMFR comprises two key components: Channel-wise Consistency Recalibration (CCR) and Spatial-wise Consistency Recalibration (SCR). CCR suppresses noise from incomplete surfaces in depth by leveraging the complementary information provided by RGB features, while SCR exploits the distinctive advantages of both modalities to mutually recalibrate each other, thereby ensuring consistency between RGB and depth modalities.
By integrating DFRM and CMFR, our CMFRDA framework effectively improves the performance of UDA semantic segmentation. Extensive experiments demonstrate that CMFRDA achieves competitive performance on two widely used UDA benchmarks, GTA → Cityscapes and Synthia → Cityscapes.
KW - Cross-modality feature recalibration
KW - RGB-D semantic segmentation
KW - Unsupervised domain adaptation
UR - https://www.scopus.com/pages/publications/105000849305
U2 - 10.1016/j.inffus.2025.103117
DO - 10.1016/j.inffus.2025.103117
M3 - Article
AN - SCOPUS:105000849305
SN - 1566-2535
VL - 120
JO - Information Fusion
JF - Information Fusion
M1 - 103117
ER -