TY - JOUR
T1 - Disentangled Noisy Correspondence Learning
AU - Dang, Zhuohang
AU - Luo, Minnan
AU - Wang, Jihong
AU - Jia, Chengyou
AU - Han, Haochen
AU - Wan, Herun
AU - Dai, Guang
AU - Chang, Xiaojun
AU - Wang, Jingdong
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Cross-modal retrieval is crucial in understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical as real-world data inevitably involves imperfect alignments, i.e., noisy correspondences. Although some works explore similarity-based strategies to address such noise, they suffer from sub-optimal similarity predictions influenced by modality-exclusive information (MEI), e.g., background noise in images and abstract definitions in texts. This issue arises as MEI is not shared across modalities, thus aligning it in training can markedly mislead similarity predictions. Moreover, although intuitive, directly applying previous cross-modal disentanglement methods suffers from limited noise tolerance and disentanglement efficacy. Inspired by the robustness of information bottlenecks against noise, we introduce DisNCL, a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning, to adaptively balance the extraction of modality-invariant information (MII) and MEI with certifiable optimal cross-modal disentanglement efficacy. DisNCL then enhances similarity predictions in modality-invariant subspace, thereby greatly boosting similarity-based alleviation strategy for noisy correspondences. Furthermore, DisNCL introduces soft matching targets to model noisy many-to-many relationships inherent in multi-modal inputs for noise-robust and accurate cross-modal alignment. Extensive experiments confirm DisNCL’s efficacy by 2% average recall improvement. Mutual information estimation and visualization results show that DisNCL learns meaningful MII/MEI subspaces, validating our theoretical analyses.
AB - Cross-modal retrieval is crucial in understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical as real-world data inevitably involves imperfect alignments, i.e., noisy correspondences. Although some works explore similarity-based strategies to address such noise, they suffer from sub-optimal similarity predictions influenced by modality-exclusive information (MEI), e.g., background noise in images and abstract definitions in texts. This issue arises as MEI is not shared across modalities, thus aligning it in training can markedly mislead similarity predictions. Moreover, although intuitive, directly applying previous cross-modal disentanglement methods suffers from limited noise tolerance and disentanglement efficacy. Inspired by the robustness of information bottlenecks against noise, we introduce DisNCL, a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning, to adaptively balance the extraction of modality-invariant information (MII) and MEI with certifiable optimal cross-modal disentanglement efficacy. DisNCL then enhances similarity predictions in modality-invariant subspace, thereby greatly boosting similarity-based alleviation strategy for noisy correspondences. Furthermore, DisNCL introduces soft matching targets to model noisy many-to-many relationships inherent in multi-modal inputs for noise-robust and accurate cross-modal alignment. Extensive experiments confirm DisNCL’s efficacy by 2% average recall improvement. Mutual information estimation and visualization results show that DisNCL learns meaningful MII/MEI subspaces, validating our theoretical analyses.
KW - Cross-modal retrieval
KW - disentangled representation learning
KW - information bottleneck
KW - noisy correspondence
UR - https://www.scopus.com/pages/publications/105003434903
U2 - 10.1109/TIP.2025.3559457
DO - 10.1109/TIP.2025.3559457
M3 - 文章
C2 - 40257891
AN - SCOPUS:105003434903
SN - 1057-7149
VL - 34
SP - 2602
EP - 2615
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -