
Disentangled Noisy Correspondence Learning

  • Zhuohang Dang
  • Minnan Luo
  • Jihong Wang
  • Chengyou Jia
  • Haochen Han
  • Herun Wan
  • Guang Dai
  • Xiaojun Chang
  • Jingdong Wang

  • Xi'an Jiaotong University
  • SGIT AI Lab
  • State Grid Corporation of China
  • University of Science and Technology of China
  • Mohamed Bin Zayed University of Artificial Intelligence
  • Baidu Inc

Research output: Contribution to journal › Article › Peer-review

5 Citations (Scopus)

Abstract

Cross-modal retrieval is crucial in understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical as real-world data inevitably involves imperfect alignments, i.e., noisy correspondences. Although some works explore similarity-based strategies to address such noise, they suffer from sub-optimal similarity predictions influenced by modality-exclusive information (MEI), e.g., background noise in images and abstract definitions in texts. This issue arises because MEI is not shared across modalities, so aligning it during training can markedly mislead similarity predictions. Moreover, although intuitive, directly applying previous cross-modal disentanglement methods suffers from limited noise tolerance and disentanglement efficacy. Inspired by the robustness of information bottlenecks against noise, we introduce DisNCL, a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning, which adaptively balances the extraction of modality-invariant information (MII) and MEI with certifiably optimal cross-modal disentanglement efficacy. DisNCL then enhances similarity predictions in the modality-invariant subspace, thereby greatly boosting the similarity-based strategy for alleviating noisy correspondences. Furthermore, DisNCL introduces soft matching targets to model the noisy many-to-many relationships inherent in multi-modal inputs, yielding noise-robust and accurate cross-modal alignment. Extensive experiments confirm DisNCL’s efficacy, with a 2% improvement in average recall. Mutual information estimation and visualization results show that DisNCL learns meaningful MII/MEI subspaces, validating our theoretical analyses.
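The abstract describes DisNCL only at a high level. As a purely illustrative sketch, and not the authors' implementation, the Python below shows two generic ingredients the abstract refers to: a cosine-similarity check that flags pairs whose matched similarity is low (a stand-in for a similarity-based noise-alleviation step), and an image-to-text contrastive loss whose targets are softened from one-hot to allow many-to-many matches. The features are assumed to already lie in a shared (e.g., modality-invariant) subspace. The names `soft_targets`, `soft_contrastive_loss`, `likely_noisy_pairs` and the parameters `alpha`, `tau`, `threshold` are hypothetical, and softening targets by blending the identity with intra-text similarities is an assumption, not DisNCL's formulation.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(vectors: Matrix) -> Matrix:
    """Scale each row to unit length so dot products become cosine similarities."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        out.append([x / norm for x in v])
    return out


def similarity(a: Matrix, b: Matrix) -> Matrix:
    """Cosine similarity matrix S[i][j] between row i of a and row j of b."""
    a, b = l2_normalize(a), l2_normalize(b)
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def softmax(row: List[float], tau: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    m = max(row)
    exps = [math.exp((x - m) / tau) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def likely_noisy_pairs(img_feats: Matrix, text_feats: Matrix,
                       threshold: float = 0.5) -> List[int]:
    """Hypothetical filter: indices i whose paired similarity S[i][i] falls below threshold."""
    s_it = similarity(img_feats, text_feats)
    return [i for i in range(len(img_feats)) if s_it[i][i] < threshold]


def soft_targets(text_feats: Matrix, alpha: float, tau: float) -> Matrix:
    """Hypothetical soft matching targets: blend the one-hot (identity) target with a
    softmax over intra-text similarities, so captions resembling the paired one also
    receive some target mass. Each row still sums to 1."""
    s_tt = similarity(text_feats, text_feats)
    n = len(text_feats)
    targets = []
    for i in range(n):
        smooth = softmax(s_tt[i], tau)
        targets.append([(1 - alpha) * (1.0 if i == j else 0.0) + alpha * smooth[j]
                        for j in range(n)])
    return targets


def soft_contrastive_loss(img_feats: Matrix, text_feats: Matrix,
                          alpha: float = 0.2, tau: float = 0.1) -> float:
    """Image-to-text cross-entropy against soft targets, averaged over the batch."""
    s_it = similarity(img_feats, text_feats)
    targets = soft_targets(text_feats, alpha, tau)
    loss = 0.0
    for row, tgt in zip(s_it, targets):
        probs = softmax(row, tau)
        # Clamp probabilities so log() never sees an underflowed zero.
        loss -= sum(t * math.log(max(p, 1e-12)) for t, p in zip(tgt, probs) if t > 0)
    return loss / len(img_feats)


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0]]    # correctly paired captions
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # every pair mismatched
    print(soft_contrastive_loss(imgs, texts), likely_noisy_pairs(imgs, texts))
    print(soft_contrastive_loss(imgs, swapped), likely_noisy_pairs(imgs, swapped))
```

With the aligned toy features the loss is close to zero and no pair is flagged; swapping the captions raises the loss to roughly 10 and flags both pairs. In a real pipeline the flagged or low-similarity pairs would typically be down-weighted rather than simply listed.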

Original language: English
Pages (from-to): 2602-2615
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 34
Publication status: Published - 2025
