Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization

  • Kai Mao
  • Ping Wei
  • Yiyang Lian
  • Yangyang Wang
  • Nanning Zheng

Xi'an Jiaotong University

Research output: Contribution to journal › Conference article › peer-review

6 Scopus citations

Abstract

Anomaly detection is a significant task for its application and research value. While existing methods have made impressive progress within the same modality, cross-modal anomaly detection remains an open and challenging problem. In this paper, we propose a cross-modal anomaly detection model that is trained using data from a variety of existing modalities and can be generalized well to unseen modalities. The model consists of three major components: 1) the Transferable Visual Prototype directly learns normal/abnormal semantics in visual space; 2) the Prototype Harmonization strategy adaptively utilizes the Transferable Visual Prototypes from various modalities for inference on the unknown modality; 3) the Visual Discrepancy Inference under the few-shot setting enhances performance. In the zero-shot setting, the proposed method achieves AUROC improvements of 4.1%, 6.1%, 7.6%, and 6.8% over the best competing methods in the RGB, 3D, MRI/CT, and Thermal modalities, respectively. In the few-shot setting, our model also achieves the highest AUROC/AP on ten datasets in four modalities, substantially outperforming existing methods. Codes are available at https://github.com/Kerio99/CMAD.
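
The abstract's two core ideas, scoring features against normal/abnormal prototypes and combining prototypes from several seen modalities for an unseen one, can be illustrated with a minimal NumPy sketch. This is not the released CMAD code: the function names, the cosine-similarity softmax, the `temperature` value, and the similarity-based weighting are all assumptions chosen for illustration. The `auroc` helper shows how the reported metric is computed from scores and labels, using the standard rank-based definition.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototype_anomaly_score(features, normal_proto, abnormal_proto, temperature=0.07):
    """Hypothetical scoring rule: softmax over cosine similarity to two prototypes.

    features: (N, D) visual features; prototypes: (D,) each.
    Returns an (N,) array with the probability assigned to "abnormal".
    """
    f = l2_normalize(np.asarray(features, dtype=float))
    protos = l2_normalize(np.stack([normal_proto, abnormal_proto]).astype(float))
    logits = f @ protos.T / temperature            # (N, 2)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, 1]


def harmonize_prototypes(query_feature, modality_protos, temperature=0.07):
    """Hypothetical harmonization: weight each seen modality's prototype by its
    cosine similarity to a feature from the unseen modality, then average.

    query_feature: (D,); modality_protos: (M, D). Returns a (D,) prototype.
    """
    protos = np.asarray(modality_protos, dtype=float)
    sims = l2_normalize(protos) @ l2_normalize(np.asarray(query_feature, dtype=float))
    sims = sims / temperature
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    return weights @ protos


def auroc(scores, labels):
    """Rank-based AUROC: fraction of (abnormal, normal) pairs ranked correctly,
    counting ties as half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

In this sketch a sample is flagged as anomalous when its feature lies closer to the abnormal prototype than to the normal one; the harmonization step simply leans toward prototypes from modalities whose feature space resembles the query.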

Original language: English
Pages (from-to): 9964-9973
Number of pages: 10
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs
State: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States
Duration: 11 Jun 2025 → 15 Jun 2025
