Abstract
Anomaly detection is an important task with both practical and research value. Existing methods have made impressive progress within a single modality, but cross-modal anomaly detection remains an open and challenging problem. In this paper, we propose a cross-modal anomaly detection model that is trained on data from a variety of existing modalities and generalizes well to unseen modalities. The model consists of three major components: 1) the Transferable Visual Prototype, which learns normal/abnormal semantics directly in visual space; 2) the Prototype Harmonization strategy, which adaptively uses the Transferable Visual Prototypes from various modalities for inference on an unknown modality; and 3) Visual Discrepancy Inference, which enhances performance in the few-shot setting. In the zero-shot setting, the proposed method achieves AUROC improvements of 4.1%, 6.1%, 7.6%, and 6.8% over the best competing methods in the RGB, 3D, MRI/CT, and Thermal modalities, respectively. In the few-shot setting, our model also achieves the highest AUROC/AP on ten datasets in four modalities, substantially outperforming existing methods. Code is available at https://github.com/Kerio99/CMAD.
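
The headline results above are reported as AUROC. As a reminder of what that metric measures, here is a minimal sketch that is not taken from the paper's code: the function name `auroc` and the toy scores are illustrative assumptions. It computes AUROC as the fraction of (anomalous, normal) pairs in which the anomalous sample receives the higher anomaly score, counting ties as half.

```python
def auroc(scores, labels):
    """Pairwise AUROC: P(score of anomalous > score of normal), ties count 0.5.

    scores: anomaly scores, higher means more anomalous (assumed convention).
    labels: 1 for anomalous, 0 for normal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous sample ranked above the normal one
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): three of the four pairs are ordered correctly.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The quadratic pairwise loop is chosen for readability; a rank-based formulation gives the same value and scales better to large test sets.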
| Original language | English |
|---|---|
| Pages (from-to) | 9964-9973 |
| Number of pages | 10 |
| Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| State | Published - 2025 |
| Event | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, United States (11 Jun 2025 → 15 Jun 2025) |
| Title | Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization |