Cross-Modal Semantic Alignment for Efficient Unsupervised Multimodal Anomaly Detection

  • Baoqiang Li
  • Tengyu Zhang
  • Zuo Zuo
  • Zongze Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Unsupervised industrial anomaly detection aims to train a model that identifies diverse anomalous patterns using only normal samples. Numerous studies have confirmed the effectiveness of this paradigm for surface anomaly detection using 2D images alone. To better capture structural anomalies, recent work has investigated multimodal unsupervised industrial anomaly detection, which jointly uses 2D images and 3D point clouds. Existing methods either ignore the complementarity between modalities or require substantial storage to learn and store normal features, making it difficult to balance effective multimodal feature utilization with computational efficiency. This paper proposes a novel, efficient unsupervised multimodal anomaly detection framework that fully exploits information from dual-modality data. By modeling the latent semantic consistency of normal samples across modalities, the method localizes anomalies by detecting cross-modal consistency deviations at test time. In parallel, a lightweight memory bank is constructed separately for each modality to capture intra-modal feature inconsistencies, providing an anomaly identification perspective complementary to cross-modal detection. Extensive experiments demonstrate that, by considering both inter- and intra-modal consistency, our framework achieves state-of-the-art (SOTA) detection and segmentation performance on the MVTec 3D-AD dataset with lower computational cost and faster inference, while retaining robust advantages in the few-shot setting.
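The abstract does not say how the cross-modal alignment, the memory banks, or the score fusion are implemented, so the following is only a minimal sketch of the general inter-plus-intra-modal idea under our own assumptions. A ridge-regression linear map from 2D to 3D patch features stands in for the learned cross-modal alignment; farthest-point (greedy coreset) subsampling stands in for the "lightweight memory banks"; and the two signals are summed with a hypothetical weight `alpha`. All function names (`fit_cross_modal_map`, `greedy_coreset`, `patch_score`) and the synthetic data are hypothetical illustrations, not the authors' method.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * c for v, c in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_cross_modal_map(x2d: Matrix, x3d: Matrix, ridge: float = 1e-3) -> Matrix:
    """Hypothetical alignment: ridge regression W with x2d @ W ~= x3d on normal patches."""
    xt = transpose(x2d)
    gram = matmul(xt, x2d)
    for i in range(len(gram)):
        gram[i][i] += ridge
    return solve(gram, matmul(xt, x3d))


def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_coreset(features: Matrix, size: int) -> Matrix:
    """Hypothetical lightweight bank: farthest-point subsampling of normal features."""
    bank = [features[0]]
    min_d = [dist(f, bank[0]) for f in features]
    while len(bank) < min(size, len(features)):
        idx = max(range(len(features)), key=lambda i: min_d[i])
        bank.append(features[idx])
        min_d = [min(d, dist(f, features[idx])) for d, f in zip(min_d, features)]
    return bank


def nn_distance(bank: Matrix, f: Vector) -> float:
    """Intra-modal inconsistency: distance to the nearest stored normal feature."""
    return min(dist(f, b) for b in bank)


def patch_score(f2d, f3d, w, bank2d, bank3d, alpha=1.0) -> float:
    """Inter-modal deviation plus alpha times the two intra-modal distances."""
    inter = dist(matmul([f2d], w)[0], f3d)
    intra = nn_distance(bank2d, f2d) + nn_distance(bank3d, f3d)
    return inter + alpha * intra


# Synthetic normal patches: the 3D feature is a fixed linear function of the 2D one.
random.seed(0)
true_w = [[1.0, 0.5], [-0.5, 2.0], [0.0, 1.0]]
normal_2d = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
normal_3d = matmul(normal_2d, true_w)

w = fit_cross_modal_map(normal_2d, normal_3d)
bank2d = greedy_coreset(normal_2d, 20)
bank3d = greedy_coreset(normal_3d, 20)

test_2d = [0.3, -0.2, 0.1]
consistent_3d = matmul([test_2d], true_w)[0]
# Same appearance, but the geometry no longer agrees with it.
inconsistent_3d = [consistent_3d[0] + 3.0, consistent_3d[1] - 3.0]
good_score = patch_score(test_2d, consistent_3d, w, bank2d, bank3d)
bad_score = patch_score(test_2d, inconsistent_3d, w, bank2d, bank3d)
print(f"consistent: {good_score:.3f}  inconsistent: {bad_score:.3f}")
```

In this toy setup the inconsistent patch scores higher mainly through the inter-modal term, since its 2D appearance is unchanged. Subsampling each bank to a few representative features reflects the abstract's emphasis on low storage; keeping every normal feature would also work, at higher memory and search cost.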

Original language: English
Title of host publication: ECAI 2025 - 28th European Conference on Artificial Intelligence, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Proceedings
Editors: Ines Lynce, Nello Murano, Mauro Vallati, Serena Villata, Federico Chesani, Michela Milano, Andrea Omicini, Mehdi Dastani
Publisher: IOS Press BV
Pages: 1591-1598
Number of pages: 8
ISBN (Electronic): 9781643686318
State: Published - 21 Oct 2025
Event: 28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Bologna, Italy
Duration: 25 Oct 2025 to 30 Oct 2025

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 413
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025
Country/Territory: Italy
City: Bologna
Period: 25/10/25 to 30/10/25
