UniSTAD: An Unified Triple-Tower Student–Teacher Model for Multi-Class Anomaly Detection and Localization

Research output: Contribution to journal › Article › peer-review

Abstract

Despite rapid advances in unsupervised anomaly detection and localization, most existing methods require training a separate model for each category, so computational and memory demands in real applications increase as the number of classes grows. A more practical task is to detect anomalies across different categories using one unified model. However, this unified setting makes it challenging to model the multi-class normal feature representation because of the diversity of data categories, and existing methods often suffer a drop in performance under it. In this work, we propose UniSTAD, a novel and effective unified method for multi-class anomaly detection and localization that uses a transformer-based triple-tower student–teacher model. The triple-tower design contains a global student model and a local student model, which predict features from global and local context features, respectively. UniSTAD learns the feature representation of normal data by jointly distilling features from a pre-trained teacher model and enforcing global/local context-based feature reconstruction and consistency. At inference, UniSTAD identifies anomalous regions as those where the expected feature consistencies are broken. Additionally, we integrate an untrained, category-agnostic localization refinement module, which further improves multi-class anomaly detection and localization performance. Evaluated on real-world industrial datasets, UniSTAD achieves state-of-the-art performance, validating its efficacy for multi-class anomaly detection and localization.
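The inference idea in the abstract (flagging regions where teacher and student features disagree) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring function, so the choice of cosine distance, the equal-weight sum over the three tower pairs, the max-based image score, the function names, and the toy feature vectors below are all assumptions made for illustration only.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def anomaly_scores(teacher, global_student, local_student):
    """Per-patch score: sum of the three pairwise cosine distances.

    Each argument is a list of per-patch feature vectors. The equal
    weighting of the three pairs is a hypothetical choice.
    """
    scores = []
    for t, g, l in zip(teacher, global_student, local_student):
        scores.append(
            cosine_distance(t, g)      # teacher vs. global student
            + cosine_distance(t, l)    # teacher vs. local student
            + cosine_distance(g, l)    # global vs. local student
        )
    return scores


def image_score(scores):
    """Image-level score: the largest patch score (hypothetical rule)."""
    return max(scores)


# Toy, hand-made features for three patches (not real data).
# The towers agree on patches 0 and 1 and disagree on patch 2.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_student = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
local_student = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]

scores = anomaly_scores(teacher, global_student, local_student)
most_anomalous_patch = scores.index(image_score(scores))
```

In this toy input the third patch receives the highest score because all three feature pairs disagree there, while the first two patches score zero. A real system would compute these maps from transformer feature tensors and upsample them to image resolution.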

Original language: English
Pages (from-to): 3196-3208
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 4
State: Published - 2025

Keywords

  • knowledge distillation
  • self-supervised learning
  • unified anomaly detection
  • vision transformer
