TY - JOUR
T1 - UniSTAD
T2 - An Unified Triple-Tower Student–Teacher Model for Multi-Class Anomaly Detection and Localization
AU - Liu, Huan
AU - Sun, Jian
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Despite rapid advances in unsupervised anomaly detection and localization, most existing methods require training a separate model for each category, which increases computational and memory demands in real applications as the number of classes grows. A more practical task is to detect anomalies across different categories using one unified model. However, this unified setting makes it challenging to model the multi-class normal feature representation because of the diversity of data categories, and existing methods often lose performance under it. In this work, we propose UniSTAD, a novel and effective unified method for multi-class anomaly detection and localization that uses a transformer-based triple-tower student-teacher model. The triple-tower design contains global and local student models, which predict features from global and local context features, respectively. UniSTAD learns the feature representation of normal data by jointly distilling features from a pre-trained teacher model and enforcing global/local context-based feature reconstruction and consistency. In the inference stage, UniSTAD identifies anomalous regions as those where the expected feature consistencies are broken. Additionally, we integrate an untrained, category-agnostic localization refinement module, which further improves multi-class anomaly detection and localization performance. Evaluated on real-world industrial datasets, UniSTAD demonstrates state-of-the-art performance, validating its efficacy for multi-class anomaly detection and localization.
KW - knowledge distillation
KW - self-supervised learning
KW - Unified anomaly detection
KW - vision transformer
UR - https://www.scopus.com/pages/publications/105002324723
U2 - 10.1109/TCSVT.2024.3507097
DO - 10.1109/TCSVT.2024.3507097
M3 - Article
AN - SCOPUS:105002324723
SN - 1051-8215
VL - 35
SP - 3196
EP - 3208
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 4
ER -