TY - JOUR
T1 - A Self-Paced Multiple Instance Learning Framework for Weakly Supervised Video Anomaly Detection
AU - He, Ping
AU - Li, Huibin
AU - Han, Miaolin
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/2
Y1 - 2025/2
N2 - Weakly supervised video anomaly detection (WS-VAD) is often addressed as a multi-instance learning problem in which a few fixed number of video segments are selected for classifier training. However, this kind of selection strategy usually leads to a biased classifier. To solve this problem, we propose a novel self-paced multiple-instance learning (SP-MIL) framework for WS-VAD. Given a pre-trained baseline model, the proposed SP-MIL can enhance its performance by adaptively selecting video segments (from easy to hard) and persistently updating the classifier. In particular, for each training epoch, the baseline classifier is firstly used to predict the anomaly score of each segment, and their pseudo-labels are generated. Then, for all segments in each video, their age parameter is estimated based on their loss values. Based on the age parameter, we can determine the self-paced learning weight (hard weight with values of 0 or 1) of each segment, which is used to select the subset of segments. Finally, the selected segments, along with their pseudo-labels, are used to update the classifier. Extensive experiments conducted on the UCF-Crime, ShanghaiTech, and XD-Violence datasets demonstrate the effectiveness of the proposed framework, outperforming state-of-the-art methods.
AB - Weakly supervised video anomaly detection (WS-VAD) is often addressed as a multi-instance learning problem in which a few fixed number of video segments are selected for classifier training. However, this kind of selection strategy usually leads to a biased classifier. To solve this problem, we propose a novel self-paced multiple-instance learning (SP-MIL) framework for WS-VAD. Given a pre-trained baseline model, the proposed SP-MIL can enhance its performance by adaptively selecting video segments (from easy to hard) and persistently updating the classifier. In particular, for each training epoch, the baseline classifier is firstly used to predict the anomaly score of each segment, and their pseudo-labels are generated. Then, for all segments in each video, their age parameter is estimated based on their loss values. Based on the age parameter, we can determine the self-paced learning weight (hard weight with values of 0 or 1) of each segment, which is used to select the subset of segments. Finally, the selected segments, along with their pseudo-labels, are used to update the classifier. Extensive experiments conducted on the UCF-Crime, ShanghaiTech, and XD-Violence datasets demonstrate the effectiveness of the proposed framework, outperforming state-of-the-art methods.
KW - multiple-instance learning
KW - self-paced learning
KW - unbiased prediction
KW - weakly supervised video anomaly detection
UR - https://www.scopus.com/pages/publications/85217567313
U2 - 10.3390/app15031049
DO - 10.3390/app15031049
M3 - 文章
AN - SCOPUS:85217567313
SN - 2076-3417
VL - 15
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 3
M1 - 1049
ER -