TY - JOUR
T1 - Multi-Instance Deep Learning Based on Attention Mechanism for Failure Prediction of Unlabeled Hard Disk Drives
AU - Wang, Guochao
AU - Wang, Yu
AU - Sun, Xiaojie
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2021
Y1 - 2021
N2 - Failure of hard disk drives (HDDs) is the most critical reliability issue in data centers. Predicting HDD failure is therefore an important means of ensuring the storage security of a data center. However, most current research has not paid attention to the fact that the self-monitoring, analysis and reporting technology (SMART) data in a returned failed HDD form a long-term sequence consisting of many unlabeled samples, in which healthy and faulty data are highly mixed. Because the failure data from the rapid degradation period are far fewer than the healthy data from the normal state, this mixture results in extreme data imbalance. This makes it very challenging to find the hidden fault information, and failure prediction thus becomes a difficult task. To cope with these problems, a multi-instance long-term data classification method based on a long short-term memory (LSTM) network and an attention mechanism is proposed to predict the failure of HDDs. Regarding a long time sequence of HDD data as an instance bag, multi-instance learning (MIL) divides it into multiple instances in the subconcept layer and then studies the connection between the instances and the bag labels. Based on an analysis of HDD data from a communication company and the Backblaze data center, the proposed method obtains much better results than other methods.
AB - Failure of hard disk drives (HDDs) is the most critical reliability issue in data centers. Predicting HDD failure is therefore an important means of ensuring the storage security of a data center. However, most current research has not paid attention to the fact that the self-monitoring, analysis and reporting technology (SMART) data in a returned failed HDD form a long-term sequence consisting of many unlabeled samples, in which healthy and faulty data are highly mixed. Because the failure data from the rapid degradation period are far fewer than the healthy data from the normal state, this mixture results in extreme data imbalance. This makes it very challenging to find the hidden fault information, and failure prediction thus becomes a difficult task. To cope with these problems, a multi-instance long-term data classification method based on a long short-term memory (LSTM) network and an attention mechanism is proposed to predict the failure of HDDs. Regarding a long time sequence of HDD data as an instance bag, multi-instance learning (MIL) divides it into multiple instances in the subconcept layer and then studies the connection between the instances and the bag labels. Based on an analysis of HDD data from a communication company and the Backblaze data center, the proposed method obtains much better results than other methods.
KW - Attention
KW - data imbalance
KW - disk anomaly detection
KW - long-term sequence data
KW - multi-instance learning (MIL)
UR - https://www.scopus.com/pages/publications/85103253198
U2 - 10.1109/TIM.2021.3068180
DO - 10.1109/TIM.2021.3068180
M3 - Article
AN - SCOPUS:85103253198
SN - 0018-9456
VL - 70
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
M1 - 9383187
ER -