TY - JOUR
T1 - Intelligent data center power topology system based on unsupervised learning
AU - Jia, Peng
AU - Wang, Pinghui
AU - Chen, Pin An
AU - Chen, Yichao
AU - He, Cheng
AU - Liu, Jiongzhou
AU - Guan, Xiaohong
N1 - Publisher Copyright:
© 2023 Press of Tsinghua University. All rights reserved.
PY - 2023
Y1 - 2023
N2 - [Objective] In mission-critical cloud computing services, the stability of large-scale data centers (DCs) is a key metric that must be guaranteed. However, because of uncertain commercial power supplies and complex power equipment operation processes, DC failure events are inevitable and impactful, affecting related servers and network devices. To mitigate this impact, an accurate DC power topology is needed for fast and precise failure handling and root-cause localization, which limits the damage to service quality. Nevertheless, the current process of generating DC power topology is labor intensive, and its correctness cannot be efficiently evaluated or guaranteed. [Methods] To solve these issues, instead of using the erroneous power topology provided by the operator, this paper designs an intelligent DC power topology system (IPTS). IPTS is based on an unsupervised learning framework that automatically generates the power topology for the working part of a power system, or uses the power system monitoring data to verify a manually constructed DC power topology, which may change over time. The intuition behind IPTS is that two physically connected pieces of power equipment should show not only a similar trend but also a close magnitude in specific monitoring data, e.g., current and active power, because the power loads produced by their downstream servers are close. By defining a structure abstraction of the DC power system according to domain knowledge of DC power system architectures, the DC power system can be divided into several hierarchical functional blocks. Then, two unsupervised structure learning algorithms, namely, the one-to-one (O2O) and one-to-multiple (O2M) structure learning algorithms, are separately developed to automatically recover the O2O and O2M connection types between all pieces of power equipment in a divide-and-conquer manner. 
Moreover, no methods or metrics are currently available to verify an enterprise DC power topology other than manual checking, which is highly complex because of the multiple data sources and numerous connections involved. To better indicate the consistency of the connection between any two pieces of power equipment, this paper further designs an evaluation metric called the consistency ratio (CR). The CR derives from a systematic evaluation process that compares the original enterprise DC power topology information with the learning-based enterprise DC power topology information produced by IPTS automatically and iteratively. [Results] Experimental results on two large-scale DCs show that IPTS automatically generates accurate DC power topology with a 10% improvement on average over existing state-of-the-art methods and effectively reveals most errors (including errors in the local system for operations) in manually constructed DC power topology with 0.990 precision. After corrections are made according to the verification results, the CR values between the learned structure and the modified DC power topology improve to 0.978 on average, which is 18%-113% higher than that of the original topology. Additionally, this paper comprehensively investigates the inconsistent cases that occurred while generating and verifying power topology. [Conclusion] IPTS is the first system that uses data analytics for DC power topology generation and verification; it has been successfully deployed for 19 enterprise DCs and applied in real large-scale industrial practice.
KW - automatic generation and verification
KW - data center
KW - power topology
KW - unsupervised learning
UR - https://www.scopus.com/pages/publications/85153971902
U2 - 10.16511/j.cnki.qhdxxb.2022.21.039
DO - 10.16511/j.cnki.qhdxxb.2022.21.039
M3 - Article
AN - SCOPUS:85153971902
SN - 1000-0054
VL - 63
SP - 730
EP - 739
JO - Qinghua Daxue Xuebao/Journal of Tsinghua University
JF - Qinghua Daxue Xuebao/Journal of Tsinghua University
IS - 5
ER -