TY - JOUR
T1 - Reinforcement learning based early classification framework for power transformer differential protection
AU - Wang, Xiaopeng
AU - He, Anyang
AU - Li, Zongbo
AU - Jiao, Zaibin
AU - Lu, Na
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2025/11/1
Y1 - 2025/11/1
N2 - The balance between response speed and diagnosis accuracy is a critical concern in transformer protection. However, prevailing AI-based transformer protection methods tend to adopt a fixed data length to extract electrical-quantity information, impeding prompt responses when discriminative fault features emerge in the early stages. This study formulates transformer protection as a Markov decision process and proposes an Early Classification Proximal Policy Optimization (ECPPO) framework that uses reinforcement learning (RL) for data-length-adaptive transformer protection with timely action and notably high accuracy. However, the limited generalization of RL algorithms poses a significant issue in the transformer protection scenario. Because enhancing the feature-extraction capability of a model is essential for improving its generalization, ECPPO adopts a two-stage training paradigm to strengthen the policy model. In the first stage, a multi-task deep learning framework trains a feature-extraction module with normalization layers, employing fault-label information and a signal-reconstruction task to enrich the feature representation. In the second stage, the pre-trained feature-extraction module is transferred to the agent model with frozen weights, and PPO training is performed. Additionally, to improve sample utilization efficiency, a period-circle-shift data augmentation method is proposed, which enhances generalization by cyclically reconstructing data in periodic sequences. To validate the proposed framework, a series of experiments was conducted using simulation data generated by PSCAD/EMTDC software as the training data and practical data generated by an experimental transformer system as the testing data. The experimental results demonstrate a significantly enhanced testing accuracy of 99.19%, coupled with an average response time of 12.10 ms, indicating that the ECPPO algorithm not only achieves superior accuracy but also effectively reduces the average response time. Furthermore, the results highlight its robust generalization capability when transitioning from simulation to experimental systems.
AB - The balance between response speed and diagnosis accuracy is a critical concern in transformer protection. However, prevailing AI-based transformer protection methods tend to adopt a fixed data length to extract electrical-quantity information, impeding prompt responses when discriminative fault features emerge in the early stages. This study formulates transformer protection as a Markov decision process and proposes an Early Classification Proximal Policy Optimization (ECPPO) framework that uses reinforcement learning (RL) for data-length-adaptive transformer protection with timely action and notably high accuracy. However, the limited generalization of RL algorithms poses a significant issue in the transformer protection scenario. Because enhancing the feature-extraction capability of a model is essential for improving its generalization, ECPPO adopts a two-stage training paradigm to strengthen the policy model. In the first stage, a multi-task deep learning framework trains a feature-extraction module with normalization layers, employing fault-label information and a signal-reconstruction task to enrich the feature representation. In the second stage, the pre-trained feature-extraction module is transferred to the agent model with frozen weights, and PPO training is performed. Additionally, to improve sample utilization efficiency, a period-circle-shift data augmentation method is proposed, which enhances generalization by cyclically reconstructing data in periodic sequences. To validate the proposed framework, a series of experiments was conducted using simulation data generated by PSCAD/EMTDC software as the training data and practical data generated by an experimental transformer system as the testing data. The experimental results demonstrate a significantly enhanced testing accuracy of 99.19%, coupled with an average response time of 12.10 ms, indicating that the ECPPO algorithm not only achieves superior accuracy but also effectively reduces the average response time. Furthermore, the results highlight its robust generalization capability when transitioning from simulation to experimental systems.
KW - Early classification
KW - Generalization
KW - Reinforcement learning
KW - Transformer protection
UR - https://www.scopus.com/pages/publications/105008526889
U2 - 10.1016/j.eswa.2025.128632
DO - 10.1016/j.eswa.2025.128632
M3 - Article
AN - SCOPUS:105008526889
SN - 0957-4174
VL - 292
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 128632
ER -