TY - JOUR
T1 - Real-Time Optimal Power Flow Method via Safe Deep Reinforcement Learning Based on Primal-Dual and Prior Knowledge Guidance
AU - Wu, Pengfei
AU - Chen, Chen
AU - Lai, Dexiang
AU - Zhong, Jian
AU - Bie, Zhaohong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2025
Y1 - 2025
N2 - High penetration of intermittent renewable energy sources (RESs) has introduced significant uncertainties into modern power systems. To respond rapidly and economically to fluctuations in the power system operating state, this paper proposes a safe deep reinforcement learning (SDRL) algorithm for the real-time optimal power flow problem. First, the problem is formulated as a Constrained Markov Decision Process. Second, primal-dual proximal policy optimization (PD-PPO) is proposed to adaptively tune the binding effect of security constraints on the policy while improving the policy itself. Using a cost critic network to evaluate policy security, actor gradients are estimated with higher accuracy by a Lagrange advantage function derived from the economic reward and violation cost critic networks. Moreover, the performance of the PD-PPO method is further improved with an effective knowledge-driven action masking technique, which explicitly identifies critical action dimensions based on the physical model to steer the policy toward safety without conservative exploration. Numerical tests are carried out on the IEEE 9-bus, 30-bus, 118-bus, and ACTIVSg2000 test systems. The results show that the well-trained SDRL agent significantly improves computational efficiency while satisfying security constraints and optimality requirements as much as possible.
AB - High penetration of intermittent renewable energy sources (RESs) has introduced significant uncertainties into modern power systems. To respond rapidly and economically to fluctuations in the power system operating state, this paper proposes a safe deep reinforcement learning (SDRL) algorithm for the real-time optimal power flow problem. First, the problem is formulated as a Constrained Markov Decision Process. Second, primal-dual proximal policy optimization (PD-PPO) is proposed to adaptively tune the binding effect of security constraints on the policy while improving the policy itself. Using a cost critic network to evaluate policy security, actor gradients are estimated with higher accuracy by a Lagrange advantage function derived from the economic reward and violation cost critic networks. Moreover, the performance of the PD-PPO method is further improved with an effective knowledge-driven action masking technique, which explicitly identifies critical action dimensions based on the physical model to steer the policy toward safety without conservative exploration. Numerical tests are carried out on the IEEE 9-bus, 30-bus, 118-bus, and ACTIVSg2000 test systems. The results show that the well-trained SDRL agent significantly improves computational efficiency while satisfying security constraints and optimality requirements as much as possible.
KW - Real-time optimal power flow
KW - knowledge-driven action masking
KW - primal-dual proximal policy optimization
KW - safe deep reinforcement learning
UR - https://www.scopus.com/pages/publications/85192189612
U2 - 10.1109/TPWRS.2024.3395248
DO - 10.1109/TPWRS.2024.3395248
M3 - Article
AN - SCOPUS:85192189612
SN - 0885-8950
VL - 40
SP - 597
EP - 611
JO - IEEE Transactions on Power Systems
JF - IEEE Transactions on Power Systems
IS - 1
ER -