Real-Time Optimal Power Flow Method via Safe Deep Reinforcement Learning Based on Primal-Dual and Prior Knowledge Guidance

  • Pengfei Wu
  • Chen Chen
  • Dexiang Lai
  • Jian Zhong
  • Zhaohong Bie

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

High penetration of intermittent renewable energy sources (RESs) has introduced significant uncertainties into modern power systems. To respond rapidly and economically to fluctuations of the power system operating state, this paper proposes a safe deep reinforcement learning (SDRL) algorithm for the real-time optimal power flow problem. First, the problem is formulated as a Constrained Markov Decision Process (CMDP). Second, primal-dual proximal policy optimization (PD-PPO) is proposed to adaptively tune the binding effect of policy security constraints while improving the policy. Using a cost critic network to evaluate policy security, actor gradients are estimated with higher accuracy by a Lagrange advantage function derived from the economic reward and violation cost critic networks. Moreover, the performance of the PD-PPO method is further improved by an effective knowledge-driven action masking technique, which explicitly identifies critical action dimensions from the physical model to steer the policy toward safety while preserving nonconservative exploration. Numerical tests are carried out on the IEEE 9-bus, 30-bus, 118-bus, and ACTIVSg2000 test systems. The results show that the well-trained SDRL agent significantly improves computational efficiency while satisfying security constraints and optimality requirements as closely as possible.
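The primal-dual mechanism summarized above can be sketched in a few lines: the policy gradient uses a Lagrangian advantage (reward advantage minus a multiplier-weighted cost advantage), the multiplier is updated by dual ascent on the constraint violation, and masked action dimensions are overridden with a safe value. This is a minimal illustration under stated assumptions, not the paper's implementation; the function names, the learning rate, the `cost_limit` threshold, and the fallback value are all hypothetical.

```python
import numpy as np

def lagrangian_advantage(adv_reward, adv_cost, lam):
    """Combine reward and cost advantages into a single Lagrangian
    advantage for the policy-gradient update (illustrative form)."""
    return adv_reward - lam * adv_cost

def dual_ascent_update(lam, episode_cost, cost_limit, lr=0.05):
    """Dual ascent on the Lagrange multiplier: raise lam while the
    constraint is violated, shrink it (floored at 0) otherwise."""
    return max(0.0, lam + lr * (episode_cost - cost_limit))

def mask_action(action, mask, safe_fallback=0.0):
    """Knowledge-driven masking: on dimensions flagged as critical by a
    physical model, replace the raw action with a safe fallback value."""
    action = np.asarray(action, dtype=float)
    return np.where(mask, safe_fallback, action)

# Toy usage: three episodes of constraint costs against a limit of 1.0.
lam = 0.0
for cost in [2.0, 1.5, 0.8]:
    lam = dual_ascent_update(lam, cost, cost_limit=1.0)

adv = lagrangian_advantage(adv_reward=1.0, adv_cost=0.5, lam=lam)
masked = mask_action([0.9, -0.3], mask=np.array([True, False]))
```

The multiplier grows while episodes violate the cost limit and relaxes once they satisfy it, so the penalty on unsafe actions is tuned adaptively rather than fixed by hand, which is the qualitative behavior the abstract attributes to PD-PPO.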

Original language: English
Pages (from-to): 597-611
Number of pages: 15
Journal: IEEE Transactions on Power Systems
Volume: 40
Issue number: 1
DOIs
State: Published - 2025

Keywords

  • Real-time optimal power flow
  • knowledge-driven action masking
  • primal-dual proximal policy optimization
  • safe deep reinforcement learning
