Safe Adaptive Dynamic Programming for Multiplayer Systems With Static and Moving No-Entry Regions

Research output: Contribution to journal, Article (peer-reviewed)

17 Scopus citations

Abstract

In recent years, the use of adaptive dynamic programming (ADP) algorithms to solve the Nash equilibrium problem of multiplayer differential games has received extensive attention. Although these algorithms can obtain approximate solutions, they assume that the system's operating domain is completely safe, and they require probing noise to excite the system during learning. To address these challenges, this article considers the optimal avoidance control problem, in which the system must avoid multiple static or moving no-entry regions while reaching a target point, and proposes a safe adaptive dynamic programming approach. First, the optimal avoidance control problem is formulated, and the no-entry regions are encoded into each player's cost function using barrier functions. Then, a safe adaptive dynamic programming approach is proposed with several novel features: actor-critic neural networks built from state-following kernel functions, state extrapolation to achieve virtual excitation, and weight tuning laws for adaptive learning. Next, the approach is extended to the case of moving regions, and theoretical results are provided. Finally, the proposed safe learning scheme is demonstrated on three simulation examples and compared with other control methods.
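The abstract states that the no-entry regions are encoded into each player's cost function with barrier functions, but it does not give the functional form. As an illustrative sketch only, assuming circular regions and a reciprocal barrier 1/h(x) with h(x) = ||x - c||^2 - r^2 (an assumption, not necessarily the article's choice), a player's running cost could be augmented as below. The helper names `region_margin`, `reciprocal_barrier`, and `running_cost`, and the weights `Q`, `R_i`, and `weight`, are hypothetical. A moving region is modeled by letting its center depend on time.

```python
import numpy as np


def region_margin(x, center, radius):
    """h(x) = ||x - c||^2 - r^2: positive outside the disc, zero on its boundary."""
    d = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(d @ d - radius ** 2)


def reciprocal_barrier(h):
    """Assumed barrier 1/h: grows without bound as h -> 0+, infinite inside the region."""
    return np.inf if h <= 0.0 else 1.0 / h


def running_cost(x, u_i, target, regions, Q, R_i, t=0.0, weight=1.0):
    """Hypothetical running cost for player i: tracking error + control effort + barriers.

    regions is a list of (center_fn, radius) pairs, where center_fn(t) returns the
    region center at time t. A static region uses a constant center_fn; a moving
    region uses a time-varying one.
    """
    e = np.asarray(x, dtype=float) - np.asarray(target, dtype=float)
    u = np.asarray(u_i, dtype=float)
    cost = e @ Q @ e + u @ R_i @ u
    for center_fn, radius in regions:
        cost += weight * reciprocal_barrier(region_margin(x, center_fn(t), radius))
    return float(cost)


# Example: one static disc and one disc whose center moves along the x-axis.
Q, R1 = np.eye(2), np.eye(1)
static = (lambda t: np.array([1.0, 1.0]), 0.5)
moving = (lambda t: np.array([2.0 + np.cos(t), 0.0]), 0.3)
print(running_cost([0.0, 0.0], [0.1], [0.0, 0.0], [static, moving], Q, R1, t=0.0))
```

With these values the cost at the origin is about 0.694: 0.01 of control effort plus barrier terms 1/1.75 and 1/8.91. Because each barrier term becomes unbounded as the state approaches a region boundary, a cost-minimizing policy is discouraged from entering a no-entry region.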

Original language: English
Article number: 3325780
Pages (from-to): 2079-2092
Number of pages: 14
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 5
State: Published, 1 May 2024
Externally published: Yes

Keywords

  • Barrier function
  • multiplayer differential game
  • optimal avoidance control
  • reinforcement learning (RL)
  • safe adaptive dynamic programming (SADP)
  • state extrapolation
