TY - JOUR
T1 - Policy-Iteration-Based Learning for Nonlinear Player Game Systems with Constrained Inputs
AU - Mu, Chaoxu
AU - Wang, Ke
AU - Sun, Changyin
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2021/10
Y1 - 2021/10
AB - This article investigates the optimal control problem for nonlinear nonzero-sum differential game in the environment of no initial admissible policies while considering the control constraint. An adaptive learning algorithm is thus developed based on policy iteration technique to approximately obtain the Nash equilibrium using real-time data. A two-player continuous-time system is used to present this approximate mechanism, which is implemented as a critic-actor architecture for every player. The constraint is incorporated into this optimization by introducing the nonquadratic value function, and the associated constrained Hamilton-Jacobi equation is derived. The critic neural network (NN) and actor NN are utilized to learn the value function and the optimal control policy, respectively, in the light of novel weight tuning laws. In order to tackle the stability during the learning phase, two stable operators are designed for two actors. The proposed algorithm is proved to be convergent as a Newton's iteration, and the stability of this closed-loop system is also ensured by Lyapunov analysis. Finally, two simulation examples demonstrate the effectiveness of the proposed learning scheme by considering different constraint scenes.
KW - Control constraint
KW - critic-actor
KW - neural network (NN)
KW - nonzero-sum (NZS) differential game
KW - policy iteration (PI)
UR - https://www.scopus.com/pages/publications/85099880811
U2 - 10.1109/TSMC.2019.2962629
DO - 10.1109/TSMC.2019.2962629
M3 - Article
AN - SCOPUS:85099880811
SN - 2168-2216
VL - 51
SP - 6488
EP - 6502
JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems
JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems
IS - 10
ER -