Abstract
In this article, a solver-critic (SC) architecture is developed for optimal control problems of discrete-time (DT) constrained-input systems. The proposed design consists of three parts: 1) a critic network; 2) an action solver; and 3) a target network. The critic network first approximates the action-value function using a sum-of-squares (SOS) polynomial. The action solver then adopts SOS programming to obtain control inputs within the constraint set. The target network introduces a soft update mechanism into policy evaluation to stabilize the learning process. With the proposed architecture, the constrained-input control problem can be solved without adding nonquadratic functionals to the reward function. A theoretical analysis of the convergence property is presented, and the effects of both different initial Q-functions and different discount factors are investigated. It is proven that the learned policy converges to the optimal solution of the Hamilton-Jacobi-Bellman equation. Four numerical examples validate the theoretical analysis and demonstrate the effectiveness of the approach.
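Two of the ingredients named in the abstract, a polynomial action-value approximator and a soft (Polyak-averaged) target update, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the monomial basis, the degree-2 truncation, the `tau` value, and all function names are assumptions made for the example.

```python
import numpy as np

def monomial_basis(x, u):
    """Monomial features of the joint (state, action) vector up to
    degree 2 -- a simple stand-in for the SOS polynomial basis the
    critic network uses to approximate the Q-function."""
    z = np.concatenate([x, u])
    feats = [1.0]                      # degree 0
    feats.extend(z)                    # degree 1
    for i in range(len(z)):            # degree 2 (upper triangle)
        for j in range(i, len(z)):
            feats.append(z[i] * z[j])
    return np.array(feats)

def q_value(weights, x, u):
    """Critic estimate: linear combination of basis monomials."""
    return weights @ monomial_basis(x, u)

def soft_update(target_w, critic_w, tau=0.01):
    """Soft target update used to stabilize policy evaluation:
    target <- tau * critic + (1 - tau) * target."""
    return tau * critic_w + (1.0 - tau) * target_w
```

In the paper's setting the action solver would then pick `u` inside the constraint set by SOS programming; here only the critic-side pieces are sketched.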
| Original language | English |
|---|---|
| Pages (from-to) | 5619-5630 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Cybernetics |
| Volume | 51 |
| Issue number | 11 |
| DOIs | |
| State | Published - 1 Nov 2021 |
| Externally published | Yes |
Keywords
- Input constraints
- optimal control
- reinforcement learning (RL)
- sum-of-squares (SOS) programming