Solver-critic: A reinforcement learning method for discrete-time-constrained-input systems

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

In this article, a solver-critic (SC) architecture is developed for optimal control problems of discrete-time (DT) constrained-input systems. The proposed design consists of three parts: 1) a critic network; 2) an action solver; and 3) a target network. The critic network first approximates the action-value function using a sum-of-squares (SOS) polynomial. The action solver then uses SOS programming to obtain control inputs within the constraint set. The target network introduces a soft-update mechanism into policy evaluation to stabilize the learning process. With the proposed architecture, the constrained-input control problem can be solved without adding nonquadratic functionals to the reward function. A theoretical analysis of the convergence property is presented, and the effects of different initial Q-functions and different discount factors are investigated. It is proven that the learned policy converges to the optimal solution of the Hamilton-Jacobi-Bellman equation. Four numerical examples validate the theoretical analysis and demonstrate the effectiveness of the approach.
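The soft-update mechanism mentioned in the abstract is a standard stabilization technique in reinforcement learning: the target network's parameters slowly track the critic's parameters instead of being copied outright. The sketch below illustrates the general idea only; the function name, the blending rate `tau`, and the parameter shapes are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def soft_update(critic_params, target_params, tau=0.01):
    """Blend critic parameters into target parameters.

    Implements the generic rule: target <- tau * critic + (1 - tau) * target.
    A small tau makes the target network change slowly, which is what
    stabilizes the policy-evaluation step.
    """
    return [tau * c + (1.0 - tau) * t
            for c, t in zip(critic_params, target_params)]

# Toy example with a single parameter vector (values are illustrative):
critic = [np.array([1.0, 2.0])]
target = [np.array([0.0, 0.0])]
target = soft_update(critic, target, tau=0.5)
# With tau = 0.5 the new target is halfway between old target and critic.
```

In practice `tau` is chosen much smaller (e.g. on the order of 0.01), so the target network provides a slowly moving, consistent evaluation signal during learning.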

Original language: English
Pages (from-to): 5619-5630
Number of pages: 12
Journal: IEEE Transactions on Cybernetics
Volume: 51
Issue number: 11
DOIs
State: Published - 1 Nov 2021
Externally published: Yes

Keywords

  • Input constraints
  • optimal control
  • reinforcement learning (RL)
  • sum-of-squares (SOS) programming

