
A Parallel Framework of Adaptive Dynamic Programming Algorithm with Off-Policy Learning

  • Southeast University, Nanjing

Research output: Contribution to journal, Article, peer-reviewed

31 Citations (Scopus)

Abstract

In this article, a model-free online adaptive dynamic programming (ADP) approach is developed for solving the optimal control problem of nonaffine nonlinear systems. Combining the off-policy learning mechanism with the parallel paradigm, multithread agents are employed to collect transitions by interacting with the environment, which significantly augments the number of sampled data. In addition, each thread agent explores the environment from different initial states under its own behavior policy, which enhances the exploration capability and alleviates the correlation between the sampled data. After the policy evaluation process, only a one-step update is required for policy improvement based on the policy gradient method. The stability of the system under iterative control laws is guaranteed. Moreover, a convergence analysis is given to prove that the iterative Q-function is monotonically nonincreasing and finally converges to the solution of the Hamilton-Jacobi-Bellman (HJB) equation. For implementing the algorithm, the actor-critic (AC) structure is utilized with two neural networks (NNs) to approximate the Q-function and the control policy. Finally, the effectiveness of the proposed algorithm is verified by two numerical examples.
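
The loop described in the abstract (parallel data collection under separate behavior policies, off-policy evaluation of the Q-function, then a single gradient step on the policy) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and every name in it is hypothetical. To keep it small and checkable, it assumes a scalar linear plant with quadratic cost instead of a nonaffine nonlinear system, a three-term quadratic basis with a least-squares fit instead of the critic NN, and a single linear gain instead of the actor NN. The thread agents are approximated with `ThreadPoolExecutor`, and the learner only ever sees sampled transitions, never the plant parameters.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

A, B = 0.9, 0.5  # hypothetical plant parameters, used only inside the simulator

def step(x, u):
    """Environment: next state and stage cost. The learner never reads A or B."""
    return A * x + B * u, x**2 + u**2

def features(x, u):
    """Quadratic basis (assumption) so that Q(x, u) = theta @ features(x, u)."""
    return np.array([x * x, x * u, u * u])

def collect(args):
    """One thread agent: own initial state, own exploration noise level."""
    gain, x0, sigma, seed, horizon = args
    rng = np.random.default_rng(seed)
    x, data = x0, []
    for _ in range(horizon):
        u = -gain * x + sigma * rng.standard_normal()  # behavior policy
        x_next, cost = step(x, u)
        data.append((x, u, cost, x_next))
        x = x_next
    return data

def evaluate(gain, data):
    """Off-policy policy evaluation: least-squares solve of
    Q(x, u) = cost + Q(x', -gain * x') over the pooled transitions."""
    phi = np.array([features(x, u) - features(xn, -gain * xn)
                    for x, u, _, xn in data])
    costs = np.array([c for _, _, c, _ in data])
    theta, *_ = np.linalg.lstsq(phi, costs, rcond=None)
    return theta

def improve(gain, theta, data, lr=0.1):
    """Single gradient step on the sample mean of Q(x, -gain * x)."""
    _, h_xu, h_uu = theta
    mean_x2 = np.mean([x * x for x, *_ in data])
    grad = (2.0 * h_uu * gain - h_xu) * mean_x2
    return gain - lr * grad

def train(n_agents=4, horizon=30, iterations=80):
    gain = 0.0  # u = 0 is admissible here because the plant is open-loop stable
    x0s = np.linspace(-2.0, 2.0, n_agents)     # different initial states
    sigmas = np.linspace(0.2, 1.0, n_agents)   # different behavior policies
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        for it in range(iterations):
            jobs = [(gain, x0s[i], sigmas[i], 1000 * it + i, horizon)
                    for i in range(n_agents)]
            data = [t for traj in pool.map(collect, jobs) for t in traj]
            theta = evaluate(gain, data)
            gain = improve(gain, theta, data)
    return gain, theta
```

Calling `train()` returns the learned feedback gain and the last fitted Q-function coefficients. In this linear-quadratic stand-in the result can be compared against the gain obtained from the scalar Riccati equation, which is a convenient sanity check that the nonlinear setting of the article does not offer.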

Original language: English
Article number: 9174778
Pages (from-to): 3578-3587
Number of pages: 10
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 32
Issue number: 8
Publication status: Published - Aug 2021
Externally published: Yes
