
Guided policy search for sequential multitask learning

  • Fangzhou Xiong
  • Biao Sun
  • Xu Yang
  • Hong Qiao
  • Kaizhu Huang
  • Amir Hussain
  • Zhiyong Liu

  • CAS - Institute of Automation
  • University of Chinese Academy of Sciences
  • University of Science and Technology Beijing
  • Chinese Academy of Sciences
  • Xi'an Jiaotong-Liverpool University
  • University of Stirling

Research output: Contribution to journal, Article, peer-reviewed

37 citations (Scopus)

Abstract

Policy search in reinforcement learning (RL) is a practical approach that interacts directly with environments in parameter space, but it often faces the problems of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are provided together at the start of the learning process. This hinders the algorithm's adaptation to real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies from sequentially arriving training samples, delivering performance comparable to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.
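
The abstract does not give the objective used in sequential multitask learning-GPS, so the following is only a minimal sketch of the generic elastic weight consolidation idea it refers to: a quadratic penalty, weighted by a diagonal Fisher information estimate, that keeps parameters close to those learned on a previous task. All function names, the diagonal approximation, and the weight `lam` are illustrative assumptions and are not taken from the paper.

```python
# Sketch of a generic elastic weight consolidation (EWC) penalty.
# Assumption: parameters are plain lists of floats and the Fisher
# information is approximated by its diagonal. This is not the
# paper's implementation.

def diagonal_fisher(per_sample_grads):
    """Estimate the diagonal Fisher information as the mean of the
    squared per-sample gradients of the log-likelihood."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_prev, fisher, lam):
    """Quadratic penalty: (lam / 2) * sum_i F_i * (theta_i - theta_prev_i)^2."""
    return 0.5 * lam * sum(
        f * (t - p) ** 2 for f, t, p in zip(fisher, theta, theta_prev)
    )


def ewc_penalty_grad(theta, theta_prev, fisher, lam):
    """Gradient of the penalty with respect to theta."""
    return [lam * f * (t - p) for f, t, p in zip(fisher, theta, theta_prev)]


def regularized_loss(task_loss, theta, theta_prev, fisher, lam):
    """New-task loss plus the penalty that protects the previous task."""
    return task_loss(theta) + ewc_penalty(theta, theta_prev, fisher, lam)
```

Parameters with a large Fisher value are pulled strongly toward their previous-task values, while parameters with a value near zero remain free to adapt to the new task. In this sketch, that is how knowledge from an earlier task is retained without storing its training samples.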

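Original language: English
Article number: 8294227
Pages (from-to): 216-226
Number of pages: 11
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume: 49
Issue number: 1
DOI
Publication status: Published - Jan 2019
Externally published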
