
Model-Based Reinforcement Learning via Proximal Policy Optimization

  • Southeast University, Nanjing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

29 Citations (Scopus)

Abstract

Proximal policy optimization (PPO) is a state-of-the-art model-free reinforcement learning algorithm. Its powerful policy search ability allows an agent to find the optimal policy by trial and error, but at the cost of heavy computation and low data efficiency. Model-based algorithms use data more efficiently by learning a forward model from observations, but they face the challenge of model error. In this paper, we combine the strengths of both approaches and introduce a data-efficient model-based approach called PIPPO (probabilistic inference via PPO). It performs online probabilistic dynamics model inference based on Gaussian process regression and executes offline policy improvement using PPO on the inferred model. Empirical evaluation on the pendulum benchmark problem shows that PIPPO achieves performance comparable to traditional PPO while requiring less interaction with the environment.
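
To make the model-inference step concrete, the following is a minimal sketch of Gaussian process regression used as a probabilistic one-step dynamics model. It is not the authors' implementation: the toy pendulum equations, the kernel hyperparameters, the random data collection, and every name (`rbf_kernel`, `GPDynamicsModel`, `pendulum_step`) are assumptions made for illustration. The PPO policy-improvement step on the learned model is omitted.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class GPDynamicsModel:
    """GP regression from (state, action) to the change in state.

    Hyperparameters are fixed by hand here (an assumption); in practice they
    would usually be fitted, e.g. by maximizing the marginal likelihood.
    """

    def __init__(self, noise_var=1e-4, length_scale=1.0, signal_var=1.0):
        self.noise_var = noise_var
        self.length_scale = length_scale
        self.signal_var = signal_var

    def fit(self, X, Y):
        K = rbf_kernel(X, X, self.length_scale, self.signal_var)
        K += self.noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)  # K = L @ L.T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))
        self.X = X

    def predict(self, X_query):
        """Return the posterior mean and variance of the state change."""
        K_s = rbf_kernel(X_query, self.X, self.length_scale, self.signal_var)
        mean = K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        var = self.signal_var - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)


def pendulum_step(state, torque, dt=0.05, g=9.81, length=1.0, mass=1.0):
    """Toy frictionless pendulum standing in for the real environment.

    Assumed convention: theta = 0 is the downward rest position.
    """
    theta, omega = state
    omega_next = omega + dt * (-(g / length) * np.sin(theta) + torque / (mass * length**2))
    theta_next = theta + dt * omega_next
    return np.array([theta_next, omega_next])


# Collect a small batch of transitions with random states and torques.
rng = np.random.default_rng(0)
states = rng.uniform([-np.pi, -2.0], [np.pi, 2.0], size=(200, 2))
actions = rng.uniform(-2.0, 2.0, size=(200, 1))
X = np.hstack([states, actions])
Y = np.array([pendulum_step(s, a[0]) - s for s, a in zip(states, actions)])

model = GPDynamicsModel()
model.fit(X, Y)

# Query the model at an unseen (theta, omega, torque) point.
query = np.array([[0.5, 0.0, 1.0]])
mean_delta, var = model.predict(query)
true_delta = pendulum_step(query[0, :2], query[0, 2]) - query[0, :2]
print("predicted change:", mean_delta[0], "variance:", var[0])
print("true change:     ", true_delta)
```

The predictive variance is what makes the model probabilistic: a policy optimizer that rolls out on such a model can, in principle, account for how uncertain each predicted transition is rather than trusting the mean alone.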

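Original language: English
Title of host publication: Proceedings - 2019 Chinese Automation Congress, CAC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4736-4740
Number of pages: 5
ISBN (Electronic): 9781728140940
DOI
Publication status: Published - Nov 2019
Externally published: Yes
Event: 2019 Chinese Automation Congress, CAC 2019 - Hangzhou, China
Duration: 22 Nov 2019 to 24 Nov 2019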

Publication series

Name: Proceedings - 2019 Chinese Automation Congress, CAC 2019

Conference

Conference: 2019 Chinese Automation Congress, CAC 2019
Country/Territory: China
City: Hangzhou
Period: 22/11/19 to 24/11/19
