TY - GEN
T1 - Model-Based Reinforcement Learning via Proximal Policy Optimization
AU - Sun, Yuewen
AU - Yuan, Xin
AU - Liu, Wenzhang
AU - Sun, Changyin
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Proximal policy optimization (PPO) is one of the most effective state-of-the-art model-free reinforcement learning algorithms. Its powerful policy search ability allows an agent to find the optimal policy by trial and error, but at the cost of heavy computation and low data efficiency. Model-based algorithms can make efficient use of data by learning a forward model from observations, but they face the challenge of model error. In this paper, we combine the strengths of both approaches and introduce a data-efficient model-based method called PIPPO (probabilistic inference via PPO). It performs online probabilistic inference of the dynamics model using Gaussian process regression and carries out offline policy improvement using PPO on the inferred model. Empirical evaluation on the pendulum benchmark problem shows that the proposed PIPPO algorithm achieves performance comparable to traditional PPO while requiring fewer interactions with the environment.
AB - Proximal policy optimization (PPO) is one of the most effective state-of-the-art model-free reinforcement learning algorithms. Its powerful policy search ability allows an agent to find the optimal policy by trial and error, but at the cost of heavy computation and low data efficiency. Model-based algorithms can make efficient use of data by learning a forward model from observations, but they face the challenge of model error. In this paper, we combine the strengths of both approaches and introduce a data-efficient model-based method called PIPPO (probabilistic inference via PPO). It performs online probabilistic inference of the dynamics model using Gaussian process regression and carries out offline policy improvement using PPO on the inferred model. Empirical evaluation on the pendulum benchmark problem shows that the proposed PIPPO algorithm achieves performance comparable to traditional PPO while requiring fewer interactions with the environment.
KW - Gaussian process regression
KW - data-efficiency
KW - proximal policy optimization
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/85080068045
U2 - 10.1109/CAC48633.2019.8996875
DO - 10.1109/CAC48633.2019.8996875
M3 - Conference contribution
AN - SCOPUS:85080068045
T3 - Proceedings - 2019 Chinese Automation Congress, CAC 2019
SP - 4736
EP - 4740
BT - Proceedings - 2019 Chinese Automation Congress, CAC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 Chinese Automation Congress, CAC 2019
Y2 - 22 November 2019 through 24 November 2019
ER -