TY - GEN
T1 - Research on Complex Robot Manipulation Tasks Based on Hindsight Trust Region Policy Optimization
AU - Yang, Deyu
AU - Zhang, Hanbo
AU - Lan, Xuguang
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11/6
Y1 - 2020/11/6
N2 - Deep reinforcement learning (DRL) algorithms have made remarkable progress in robot manipulation tasks in recent years. However, successfully completing a task relies heavily on a specially designed reward function, which requires engineering experience or domain-specific knowledge. To avoid complex reward shaping and make robot learning more general, it is essential to study sparse-reward environments. In this paper, we present two types of challenging goal-conditioned sparse-reward tasks with a 7-DoF robot arm: a target-reaching task with obstacles, and a dynamic-object task in which the target object moves at a certain speed. Based on the Hindsight Trust Region Policy Optimization (HTRPO) algorithm proposed by our research group, we study control performance on both types of tasks, which have continuous, high-dimensional state spaces. The results show that HTRPO achieves more stable policy performance, a higher success rate, and better sample efficiency than its baseline algorithms, TRPO and HPG. However, challenges remain in solving tasks in which the target moves at high speed.
AB - Deep reinforcement learning (DRL) algorithms have made remarkable progress in robot manipulation tasks in recent years. However, successfully completing a task relies heavily on a specially designed reward function, which requires engineering experience or domain-specific knowledge. To avoid complex reward shaping and make robot learning more general, it is essential to study sparse-reward environments. In this paper, we present two types of challenging goal-conditioned sparse-reward tasks with a 7-DoF robot arm: a target-reaching task with obstacles, and a dynamic-object task in which the target object moves at a certain speed. Based on the Hindsight Trust Region Policy Optimization (HTRPO) algorithm proposed by our research group, we study control performance on both types of tasks, which have continuous, high-dimensional state spaces. The results show that HTRPO achieves more stable policy performance, a higher success rate, and better sample efficiency than its baseline algorithms, TRPO and HPG. However, challenges remain in solving tasks in which the target moves at high speed.
KW - Reinforcement learning
KW - goal-conditioned
KW - policy optimization
KW - robot manipulation
KW - sparse-reward
UR - https://www.scopus.com/pages/publications/85100922281
U2 - 10.1109/CAC51589.2020.9327251
DO - 10.1109/CAC51589.2020.9327251
M3 - Conference contribution
AN - SCOPUS:85100922281
T3 - Proceedings - 2020 Chinese Automation Congress, CAC 2020
SP - 4541
EP - 4546
BT - Proceedings - 2020 Chinese Automation Congress, CAC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 Chinese Automation Congress, CAC 2020
Y2 - 6 November 2020 through 8 November 2020
ER -