TY - GEN
T1 - Navigation Command Matching for Vision-based Autonomous Driving
AU - Pan, Yuxin
AU - Xue, Jianru
AU - Zhang, Pengfei
AU - Ouyang, Wanli
AU - Fang, Jianwu
AU - Chen, Xingyu
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - Learning an optimal policy for autonomous driving task to confront with complex environment is a long- studied challenge. Imitative reinforcement learning is accepted as a promising approach to learn a robust driving policy through expert demonstrations and interactions with environments. However, this model utilizes non-smooth rewards, which have a negative impact on matching between navigation commands and trajectory (state-action pairs), and degrade the generalizability of an agent. Smooth rewards are crucial to discriminate actions generated from sub-optimal policy. In this paper, we propose a navigation command matching (NCM) model to address this issue. There are two key components in NCM, 1) a matching measurer produces smooth navigation rewards that measure matching between navigation commands and trajectory; 2) attention-guided agent performs actions given states where salient regions in RGB images (i.e. roadsides, lane markings and dynamic obstacles) are highlighted to amplify their influence on the final model. We obtain navigation rewards and store transitions to replay buffer after an episode, so NCM is able to discriminate actions generated from suboptimal policy. Experiments on CARLA driving benchmark show our proposed NCM outperforms previous state-of-the- art models on various tasks in terms of the percentage of successfully completed episodes. Moreover, our model improves generalizability of the agent and obtains good performance even in unseen scenarios.
AB - Learning an optimal policy for autonomous driving task to confront with complex environment is a long- studied challenge. Imitative reinforcement learning is accepted as a promising approach to learn a robust driving policy through expert demonstrations and interactions with environments. However, this model utilizes non-smooth rewards, which have a negative impact on matching between navigation commands and trajectory (state-action pairs), and degrade the generalizability of an agent. Smooth rewards are crucial to discriminate actions generated from sub-optimal policy. In this paper, we propose a navigation command matching (NCM) model to address this issue. There are two key components in NCM, 1) a matching measurer produces smooth navigation rewards that measure matching between navigation commands and trajectory; 2) attention-guided agent performs actions given states where salient regions in RGB images (i.e. roadsides, lane markings and dynamic obstacles) are highlighted to amplify their influence on the final model. We obtain navigation rewards and store transitions to replay buffer after an episode, so NCM is able to discriminate actions generated from suboptimal policy. Experiments on CARLA driving benchmark show our proposed NCM outperforms previous state-of-the- art models on various tasks in terms of the percentage of successfully completed episodes. Moreover, our model improves generalizability of the agent and obtains good performance even in unseen scenarios.
UR - https://www.scopus.com/pages/publications/85092725762
U2 - 10.1109/ICRA40945.2020.9196609
DO - 10.1109/ICRA40945.2020.9196609
M3 - 会议稿件
AN - SCOPUS:85092725762
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 4343
EP - 4349
BT - 2020 IEEE International Conference on Robotics and Automation, ICRA 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Robotics and Automation, ICRA 2020
Y2 - 31 May 2020 through 31 August 2020
ER -