TY - JOUR
T1 - Optimal navigation for AGVs
T2 - A soft actor–critic-based reinforcement learning approach with composite auxiliary rewards
AU - Guo, Haisen
AU - Ren, Zhigang
AU - Lai, Jialun
AU - Wu, Zongze
AU - Xie, Shengli
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/9
Y1 - 2023/9
N2 - In this paper, we address the problem of real-time navigation and obstacle avoidance for automated guided vehicles (AGVs) in dynamic environments, which is a primary research area in collaborative control systems for AGVs. To overcome the computational inefficiency of recalculating optimal paths every time, we propose an improved Soft Actor–Critic (SAC)-based reinforcement learning methodology. This methodology utilizes a novel composite auxiliary reward structure and sum-tree prioritized experience replay (SAC-SP) to achieve real-time optimal feedback control. First, we formulate the navigation task as a Markov Decision Process that considers both static and dynamic obstacles. To accelerate the active learning of AGVs, we propose a novel strategy that uses composite auxiliary rewards. Next, we train the AGVs using the proposed SAC-SP methodology to handle real-time navigation with the composite auxiliary reward structure. The well-trained policy network can generate effective on-board optimal feedback actions given obstacle positions, targets, and AGV states. Simulation experiments demonstrate that our proposed method can steer AGVs to the destination with high robustness to initial conditions and various obstacle restrictions, generating optimal feedback actions in the shortest amount of time.
KW - AGVs
KW - Motion planning
KW - Optimal control
KW - Reinforcement learning
KW - Trajectory planning
UR - https://www.scopus.com/pages/publications/85162262451
U2 - 10.1016/j.engappai.2023.106613
DO - 10.1016/j.engappai.2023.106613
M3 - Article
AN - SCOPUS:85162262451
SN - 0952-1976
VL - 124
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 106613
ER -