Optimal navigation for AGVs: A soft actor–critic-based reinforcement learning approach with composite auxiliary rewards

  • Haisen Guo
  • Zhigang Ren
  • Jialun Lai
  • Zongze Wu
  • Shengli Xie

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

In this paper, we address the problem of real-time navigation and obstacle avoidance for automated guided vehicles (AGVs) in dynamic environments, which is a primary research area in collaborative control systems for AGVs. To overcome the computational inefficiency of recalculating optimal paths at every time step, we propose an improved Soft Actor–Critic (SAC)-based reinforcement learning methodology. This methodology utilizes a novel composite auxiliary reward structure and sum-tree prioritized experience replay (SAC-SP) to achieve real-time optimal feedback control. First, we formulate the navigation task as a Markov Decision Process that considers both static and dynamic obstacles. To accelerate the active learning of AGVs, we propose a novel strategy that uses composite auxiliary rewards. Next, we train the AGVs using the proposed SAC-SP methodology to handle real-time navigation with the composite auxiliary reward structure. The well-trained policy network can generate effective on-board optimal feedback actions given obstacle positions, targets, and AGV states. Simulation experiments demonstrate that our proposed method can steer AGVs to the destination with high robustness to initial conditions and various obstacle constraints, generating optimal feedback actions in the shortest amount of time.
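The paper's implementation is not shown here, but the sum-tree prioritized experience replay named in the abstract rests on a standard data structure: a binary tree whose leaves hold per-transition priorities and whose internal nodes hold priority sums, enabling O(log n) priority-proportional sampling. Below is a minimal sketch of that structure (class and method names are illustrative, not the authors' code):

```python
class SumTree:
    """Binary sum-tree for O(log n) priority-proportional sampling,
    the structure underlying sum-tree prioritized experience replay."""

    def __init__(self, capacity):
        self.capacity = capacity                  # max stored transitions
        self.tree = [0.0] * (2 * capacity - 1)    # internal nodes hold sums of child priorities
        self.data = [None] * capacity             # circular buffer of transitions
        self.write = 0                            # next write position
        self.size = 0

    def total(self):
        return self.tree[0]                       # root holds the sum of all priorities

    def add(self, priority, transition):
        idx = self.write + self.capacity - 1      # leaf index for this slot
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                           # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self, s):
        """Walk down to the leaf whose priority interval covers mass s in [0, total)."""
        idx = 0
        while idx < self.capacity - 1:            # still at an internal node
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left                        # descend left
            else:
                s -= self.tree[left]              # skip left subtree's mass
                idx = left + 1                    # descend right
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]
```

In a prioritized replay buffer, each transition is stored with a priority (typically derived from its TD error), and a minibatch is drawn by sampling `s` uniformly within segments of `[0, total())`, so high-error transitions are replayed more often.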

Original language: English
Article number: 106613
Journal: Engineering Applications of Artificial Intelligence
Volume: 124
DOIs
State: Published - Sep 2023
Externally published: Yes

Keywords

  • AGVs
  • Motion planning
  • Optimal control
  • Reinforcement learning
  • Trajectory planning
