TY - GEN
T1 - H2L
T2 - 9th International Symposium on Artificial Intelligence and Robotics, ISAIR 2024
AU - Tang, Chang
AU - Chen, Shitao
AU - Tian, Zhiqiang
AU - Lan, Xuguang
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Multi-Agent Path Finding (MAPF) is vital for large-scale Multi-Agent Systems (MAS), where agents must plan collision-free paths to reach their goals. While Reinforcement Learning (RL) methods aim to enhance real-time performance and scalability over search-based approaches, their success on complex maps is limited. This is due to the use of independent RL algorithms, which fail to address the non-stationarity of the environment, and to inappropriate reward functions that cause an agent's policy to worsen with greater distance to the goal. To tackle these issues, we propose a MAPF algorithm based on a new variant of the Value Decomposition Network (VDN), a multi-agent RL algorithm, and introduce a novel reward function. This VDN variant trains the network using only the agents within a specific agent's field of view, addressing non-stationarity and training challenges in large-scale MAS, unlike naive VDN, which considers all agents. The novel reward function uses potential-based reward shaping, rendering the agent's policy independent of the map size. Additionally, we enhance the reward to alleviate congestion by preventing agents from stopping next to each other and by penalizing following conflicts, where one agent moves into a cell just vacated by another. Experiments show our planner has a notably higher success rate than other RL-based planners and a slightly lower one than the latest state-of-the-art search-based planner, LaCAM*, on complex maps. For instance, on a 160 × 160 map with 30% obstacle density and 1024 agents, our planner achieves an 88% success rate, while other RL-based planners achieve virtually 0%.
KW - Coordination and cooperation
KW - Multi-agent deep reinforcement learning
KW - Multi-agent path finding
KW - Multi-agent path planning
UR - https://www.scopus.com/pages/publications/105000837256
U2 - 10.1007/978-981-96-2911-4_13
DO - 10.1007/978-981-96-2911-4_13
M3 - Conference contribution
AN - SCOPUS:105000837256
SN - 9789819629107
T3 - Communications in Computer and Information Science
SP - 124
EP - 137
BT - Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers
A2 - Lu, Huimin
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 27 September 2024 through 30 September 2024
ER -