
High-performance multi-agent path finding in high-obstacle-density and large-size maps

  • Xi'an Jiaotong University

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Reinforcement Learning (RL) is an attractive solution to the Multi-Agent Path Finding problem due to its scalability compared to search-based approaches. However, existing RL-based methods often suffer from learning instability and poor performance on complex maps that require significant coordination for collision-free path planning. This is primarily because they overlook non-stationarity and rely on individual reward functions conditioned on the goal distance. To address this limitation, we propose the Proximal Value Decomposition Network (PVDN). PVDN enhances the individual reward through potential-based reward shaping to ensure consistent policy performance regardless of goal distance. It trains the agent and its immediate neighbors by maximizing the team reward, namely, the sum of the individual rewards, to alleviate the exponential growth of action-observation space and memory demands. To eliminate the non-stationarity, PVDN employs the centralized training with decentralized execution paradigm, where the joint Q function is decomposed into individual Q functions. Benefiting from this paradigm, PVDN can also achieve credit assignment and ensure policy consistency between centralized policy and individual policies. Experimental results on a 160×160 random map with 30 % obstacles and 1024 agents show that PVDN outperforms the existing RL-based planners by a large margin and can fully solve the task when goal selection is restricted such that at least 3 out of the 4 cardinally adjacent cells are obstacle-free.
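
The two mechanisms named in the abstract, potential-based reward shaping and decomposition of the joint Q function into individual Q functions, can be illustrated with a minimal sketch. The abstract does not give the paper's potential function, network architecture, or mixing rule, so everything below is an assumption made for illustration: the potential is taken to be the negative Manhattan distance to the goal, the decomposition is taken to be a plain sum in the style of value decomposition networks, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) on a grid map


def potential(cell: Cell, goal: Cell) -> float:
    # Assumed potential: negative Manhattan distance to the goal.
    # The paper's actual potential function is not stated in the abstract.
    return -float(abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))


def shaped_reward(base_reward: float, cell: Cell, next_cell: Cell,
                  goal: Cell, gamma: float = 0.99) -> float:
    # Standard potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    return base_reward + gamma * potential(next_cell, goal) - potential(cell, goal)


def joint_q(individual_qs: Sequence[Sequence[float]],
            actions: Sequence[int]) -> float:
    # Assumed additive decomposition: Q_tot = sum_i Q_i(o_i, a_i).
    return sum(q[a] for q, a in zip(individual_qs, actions))


def greedy_actions(individual_qs: Sequence[Sequence[float]]) -> List[int]:
    # With an additive decomposition, each agent's own argmax
    # also maximizes the joint Q value.
    return [max(range(len(q)), key=q.__getitem__) for q in individual_qs]
```

Under the additive assumption, the joint greedy action factorizes into per-agent greedy actions, which is one way to read the abstract's statement about consistency between the centralized policy and the individual policies.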

Original language: English
Article number: 131943
Journal: Neurocomputing
Volume: 662
DOI
Publication status: Published - 21 Jan 2026
