TY - JOUR
T1 - Hierarchical multi-agent reinforcement learning for cooperative tasks with sparse rewards in continuous domain
AU - Cao, Jingyu
AU - Dong, Lu
AU - Yuan, Xin
AU - Wang, Yuanda
AU - Sun, Changyin
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
PY - 2024/1
Y1 - 2024/1
N2 - The sparse reward problem has long been one of the most challenging topics in the application of reinforcement learning (RL), especially in complex multi-agent systems. In this paper, a hierarchical multi-agent RL architecture is developed to address the sparse reward problem of cooperative tasks in continuous domains. The proposed architecture is divided into two levels: the higher-level meta-agent, which implements state transitions on a larger time scale to alleviate the sparse reward problem, receives the global observation as spatial information and formulates sub-goals for the lower-level agents; each lower-level agent receives a local observation and a sub-goal and completes the cooperative tasks. In addition, to improve the stability of the higher-level policy, a channel is built to transmit the lower-level policy to the meta-agent as temporal information, and a two-stream structure is adopted in the actor-critic networks of the meta-agent to process the spatial and temporal information. Simulation experiments on different tasks demonstrate that the proposed algorithm effectively alleviates the sparse reward problem and thus learns the desired cooperative policies.
AB - The sparse reward problem has long been one of the most challenging topics in the application of reinforcement learning (RL), especially in complex multi-agent systems. In this paper, a hierarchical multi-agent RL architecture is developed to address the sparse reward problem of cooperative tasks in continuous domains. The proposed architecture is divided into two levels: the higher-level meta-agent, which implements state transitions on a larger time scale to alleviate the sparse reward problem, receives the global observation as spatial information and formulates sub-goals for the lower-level agents; each lower-level agent receives a local observation and a sub-goal and completes the cooperative tasks. In addition, to improve the stability of the higher-level policy, a channel is built to transmit the lower-level policy to the meta-agent as temporal information, and a two-stream structure is adopted in the actor-critic networks of the meta-agent to process the spatial and temporal information. Simulation experiments on different tasks demonstrate that the proposed algorithm effectively alleviates the sparse reward problem and thus learns the desired cooperative policies.
KW - Cooperative multi-agent systems
KW - Hierarchical framework
KW - Reinforcement learning
KW - Sparse reward
KW - Two-stream structure
UR - https://www.scopus.com/pages/publications/85173702566
U2 - 10.1007/s00521-023-08882-6
DO - 10.1007/s00521-023-08882-6
M3 - Article
AN - SCOPUS:85173702566
SN - 0941-0643
VL - 36
SP - 273
EP - 287
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 1
ER -