TY - GEN
T1 - Option-based Multi-agent Exploration
AU - Song, Xuwei
AU - Wan, Lipeng
AU - Liu, Zeyang
AU - Chen, Xingyu
AU - Lan, Xuguang
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Effective exploration is essential to cooperative multi-agent reinforcement learning (MARL). However, existing exploration MARL algorithms face two challenges: an enormous exploration space and partial observability constraints. To address these challenges, we propose a method called option-based multi-agent exploration (OMAE): we introduce the concept of options to reduce the number of decisions, where options are defined as policies with a termination condition. Option-based exploration improves learning efficiency because the option space is much smaller than the original policy space. We use a dual-policy framework to overcome partial observability constraints, where the global state is not available during execution. Our framework separates the exploration and exploitation policies to ensure that the exploitation policy has access to the state information without explicitly taking the options as input. We further introduce a likelihood estimation to solve the distribution shift problem between the two policies. Experimental results show that OMAE improves coordination ability compared with the baseline methods in most tasks in the StarCraft II environment (SMAC).
AB - Effective exploration is essential to cooperative multi-agent reinforcement learning (MARL). However, existing exploration MARL algorithms face two challenges: an enormous exploration space and partial observability constraints. To address these challenges, we propose a method called option-based multi-agent exploration (OMAE): we introduce the concept of options to reduce the number of decisions, where options are defined as policies with a termination condition. Option-based exploration improves learning efficiency because the option space is much smaller than the original policy space. We use a dual-policy framework to overcome partial observability constraints, where the global state is not available during execution. Our framework separates the exploration and exploitation policies to ensure that the exploitation policy has access to the state information without explicitly taking the options as input. We further introduce a likelihood estimation to solve the distribution shift problem between the two policies. Experimental results show that OMAE improves coordination ability compared with the baseline methods in most tasks in the StarCraft II environment (SMAC).
UR - https://www.scopus.com/pages/publications/85141200736
U2 - 10.1109/CYBER55403.2022.9907622
DO - 10.1109/CYBER55403.2022.9907622
M3 - Conference contribution
AN - SCOPUS:85141200736
T3 - 2022 12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, CYBER 2022
SP - 332
EP - 337
BT - 2022 12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, CYBER 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, CYBER 2022
Y2 - 27 July 2022 through 31 July 2022
ER -