TY - JOUR
T1 - Hierarchical Twin-Delayed Policy Gradient Reinforcement Learning for Intelligent Cooperative Control of Aircraft
AU - Ma, Yu
AU - An, Dou
AU - Lin, Xixiang
AU - Zhao, Jianfu
AU - Zhang, Guanghua
AU - Niu, Hongmin
N1 - Publisher Copyright: © 2025, Xi'an Jiaotong University. All rights reserved.
PY - 2025
Y1 - 2025
N2 - To address the modeling and coordination challenges that large-scale systems, complex environments, and resource constraints pose for intelligent cooperative control of aircraft, this study proposes an intelligent cooperative control method built on a hierarchical multi-agent decision-making architecture, with the goal of improving the efficiency of the decision-making algorithm. First, each aircraft is modeled as an intelligent agent to establish a cooperative control framework. Second, a partially observable Markov decision process (POMDP) model is employed to handle incomplete observation information. Then, to tackle dynamic game environments and high learning costs, a hierarchical twin-delayed policy gradient reinforcement learning method based on centralized training with decentralized execution is proposed, which combines model-based and model-free mechanisms to leverage existing models of game environment evolution. Finally, under the hierarchical decision-making framework, simulations of typical multi-aircraft game scenarios and thousands of multi-scenario tests are conducted. The results demonstrate that the proposed method solves the multi-aircraft cooperative control problem. Compared with the multi-agent reinforcement learning algorithms MAPPO and QMIX, training time is reduced by 51.03% and 79.03%, algorithm efficiency (cumulative reward) is improved by 37.51% and 58.73%, and the evasion maneuver success rate is increased by 17.63% and 39.79%, respectively.
AB - To address the modeling and coordination challenges that large-scale systems, complex environments, and resource constraints pose for intelligent cooperative control of aircraft, this study proposes an intelligent cooperative control method built on a hierarchical multi-agent decision-making architecture, with the goal of improving the efficiency of the decision-making algorithm. First, each aircraft is modeled as an intelligent agent to establish a cooperative control framework. Second, a partially observable Markov decision process (POMDP) model is employed to handle incomplete observation information. Then, to tackle dynamic game environments and high learning costs, a hierarchical twin-delayed policy gradient reinforcement learning method based on centralized training with decentralized execution is proposed, which combines model-based and model-free mechanisms to leverage existing models of game environment evolution. Finally, under the hierarchical decision-making framework, simulations of typical multi-aircraft game scenarios and thousands of multi-scenario tests are conducted. The results demonstrate that the proposed method solves the multi-aircraft cooperative control problem. Compared with the multi-agent reinforcement learning algorithms MAPPO and QMIX, training time is reduced by 51.03% and 79.03%, algorithm efficiency (cumulative reward) is improved by 37.51% and 58.73%, and the evasion maneuver success rate is increased by 17.63% and 39.79%, respectively.
KW - hierarchical decision
KW - intelligent decision-making
KW - multi-aircraft intelligent cooperative control
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105021385923
U2 - 10.7652/xjtuxb202509009
DO - 10.7652/xjtuxb202509009
M3 - Article
AN - SCOPUS:105021385923
SN - 0253-987X
VL - 59
SP - 88
EP - 98
JO - Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University
JF - Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University
IS - 9
ER -