TY - GEN
T1 - Group UAV Navigation by Qualifying Human-Machine Decisions in Hybrid Reinforcement Learning
AU - Li, Xuyang
AU - Fang, Jianwu
AU - Du, Kai
AU - Xue, Jianru
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - This paper addresses continuous control of an Unmanned Aerial Vehicle (UAV) group in large-scale, complex 3D environments using deep reinforcement learning (DRL). The goal is for the UAV group to travel safely from a given starting point to a randomly placed target area, with flight height and speed allowed to vary during navigation. We design a DRL framework that incorporates a human-in-the-loop method. The UAV group is formed into a single mobile unit, and its sensor data are mapped directly to control signals. The human-in-the-loop component switches control between human and machine when necessary, so that a human can intervene and correct dangerous actions of the agent. Based on this framework, we design an improved Actor-Critic structure, modifying the policy and value networks of the original structure accordingly. We evaluate navigation success rate and time efficiency for UAV groups of different sizes in an urban environment. The experimental results show that the method reduces training convergence time and improves navigation efficiency and success rate.
KW - Deep reinforcement learning
KW - UAV group
KW - human-in-the-loop
KW - navigation
UR - https://www.scopus.com/pages/publications/85189336789
U2 - 10.1109/CAC59555.2023.10450270
DO - 10.1109/CAC59555.2023.10450270
M3 - Conference contribution
AN - SCOPUS:85189336789
T3 - Proceedings - 2023 China Automation Congress, CAC 2023
SP - 4238
EP - 4243
BT - Proceedings - 2023 China Automation Congress, CAC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 China Automation Congress, CAC 2023
Y2 - 17 November 2023 through 19 November 2023
ER -