TY - JOUR
T1 - Deep reinforcement learning-based dynamic multi-beam power allocation for GEO-LEO co-existing satellites
AU - Xu, Jing
AU - Fan, Simeng
AU - Zhao, Zhongtian
AU - Li, Fan
AU - Zhang, Yizhai
N1 - Publisher Copyright:
© 2024
PY - 2024/10
Y1 - 2024/10
N2 - This paper first formulates a novel long-term beam power allocation (BPA) problem to tackle the harmful co-linear interference issue in the geostationary earth orbit (GEO) and low earth orbit (LEO) co-existing satellite system. The BPA problem aims to optimize the long-term weighted sum rate of the LEO system while ensuring that the GEO user's received interference from the LEO satellite system remains below a prescribed threshold. To solve it in real time, a deep reinforcement learning (DRL) framework based on the proximal policy optimization (PPO) algorithm, named drlBPA, is proposed. In addition, for the most relevant existing baseline, the fractional optimization (FO)-based BPA scheme, this paper on the one hand improves it via a greedy strategy to fully exploit the time resource; on the other hand, to further reduce the computational complexity stemming from its iterative solving procedure, a deep neural network approximation scheme is also developed. Simulation results demonstrate that (i) the trained DRL model of the proposed drlBPA scheme has good convergence and generalization performance, and (ii) compared with the three FO-based benchmarks, the drlBPA scheme not only achieves the highest LEO-system throughput within a significantly reduced computation time but also yields the best system stability.
KW - Deep reinforcement learning
KW - Dynamic beam power allocation
KW - Fractional optimization
KW - GEO-LEO co-existing satellite system
KW - Proximal policy optimization
UR - https://www.scopus.com/pages/publications/85198249063
U2 - 10.1016/j.actaastro.2024.07.004
DO - 10.1016/j.actaastro.2024.07.004
M3 - Article
AN - SCOPUS:85198249063
SN - 0094-5765
VL - 223
SP - 197
EP - 209
JO - Acta Astronautica
JF - Acta Astronautica
ER -