TY - JOUR
T1 - Trajectory planning of mobile robot
T2 - A Lyapunov-based reinforcement learning approach with implicit policy
AU - Lai, Jialun
AU - Wu, Zongze
AU - Ren, Zhigang
AU - Tan, Qi
AU - Xie, Shengli
N1 - Publisher Copyright:
© 2025
PY - 2025/9/5
Y1 - 2025/9/5
N2 - Trajectory planning for mobile robots is a crucial aspect of achieving intelligence in many industrial applications. Learning-based approaches are extremely useful for problems involving complex and difficult-to-define rule designs. However, these approaches frequently require a large amount of training data and lack convergence or interpretability. This work proposes a reinforcement learning paradigm that combines an implicit policy with Lyapunov theory to solve the problem of mobile robot trajectory planning. Firstly, we develop a weighted asymmetric Lyapunov reward function and provide an analytical solution with modest dynamics as the implicit policy. Then, we propose event-triggered multi-objective policy optimization, an approach that dynamically adjusts optimization objectives based on event-triggered conditions and is organically fused into a modified soft Actor-Critic algorithm, thus shrinking the exploration space and enabling iterative improvement of the RL policy. We demonstrate that in disturbed and random scenarios, the proposed fusion policy can achieve specialized policy learning and that its convergence, efficiency, and generalization are verifiable. This clearly demonstrates that our approach can serve as a foundational paradigm for reinforcement learning reward design and motion control in end-to-end trajectory planning, with significant advantages in convergence speed and interpretability.
AB - Trajectory planning for mobile robots is a crucial aspect of achieving intelligence in many industrial applications. Learning-based approaches are extremely useful for problems involving complex and difficult-to-define rule designs. However, these approaches frequently require a large amount of training data and lack convergence or interpretability. This work proposes a reinforcement learning paradigm that combines an implicit policy with Lyapunov theory to solve the problem of mobile robot trajectory planning. Firstly, we develop a weighted asymmetric Lyapunov reward function and provide an analytical solution with modest dynamics as the implicit policy. Then, we propose event-triggered multi-objective policy optimization, an approach that dynamically adjusts optimization objectives based on event-triggered conditions and is organically fused into a modified soft Actor-Critic algorithm, thus shrinking the exploration space and enabling iterative improvement of the RL policy. We demonstrate that in disturbed and random scenarios, the proposed fusion policy can achieve specialized policy learning and that its convergence, efficiency, and generalization are verifiable. This clearly demonstrates that our approach can serve as a foundational paradigm for reinforcement learning reward design and motion control in end-to-end trajectory planning, with significant advantages in convergence speed and interpretability.
KW - Dynamical system movement
KW - Intelligent control
KW - Lyapunov theory
KW - Mobile robots
KW - Reinforcement learning
UR - https://www.scopus.com/pages/publications/105009013750
U2 - 10.1016/j.knosys.2025.113870
DO - 10.1016/j.knosys.2025.113870
M3 - Article
AN - SCOPUS:105009013750
SN - 0950-7051
VL - 325
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 113870
ER -