TY - JOUR
T1 - Learning a World Model With Multitimescale Memory Augmentation
AU - Cai, Wenzhe
AU - Wang, Teng
AU - Wang, Jiawei
AU - Sun, Changyin
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - Model-based reinforcement learning (RL) is regarded as a promising approach to tackle the challenges that hinder model-free RL. The success of model-based RL hinges critically on the quality of the learned dynamics model. However, for many real-world tasks involving high-dimensional state spaces, current dynamics prediction models perform poorly in long-term prediction. To address this issue, we propose a novel two-branch neural network architecture with multitimescale memory augmentation that handles long-term and short-term memory differently. Specifically, following previous works, we introduce a recurrent neural network to encode the history of observations into a latent space, characterizing the agent's long-term memory. In contrast to previous works, we treat the most recent observations as the agent's short-term memory and use them to directly reconstruct the next frame, avoiding compounding errors. This is achieved by introducing a self-supervised optical flow prediction structure that models the action-conditional feature transformation at the pixel level. The reconstructed observation is finally augmented with the long-term memory to ensure semantic consistency. Experimental results show that our approach generates visually realistic long-term predictions in DeepMind maze navigation games and outperforms prevalent state-of-the-art methods in prediction accuracy by a large margin. Furthermore, we evaluate the usefulness of our world model by using the predicted frames to drive an imagination-augmented exploration strategy that improves the model-free RL controller.
AB - Model-based reinforcement learning (RL) is regarded as a promising approach to tackle the challenges that hinder model-free RL. The success of model-based RL hinges critically on the quality of the learned dynamics model. However, for many real-world tasks involving high-dimensional state spaces, current dynamics prediction models perform poorly in long-term prediction. To address this issue, we propose a novel two-branch neural network architecture with multitimescale memory augmentation that handles long-term and short-term memory differently. Specifically, following previous works, we introduce a recurrent neural network to encode the history of observations into a latent space, characterizing the agent's long-term memory. In contrast to previous works, we treat the most recent observations as the agent's short-term memory and use them to directly reconstruct the next frame, avoiding compounding errors. This is achieved by introducing a self-supervised optical flow prediction structure that models the action-conditional feature transformation at the pixel level. The reconstructed observation is finally augmented with the long-term memory to ensure semantic consistency. Experimental results show that our approach generates visually realistic long-term predictions in DeepMind maze navigation games and outperforms prevalent state-of-the-art methods in prediction accuracy by a large margin. Furthermore, we evaluate the usefulness of our world model by using the predicted frames to drive an imagination-augmented exploration strategy that improves the model-free RL controller.
KW - Model-based exploration
KW - multitimescale memory augmentation
KW - reinforcement learning (RL)
KW - world model
UR - https://www.scopus.com/pages/publications/85126336213
U2 - 10.1109/TNNLS.2022.3151412
DO - 10.1109/TNNLS.2022.3151412
M3 - Article
C2 - 35254991
AN - SCOPUS:85126336213
SN - 2162-237X
VL - 34
SP - 8493
EP - 8502
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 11
ER -