TY - JOUR
T1 - Point-RMAE
T2 - Reinforcement Masked Autoencoder for 3D Representation Learning
AU - Cheng, Haozhe
AU - Wei, Lintong
AU - Wang, Wenjing
AU - Yan, Wenbiao
AU - Chen, Jinqian
AU - Lu, Jian
AU - Yue, Kun
AU - Zhu, Jihua
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - The mainstream 3D masked point modeling representation learning community typically employs predefined, fixed-ratio random or block masking strategies, aiming to obtain optimal representations and achieve high downstream performance. However, these empirical designs overlook the significant differences in geometric information and structural importance among 3D points, leading to a suboptimal trade-off between the representation capture capability and the reconstruction difficulty of such masking strategies. To address this issue, we are the first to present this decision-making problem to a reinforcement learning agent and propose a Reinforcement Masked Autoencoder for 3D representation learning, named Point-RMAE. Guided by geometric features as state factors, this method leverages the Masking Strategy Analyzer and the Dynamic Masking Generator to adaptively decide and apply the masking strategy during pretraining. The Masking Ratio Scheduling module dynamically adjusts the masking ratio based on the optimal strategy. Subsequently, the analyzer is updated by multiscale rewards derived from reconstruction quality level, distribution-aware feedback, and policy exploration. Notably, to enrich the reward function with distribution-aware signals and avoid the decision collapse issue, we propose a Flow Matching Point Cloud Fast Generator that guides the selected masking decisions. Our method achieves outstanding performance across downstream tasks such as shape classification, medical diagnosis, object detection, action recognition, denoising, and multiscale scene segmentation on ten popular 3D and 4D datasets. More importantly, Point-RMAE pioneers the application of reinforcement learning in 3D self-supervised representation learning.
AB - The mainstream 3D masked point modeling representation learning community typically employs predefined, fixed-ratio random or block masking strategies, aiming to obtain optimal representations and achieve high downstream performance. However, these empirical designs overlook the significant differences in geometric information and structural importance among 3D points, leading to a suboptimal trade-off between the representation capture capability and the reconstruction difficulty of such masking strategies. To address this issue, we are the first to present this decision-making problem to a reinforcement learning agent and propose a Reinforcement Masked Autoencoder for 3D representation learning, named Point-RMAE. Guided by geometric features as state factors, this method leverages the Masking Strategy Analyzer and the Dynamic Masking Generator to adaptively decide and apply the masking strategy during pretraining. The Masking Ratio Scheduling module dynamically adjusts the masking ratio based on the optimal strategy. Subsequently, the analyzer is updated by multiscale rewards derived from reconstruction quality level, distribution-aware feedback, and policy exploration. Notably, to enrich the reward function with distribution-aware signals and avoid the decision collapse issue, we propose a Flow Matching Point Cloud Fast Generator that guides the selected masking decisions. Our method achieves outstanding performance across downstream tasks such as shape classification, medical diagnosis, object detection, action recognition, denoising, and multiscale scene segmentation on ten popular 3D and 4D datasets. More importantly, Point-RMAE pioneers the application of reinforcement learning in 3D self-supervised representation learning.
KW - 3D point cloud
KW - reinforcement learning
KW - representation learning
KW - self-supervised network
UR - https://www.scopus.com/pages/publications/105039893985
U2 - 10.1109/TIP.2026.3694193
DO - 10.1109/TIP.2026.3694193
M3 - Article
AN - SCOPUS:105039893985
SN - 1057-7149
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -