TY - GEN
T1 - UML-MVSNet
T2 - 2025 International Joint Conference on Neural Networks, IJCNN 2025
AU - Lu, Yuanliang
AU - Zhang, Qian
AU - Wang, Jianji
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Multi-view stereo methods have achieved remarkable progress in recent years, benefiting from advancements in depth and confidence estimation. Existing multi-view stereo methods estimate depth through regression or classification, but both approaches have notable limitations: regression methods are prone to overfitting, while classification methods struggle to achieve precise depth prediction. Combining the strengths of these approaches is crucial for accurate reconstruction. To address these issues, we propose a novel network, termed UML-MVSNet, to enable accurate feature extraction and depth estimation. Specifically, we introduce a Local Transformer (LT) module that applies attention mechanisms to local features, effectively capturing local detail information and enhancing feature matching accuracy. Additionally, we propose an Uncertainty-guided Multi-task Learning (UML) module that integrates the advantages of both regression and classification branches for robust depth estimation. In each sub-branch, the Uncertainty-Guided Optimization (UGO) module is designed to refine the probability volume guided by uncertainty. To guide the network toward low-uncertainty regions and balance multi-task losses, we introduce the Uncertainty-Aware Loss (UA Loss). Extensive experiments on the DTU and Tanks & Temples datasets demonstrate that our UML-MVSNet achieves competitive results in both qualitative and quantitative performance compared to other state-of-the-art methods.
KW - Deep learning
KW - Local Transformer
KW - Multi-view stereo
KW - Uncertainty-guided Multi-task Learning
UR - https://www.scopus.com/pages/publications/105023989605
U2 - 10.1109/IJCNN64981.2025.11227326
DO - 10.1109/IJCNN64981.2025.11227326
M3 - Conference contribution
AN - SCOPUS:105023989605
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - International Joint Conference on Neural Networks, IJCNN 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2025 through 5 July 2025
ER -