Abstract
Objective: Satellite remote sensing images have high application value in urban planning and construction, natural resource management, surface change monitoring, and scientific research. The 3D reconstruction of Earth’s surface from satellite remote sensing images has become a research hotspot in computer vision and remote sensing. As implicit 3D reconstruction based on neural rendering becomes more widely applied in remote sensing, research on neural radiance fields (NeRF) in remote sensing scenarios has expanded, and various neural rendering algorithms have emerged. Applications such as urban planning, communication infrastructure construction, digital maps, and smart cities require large-scale 3D reconstruction methods that perform better in urban areas. Urban buildings frequently exhibit structural regularities, typically presenting geometric structures such as cuboids, cubes, and prisms. This important prior knowledge has not been effectively utilized in previous remote sensing NeRF algorithms. Therefore, this study aims to introduce geometric structure constraints that are specific to urban areas into remote sensing NeRF 3D reconstruction algorithms. In addition, the rendering method of NeRF and its large number of parameters make training and rendering extremely slow. The sampling points, whose number is determined by the training data, drive the training and optimization of the entire target space, so reducing their number may degrade rendering quality. Therefore, to achieve fast scene reconstruction, the network structure must be optimized by reducing its layers and dimensions to accelerate training and rendering. Method: For the 3D reconstruction of complex urban buildings, this study proposes an improved strategy that renders complex structures more finely and keeps the planes of artificial buildings flatter.
During preprocessing, the method adds to each training ray two adjacent rays that do not participate in backpropagation. After obtaining the density of the scene through a multilayer network, the method solves inversely for the depth points of the three rays and determines the surface points of the target. From these surface points, a set of surface normals is computed. By clustering these surface normals, the surfaces of the scene and the normals perpendicular to them are recovered and grouped under Manhattan conditions. The three major orthogonal clusters within the Manhattan framework are solved from the largest cluster, and backpropagation is performed under constraints such as orthogonality and normalization, ensuring that the reconstructed target meets the Manhattan framework conditions. Training is slow because rendering involves extensive computation. To address this issue, the method uses multilayer encoding to reduce the complexity of the neural network. Coarse-network sampling points, which must learn the 3D information of the entire scene, are mapped into a high-dimensional space with a multi-resolution learnable positional encoding. Fine-network sampling points, which must learn detailed texture information, use hash encoding. Hash encoding optimizes the network input by interpolating each sampling point within multi-resolution grids, looking up the corresponding hash-encoded features, and combining the features from all resolutions before feeding them into a smaller network for training. This improvement significantly reduces the training time of the radiance field, achieving fast and high-precision reconstruction of complex urban buildings. The method is implemented on a Quadro RTX 8000 graphics processing unit using the PyTorch Lightning framework. The initial learning rate is set to 5 × 10⁻⁴ and is multiplied by a decay factor of 0.9 after each epoch.
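The surface-normal step described above can be illustrated with a minimal sketch. It assumes that the surface point of each ray is the ray origin plus the rendered depth times the ray direction, and that the normal is the normalized cross product of the two edges joining the training ray's surface point to those of its two adjacent rays. The function names and the exact form of `orthogonality_penalty` are hypothetical illustrations of the "orthogonality and normalization" constraints, not the authors' implementation.

```python
import math

def sub(a, b):
    """Component-wise difference of two 3D vectors."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale a vector to unit length; reject degenerate (zero) vectors."""
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("degenerate input: the three surface points are collinear")
    return tuple(x / n for x in v)

def surface_point(origin, direction, depth):
    """Assumed surface point: origin + depth * direction."""
    return tuple(o + depth * d for o, d in zip(origin, direction))

def surface_normal(p_center, p_adjacent_a, p_adjacent_b):
    """Unit normal of the plane through the three surface points."""
    edge_a = sub(p_adjacent_a, p_center)
    edge_b = sub(p_adjacent_b, p_center)
    return normalize(cross(edge_a, edge_b))

def orthogonality_penalty(axes):
    """Hypothetical penalty: squared deviation from unit norm for each of
    the three axes, plus squared pairwise dot products between them."""
    penalty = 0.0
    for i in range(3):
        penalty += (dot(axes[i], axes[i]) - 1.0) ** 2
        for j in range(i + 1, 3):
            penalty += dot(axes[i], axes[j]) ** 2
    return penalty
```

For three parallel downward-looking rays that all hit a flat roof at the same depth, `surface_normal` returns a vertical unit vector, and `orthogonality_penalty` is zero for any set of three orthonormal axes and positive otherwise.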
Result: Experiments were conducted on four scenes in the 2019 Data Fusion Contest (DFC2019) dataset, and the method was compared with the open-source projects NeRF, Shadow-NeRF, EO-NeRF, and SaTensoRF. Evaluation covered peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and the accuracy of the generated digital surface model (DSM). The results indicated that the proposed method significantly reduced model training time and produced better DSMs and novel-view renderings for complex urban buildings than the other methods. In particular, training time for the same number of iterations was only one-third that of EO-NeRF and significantly less than those of the other remote sensing NeRF algorithms. In scenes with a high proportion of artificial buildings, the method achieved the best metrics among the compared algorithms. In the 068 scene, it outperformed the other remote sensing NeRF algorithms, with PSNR increasing by 0.67 dB and SSIM by 0.0387 compared with the best-performing baseline, Shadow-NeRF. Similarly, in the 214 scene, PSNR increased by 0.71 dB and SSIM by 0.0526 compared with EO-NeRF. In the 260 scene, PSNR increased by 0.27 dB and SSIM by 0.0135 compared with EO-NeRF. The generated DSM demonstrated good accuracy, with errors within 3.2 m of the ground truth in the DFC2019 dataset and less than 1.4 m in the 068 scene. Conclusion: This study proposes an efficient rendering and reconstruction method that combines Manhattan geometric constraints and multi-resolution hash encoding. The Manhattan framework generates surface points during rendering and calculates surface normals for supervision, ensuring that the entire rendering space meets the Manhattan framework conditions. The multi-resolution hash encoding module introduces hash tables and learnable positional encoding, reducing the layers and dimensions of the original NeRF and significantly reducing training time.
The experiments demonstrate that the algorithm significantly improves modeling accuracy, enhances novel-view image generation quality, and reduces elevation estimation errors in urban scenes, particularly in artificial building scenarios, while ensuring efficient training performance.
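The multi-resolution hash encoding mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' module: it assumes points normalized to the unit cube, trilinear interpolation over the eight surrounding grid corners at each resolution, and an XOR-of-products spatial hash with the per-axis multipliers commonly used for this kind of encoding. The names `hash_index`, `make_tables`, and `encode`, the table sizes, and the resolutions are all hypothetical. In a real model the table entries would be learnable parameters updated by backpropagation; here they are fixed random values.

```python
import math
import random

# Per-axis multipliers for the spatial hash (an assumption; the abstract
# does not state which hash function the method uses).
PRIMES = (1, 2654435761, 805459861)

def hash_index(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in a hash table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size

def make_tables(n_levels, table_size, n_features, seed=0):
    """One table of small random feature vectors per resolution level."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
             for _ in range(table_size)]
            for _ in range(n_levels)]

def encode(point, tables, resolutions):
    """Concatenate trilinearly interpolated features from every level.

    point: (x, y, z) in the unit cube; resolutions: grid cells per axis
    for each level, coarse to fine.
    """
    features = []
    for table, res in zip(tables, resolutions):
        scaled = [c * res for c in point]
        base = [int(math.floor(s)) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        level_feat = [0.0] * len(table[0])
        for corner in range(8):
            # Bit k of `corner` selects the lower or upper vertex on axis k.
            offs = [(corner >> axis) & 1 for axis in range(3)]
            weight = 1.0
            for axis in range(3):
                weight *= frac[axis] if offs[axis] else 1.0 - frac[axis]
            slot = hash_index(base[0] + offs[0], base[1] + offs[1],
                              base[2] + offs[2], len(table))
            for k, value in enumerate(table[slot]):
                level_feat[k] += weight * value
        features.extend(level_feat)
    return features
```

With three levels and two features per entry, `encode` returns a six-element vector that would be fed to a small network in place of the raw coordinates. Because most of the capacity sits in the tables rather than in network weights, the network itself can have fewer layers and narrower dimensions, which is the design motivation the abstract gives for faster training.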
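| Translated title of the contribution | Manhattan 结构约束神经辐射场在城市遥感图像中的三维重建 |
|---|---|
| Original language | English |
| Pages (from-to) | 2584-2596 |
| Number of pages | 13 |
| Journal | Journal of Image and Graphics |
| Volume | 30 |
| Issue number | 7 |
| DOI | |
| Publication status | Published - 2025 |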
UN Sustainable Development Goals
This output contributes to the following Sustainable Development Goals:
- SDG 11: Sustainable Cities and Communities