FuPaD: Scalable Pose Estimation by Fusing Patch-Wise VGGT with Dense Bundle Adjustment

  • Xi'an Jiaotong University
  • Shaanxi Key Laboratory of Intelligent Robots

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Pose estimation, a cornerstone of 3D computer vision, is crucial for applications such as autonomous driving and augmented reality. Global feed-forward methods such as VGGT show promise for direct scene reconstruction and pose inference. However, they are often constrained by prohibitive memory requirements when processing the long sequences typical of large-scale environments. Furthermore, the accuracy of their single-pass predictions is often limited by the absence of explicit local geometric modeling or iterative refinement. To address these limitations, we introduce FuPaD, a novel hierarchical approach for scalable pose estimation. FuPaD integrates global pose priors derived from a tailored VGGT with the local refinement offered by dense bundle adjustment (DBA). First, a tracking-informed patch sampling strategy selects salient image patches from keyframes. These patches are then processed by the tailored VGGT to yield globally consistent keyframe pose priors, while significantly reducing the memory footprint compared to frame-wise processing. These global keyframe poses are then integrated with dense local pose estimates from DBA within a pose graph optimization framework. Finally, a global DBA module further refines all poses. This hierarchical fusion ensures global consistency while benefiting from the fine-grained local refinement provided by DBA. Evaluations on benchmarks indicate that FuPaD achieves competitive pose accuracy, particularly in large-scale scenarios, while remaining computationally and memory efficient.
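The central idea of the fusion step, combining sparse absolute keyframe priors with dense relative pose estimates in a pose graph, can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: it assumes 1-D scalar positions instead of full 6-DoF poses, fixed hand-chosen weights, and a linear weighted least-squares solve. All names (`fuse_poses`, `solve_linear`, `RelEdge`, `Prior`) are hypothetical. The sketch shows how a few globally consistent priors pull drifting relative estimates back into line.

```python
from typing import List, Tuple

# Hypothetical toy types (not from the paper):
#   RelEdge = (i, j, measured x_j - x_i, weight), e.g. from a local estimator like DBA
#   Prior   = (k, measured x_k, weight), e.g. a global keyframe pose prior
RelEdge = Tuple[int, int, float, float]
Prior = Tuple[int, float, float]


def solve_linear(H: List[List[float]], g: List[float]) -> List[float]:
    """Solve H x = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    # Work on an augmented copy [H | g] so the inputs stay untouched.
    A = [row[:] + [g[r]] for r, row in enumerate(H)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            # No absolute prior: the whole trajectory can shift freely.
            raise ValueError("singular system: add at least one absolute prior")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / A[r][r]
    return x


def fuse_poses(n: int, rel_edges: List[RelEdge], priors: List[Prior]) -> List[float]:
    """Minimize sum w*(x_j - x_i - d)^2 + sum w_k*(x_k - p_k)^2 over 1-D positions."""
    H = [[0.0] * n for _ in range(n)]
    g = [0.0] * n
    for i, j, d, w in rel_edges:
        # Normal-equation terms for the relative residual x_j - x_i - d.
        H[i][i] += w
        H[j][j] += w
        H[i][j] -= w
        H[j][i] -= w
        g[i] -= w * d
        g[j] += w * d
    for k, p, w in priors:
        # Normal-equation terms for the absolute residual x_k - p.
        H[k][k] += w
        g[k] += w * p
    return solve_linear(H, g)


# Usage: relative estimates drift (each unit step measured as 1.1),
# while strong priors anchor frames 0 and 4 at 0.0 and 4.0.
edges = [(i, i + 1, 1.1, 1.0) for i in range(4)]
priors = [(0, 0.0, 100.0), (4, 4.0, 100.0)]
poses = fuse_poses(5, edges, priors)
# Chaining the relative edges alone would put frame 4 at 4.4;
# the fused estimate places it close to 4.0.
```

Because the priors constrain only frames 0 and 4, the solver spreads the accumulated drift of 0.4 roughly evenly across the four relative edges. Without any prior the system is singular (the trajectory can be translated freely), which is why the sketch raises an error in that case.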

Original language: English
Title of host publication: Intelligent Robotics and Applications - 18th International Conference, ICIRA 2025, Proceedings
Editors: Takayuki Matsuno, Honghai Liu, Lianqing Liu, Zhouping Yin, Xiangyang Zhu, Weihong Ren, Zhiyong Wang, Yixuan Sheng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 508-520
Number of pages: 13
ISBN (Print): 9789819521005
State: Published - 2026
Event: 18th International Conference on Intelligent Robotics and Applications, ICIRA 2025 - Okayama, Japan
Duration: 6 Aug 2025 → 9 Aug 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16076 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Intelligent Robotics and Applications, ICIRA 2025
Country/Territory: Japan
City: Okayama
Period: 6/08/25 → 9/08/25

Keywords

  • 3D Reconstruction
  • Deep Learning Methods
  • Deep Learning for Visual Perception
  • Visual SLAM
