Abstract
Visual place recognition (VPR), a key capability for autonomous driving and robotics, is typically formulated as an image retrieval problem. A commonly used two-stage strategy performs global retrieval followed by re-ranking with patch-level descriptors. Most end-to-end deep learning methods cannot extract global features with sufficient semantic information from RGB images alone. Re-ranking, in contrast, can exploit more explicit structural and semantic information during one-to-one matching, but it is time-consuming. To bridge the gap between global retrieval and re-ranking and strike a good trade-off between accuracy and efficiency, we propose StructVPR++, a framework that embeds structural and semantic knowledge into RGB global representations via segmentation-guided distillation. Our key innovation lies in decoupling label-specific features from global descriptors, enabling explicit semantic alignment between image pairs without requiring segmentation during deployment. Furthermore, we introduce a sample-wise weighted distillation strategy that prioritizes reliable training pairs while suppressing noisy ones. Experiments on four benchmarks demonstrate that StructVPR++ surpasses state-of-the-art global methods by 5-23% in Recall@1 and even outperforms many two-stage approaches, all while achieving real-time efficiency with a single RGB input.
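The sample-wise weighted distillation mentioned above can be illustrated with a minimal sketch: a per-pair distillation loss between a segmentation-guided teacher descriptor and an RGB student descriptor, scaled by a per-sample reliability weight. This is an illustrative NumPy assumption, not the paper's actual loss formulation; the function name and weighting scheme are hypothetical.

```python
import numpy as np

def weighted_distillation_loss(student_feats, teacher_feats, sample_weights):
    """Illustrative sample-wise weighted distillation.

    student_feats:  (N, D) global descriptors from the RGB-only student
    teacher_feats:  (N, D) descriptors from the segmentation-guided teacher
    sample_weights: (N,)   per-pair reliability weights (higher = trusted more)
    """
    # L2-normalize both descriptor sets so the distance is scale-invariant.
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    # Squared L2 distance per training pair.
    per_sample = np.sum((s - t) ** 2, axis=1)
    # Normalize weights so the loss is a weighted average over the batch;
    # reliable pairs dominate, noisy pairs are suppressed.
    w = sample_weights / sample_weights.sum()
    return float(np.sum(w * per_sample))
```

With this toy loss, up-weighting a well-aligned pair (small student-teacher distance) lowers the batch loss relative to up-weighting a noisy pair, which is the intended effect of the weighting strategy.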
| Original language | English |
|---|---|
| Pages (from-to) | 6338-6351 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 47 |
| Issue number | 8 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Visual place recognition
- image retrieval
- knowledge distillation
- semantic alignment
- semantic segmentation
Fingerprint
Dive into the research topics of 'StructVPR++: Distill Structural and Semantic Knowledge With Weighting Samples for Visual Place Recognition'.