TY - GEN
T1 - Semantic-aware Representation Learning for Homography Estimation
AU - Liu, Yuhan
AU - Huang, Qianxin
AU - Hui, Siqi
AU - Fu, Jingwen
AU - Zhou, Sanping
AU - Wu, Kangyi
AU - Li, Pengna
AU - Wang, Jinjun
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/10/28
Y1 - 2024/10/28
N2 - Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information, however there still lacks an efficient way to utilize semantic information. Previous methods suffer from treating the semantics as a pre-processing, causing the utilization of semantics overly coarse-grained and lack adaptability when dealing with different tasks. In our work, we seek another way to use the semantic information, that is semantic-aware feature representation learning framework. Based on this, we propose SRMatcher, a new detector-free feature matching method, which encourages the network to learn integrated semantic feature representation. Specifically, to capture precise and rich semantics, we leverage the capabilities of recently popularized vision foundation models (VFMs) trained on extensive datasets. Then, a cross-images Semantic-aware Fusion Block (SFB) is proposed to integrate its fine-grained semantic features into the feature representation space. In this way, by reducing errors stemming from semantic inconsistencies in matching pairs, our proposed SRMatcher is able to deliver more accurate and realistic outcomes. Extensive experiments show that SRMatcher surpasses solid baselines and attains SOTA results on multiple real-world datasets. Compared to the previous SOTA approach GeoFormer, SRMatcher increases the area under the cumulative curve (AUC) by about 11% on HPatches. Additionally, the SRMatcher could serve as a plug-and-play framework for other matching methods like LoFTR, yielding substantial precision improvement.
AB - Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information, however there still lacks an efficient way to utilize semantic information. Previous methods suffer from treating the semantics as a pre-processing, causing the utilization of semantics overly coarse-grained and lack adaptability when dealing with different tasks. In our work, we seek another way to use the semantic information, that is semantic-aware feature representation learning framework. Based on this, we propose SRMatcher, a new detector-free feature matching method, which encourages the network to learn integrated semantic feature representation. Specifically, to capture precise and rich semantics, we leverage the capabilities of recently popularized vision foundation models (VFMs) trained on extensive datasets. Then, a cross-images Semantic-aware Fusion Block (SFB) is proposed to integrate its fine-grained semantic features into the feature representation space. In this way, by reducing errors stemming from semantic inconsistencies in matching pairs, our proposed SRMatcher is able to deliver more accurate and realistic outcomes. Extensive experiments show that SRMatcher surpasses solid baselines and attains SOTA results on multiple real-world datasets. Compared to the previous SOTA approach GeoFormer, SRMatcher increases the area under the cumulative curve (AUC) by about 11% on HPatches. Additionally, the SRMatcher could serve as a plug-and-play framework for other matching methods like LoFTR, yielding substantial precision improvement.
KW - feature matching
KW - homography estimation
KW - semantic-aware
UR - https://www.scopus.com/pages/publications/85209822058
U2 - 10.1145/3664647.3681021
DO - 10.1145/3664647.3681021
M3 - 会议稿件
AN - SCOPUS:85209822058
T3 - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
SP - 2506
EP - 2514
BT - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 32nd ACM International Conference on Multimedia, MM 2024
Y2 - 28 October 2024 through 1 November 2024
ER -