TY - JOUR
T1 - Knowledge Synergy Learning for Multi-Modal Tracking
AU - He, Yuhang
AU - Ma, Zhiheng
AU - Wei, Xing
AU - Gong, Yihong
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Benefiting from the rich information provided by different modalities, multi-modal tracking has shown significant improvements compared to single-modal tracking. However, in practical applications, multi-modal tracking still faces two major challenges. Firstly, it is crucial to effectively integrate the complementary information from different modalities in order to improve tracking performance. Secondly, as trackers are often deployed in dynamic environments, it is difficult to ensure complete multi-modal data. Thus, handling modal-missing issues is essential to achieve robust and reliable tracking. To address these challenges, this paper proposes a Knowledge Synergy Network (KSNet) that integrates multi-modal features into a comprehensive representation and incorporates a modal compensation mechanism to handle modal-missing issues. With this framework, a multi-modal tracker (KSTrack) is built and trained using multi-modal data. KSTrack is capable of handling both complete and incomplete multi-modal data during inference. Comprehensive experiments on four large-scale RGB-Thermal (RGB-T) and RGB-Depth (RGB-D) benchmarks show that KSTrack surpasses state-of-the-art multi-modal trackers when using multi-modal data and outperforms single-modal trackers by a large margin when using single-modal data.
KW - Multi-modal tracking
KW - knowledge synergy learning
KW - modality missing
KW - recurrent modal compensation
UR - https://www.scopus.com/pages/publications/85182928346
U2 - 10.1109/TCSVT.2024.3352573
DO - 10.1109/TCSVT.2024.3352573
M3 - Article
AN - SCOPUS:85182928346
SN - 1051-8215
VL - 34
SP - 5519
EP - 5532
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 7
ER -