TY - GEN
T1 - Grand
T2 - 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
AU - Lan, Lin
AU - Wang, Pinghui
AU - Shi, Rui
AU - Liu, Tingqing
AU - Zeng, Juxiang
AU - Sun, Feiyang
AU - Ren, Yang
AU - Tao, Jing
AU - Guan, Xiaohong
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/7/11
Y1 - 2024/7/11
AB - Graph retrieval aims to find the most similar graphs in a graph database given a query graph, which is a fundamental problem with many real-world applications in chemical engineering, code analysis, etc. To date, existing neural graph retrieval methods generally fall into two categories: Embedding Based Paradigm (Ebp) and Matching Based Paradigm (Mbp). The Ebp models learn an individual vectorial representation for each graph and the retrieval process can be accelerated by pre-computing these representations. The Mbp models learn a neural matching function to compare graphs on a pair-by-pair basis, in which the fine-grained pairwise comparison leads to higher retrieval accuracy but severely degrades retrieval efficiency. In this paper, to combine the advantage of Ebp in retrieval efficiency with that of Mbp in retrieval accuracy, we propose a novel Graph RetrievAl framework via KNowledge Distillation, namely GRAND. The key point is to leverage the idea of knowledge distillation to transfer the fine-grained graph comparison knowledge from an Mbp model to an Ebp model, such that the Ebp model can generate better graph representations and thus yield higher retrieval accuracy. At the same time, we can still pre-compute and index the improved graph representations to retain the retrieval speed of Ebp. Towards this end, we propose to perform knowledge distillation from three perspectives: score, node, and subgraph levels. In addition, we propose to perform mutual two-way knowledge transfer between Mbp and Ebp, such that Mbp and Ebp complement and benefit each other. Extensive experiments on three real-world datasets show that GRAND improves the performance of Ebp by a large margin and the improvement is consistent for different combinations of Ebp and Mbp models. For example, GRAND achieves performance gains of mostly more than 10% and up to 16.88% in terms of Recall@K on different datasets.
KW - GNN
KW - graph retrieval
KW - knowledge distillation
UR - https://www.scopus.com/pages/publications/85200587497
U2 - 10.1145/3626772.3657773
DO - 10.1145/3626772.3657773
M3 - Conference contribution
AN - SCOPUS:85200587497
T3 - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 1639
EP - 1648
BT - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 14 July 2024 through 18 July 2024
ER -