TY - GEN
T1 - A Revisit to Graph Neighborhood Cardinality Estimation
AU - Wang, Pinghui
AU - Zhang, Yuanming
AU - Cheng, Kuankuan
AU - Zhao, Junzhou
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Graph data are ubiquitous in real-world systems such as social networks and protein-protein interaction networks. In many applications, nodes usually are associated with real-value attributes, e.g., age, income, and wealth. Recently, industry and research communities have attracted attention to mining and learning attribute graphs. In this paper, we study the problem of calculating the general neighborhood cardinality of each node v in the graph, i.e., the sum of non-negative attribute values of the nodes in the k -hop neighborhood of a node v. The naive solution is to run a k -step breadth-first-search (BFS) algorithm starting from each node and storing all visited nodes' attributes. Clearly, the time complexity of this solution is Oleft(|V| dmaxk), where |V| is the number of nodes and dmax is the maximum node degree in the graph. In real-world networks such as Twitter, dmax is over 3×106. Therefore, it is infeasible to compute the neighborhood cardinality of nodes exactly in such massive networks even if we set k=2. To solve this problem, we propose efficient methods to compute the neighborhood cardinality of graphs with non-negative node attributes and binary node attributes, respectively. Extensive experiments on large real-world networks show the efficiency and effectiveness of our methods.
AB - Graph data are ubiquitous in real-world systems such as social networks and protein-protein interaction networks. In many applications, nodes usually are associated with real-value attributes, e.g., age, income, and wealth. Recently, industry and research communities have attracted attention to mining and learning attribute graphs. In this paper, we study the problem of calculating the general neighborhood cardinality of each node v in the graph, i.e., the sum of non-negative attribute values of the nodes in the k -hop neighborhood of a node v. The naive solution is to run a k -step breadth-first-search (BFS) algorithm starting from each node and storing all visited nodes' attributes. Clearly, the time complexity of this solution is Oleft(|V| dmaxk), where |V| is the number of nodes and dmax is the maximum node degree in the graph. In real-world networks such as Twitter, dmax is over 3×106. Therefore, it is infeasible to compute the neighborhood cardinality of nodes exactly in such massive networks even if we set k=2. To solve this problem, we propose efficient methods to compute the neighborhood cardinality of graphs with non-negative node attributes and binary node attributes, respectively. Extensive experiments on large real-world networks show the efficiency and effectiveness of our methods.
KW - Attributed graphs
KW - Cardinality estimation
KW - Sketch algorithms
UR - https://www.scopus.com/pages/publications/85200509039
U2 - 10.1109/ICDE60146.2024.00242
DO - 10.1109/ICDE60146.2024.00242
M3 - 会议稿件
AN - SCOPUS:85200509039
T3 - Proceedings - International Conference on Data Engineering
SP - 3125
EP - 3137
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
Y2 - 13 May 2024 through 17 May 2024
ER -