Skip to main navigation Skip to search Skip to main content

Practical characterization of large networks using neighborhood information

  • King Abdullah University of Science and Technology
  • Purdue University
  • Chinese University of Hong Kong
  • University of Massachusetts
  • Tsinghua University

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

Characterizing large complex networks such as online social networks through node querying is a challenging task. Network service providers often impose severe constraints on the query rate, hence limiting the sample size to a small fraction of the total network of interest. Various ad hoc subgraph sampling methods have been proposed, but many of them give biased estimates and no theoretical basis on the accuracy. In this work, we focus on developing sampling methods for large networks where querying a node also reveals partial structural information about its neighbors. Our methods are optimized for NoSQL graph databases (if the database can be accessed directly), or utilize Web APIs available on most major large networks for graph sampling. We show that our sampling method has provable convergence guarantees on being an unbiased estimator, and it is more accurate than state-of-the-art methods. We also explore methods to uncover shortest paths between a subset of nodes and detect high degree nodes by sampling only a small fraction of the network of interest. Our results demonstrate that utilizing neighborhood information yields methods that are two orders of magnitude faster than state-of-the-art methods.

Original languageEnglish
Pages (from-to)701-728
Number of pages28
JournalKnowledge and Information Systems
Volume58
Issue number3
DOIs
StatePublished - 5 Mar 2019

Keywords

  • Crawling
  • Graph sampling
  • Online social network
  • Random walk

Fingerprint

Dive into the research topics of 'Practical characterization of large networks using neighborhood information'. Together they form a unique fingerprint.

Cite this