On set size distribution estimation and the characterization of large networks via sampling

Research output: Contribution to journalArticlepeer-review

25 Scopus citations

Abstract

In this work we study the set size distribution estimation problem, where elements are randomly sampled from a collection of non-overlapping sets and we seek to recover the original set size distribution from the samples. This problem has applications to capacity planning and network theory. Examples of real-world applications include characterizing in-degree distributions in large graphs and uncovering TCP/IP flow size distributions on the Internet. We demonstrate that it is difficult to estimate the original set size distribution. The recoverability of original set size distributions presents a sharp threshold with respect to the fraction of elements that remain in the sets. If this fraction lies below the threshold, typically half of the elements in power-law and heavier-than-exponential-tailed distributions, then the original set size distribution is unrecoverable. We also discuss practical implications of our findings.

Original languageEnglish
Article number6517106
Pages (from-to)1017-1025
Number of pages9
JournalIEEE Journal on Selected Areas in Communications
Volume31
Issue number6
DOIs
StatePublished - 2013

Keywords

  • Cram'er-Rao lower bound
  • Fisher information
  • set size distribution estimation

Fingerprint

Dive into the research topics of 'On set size distribution estimation and the characterization of large networks via sampling'. Together they form a unique fingerprint.

Cite this