Abstract
In this work we study the set size distribution estimation problem, where elements are randomly sampled from a collection of non-overlapping sets and we seek to recover the original set size distribution from the samples. This problem has applications to capacity planning and network theory. Examples of real-world applications include characterizing in-degree distributions in large graphs and uncovering TCP/IP flow size distributions on the Internet. We demonstrate that it is difficult to estimate the original set size distribution. The recoverability of original set size distributions presents a sharp threshold with respect to the fraction of elements that remain in the sets. If this fraction lies below the threshold, typically half of the elements in power-law and heavier-than-exponential-tailed distributions, then the original set size distribution is unrecoverable. We also discuss practical implications of our findings.
| Original language | English |
|---|---|
| Article number | 6517106 |
| Pages (from-to) | 1017-1025 |
| Number of pages | 9 |
| Journal | IEEE Journal on Selected Areas in Communications |
| Volume | 31 |
| Issue number | 6 |
| DOIs | |
| State | Published - 2013 |
Keywords
- Cram'er-Rao lower bound
- Fisher information
- set size distribution estimation