摘要
Calculating the number of distinct values (i.e., NDV) in a column of a big table is costly yet fundamental to a variety of database applications such as data compression and profiling. To reduce the high time and space cost, a number of sketch methods (e.g., HyperLogLog) have been proposed, which estimate the NDV from a constructed compact data summary of distinct values. However, these methods fail or are costly to manage fully-dynamic scenarios where data is often inserted into and deleted from the table. To solve this issue, we propose a novel sketch method, Half-Xor. Our Half-Xor sketch consists of a compact bit matrix and a small counter array, and it needs to set a few bits and update a counter when handling a data insertion/deletion. Compared with the state-of-the-art mergeable method, our experimental results demonstrate that our method Half-Xor is up to 6.6 times more accurate under the same memory usage and reduces the memory usage by up to 16 times to achieve the same estimation accuracy.
| 源语言 | 英语 |
|---|---|
| 页(从-至) | 3111-3125 |
| 页数 | 15 |
| 期刊 | IEEE Transactions on Knowledge and Data Engineering |
| 卷 | 36 |
| 期 | 7 |
| DOI | |
| 出版状态 | 已出版 - 1 7月 2024 |
学术指纹
探究 'Half-Xor: A Fully-Dynamic Sketch for Estimating the Number of Distinct Values in Big Tables' 的科研主题。它们共同构成独一无二的指纹。引用此
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver