跳到主要导航 跳到搜索 跳到主要内容

Half-Xor: A Fully-Dynamic Sketch for Estimating the Number of Distinct Values in Big Tables

  • Xi'an Jiaotong University
  • Huawei Technologies Co., Ltd.

科研成果: 期刊稿件文章同行评审

5 引用 (Scopus)

摘要

Calculating the number of distinct values (i.e., NDV) in a column of a big table is costly yet fundamental to a variety of database applications such as data compression and profiling. To reduce the high time and space cost, a number of sketch methods (e.g., HyperLogLog) have been proposed, which estimate the NDV from a constructed compact data summary of distinct values. However, these methods fail or are costly to manage fully-dynamic scenarios where data is often inserted into and deleted from the table. To solve this issue, we propose a novel sketch method, Half-Xor. Our Half-Xor sketch consists of a compact bit matrix and a small counter array, and it needs to set a few bits and update a counter when handling a data insertion/deletion. Compared with the state-of-the-art mergeable method, our experimental results demonstrate that our method Half-Xor is up to 6.6 times more accurate under the same memory usage and reduces the memory usage by up to 16 times to achieve the same estimation accuracy.

源语言英语
页(从-至)3111-3125
页数15
期刊IEEE Transactions on Knowledge and Data Engineering
36
7
DOI
出版状态已出版 - 1 7月 2024

学术指纹

探究 'Half-Xor: A Fully-Dynamic Sketch for Estimating the Number of Distinct Values in Big Tables' 的科研主题。它们共同构成独一无二的指纹。

引用此