
Continuously tracking core items in data streams with probabilistic decays

  • Xi'an Jiaotong University
  • Chinese University of Hong Kong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

The sheer scale of big data causes information overload, and there is an urgent need for tools that can draw valuable insights from massive data. This paper investigates the core items tracking (CIT) problem, where the goal is to continuously track representative items, called core items, in a data stream so as to best represent and summarize the stream. To simultaneously satisfy the recency and continuity requirements, we consider CIT over probabilistic-decaying streams, where items in the stream are forgotten gradually in a probabilistic manner. We first introduce an algorithm, called PNDCIT, to find core items in a special kind of probabilistic non-decaying stream. Then, using PNDCIT as a building block, we design two novel algorithms, PDCIT and PDCIT+, that maintain core items over probabilistic-decaying streams with constant approximation ratios. Finally, extensive experiments on real data demonstrate that PDCIT+ achieves a speedup of up to one order of magnitude over a batch algorithm while providing solutions of comparable quality.
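
As a rough illustration of what "forgotten gradually in a probabilistic manner" could look like, the following Python sketch simulates a stream in which every arriving item receives a random lifetime. The abstract does not specify the decay model, so the geometric lifetime used here is an assumption, and the names `sample_lifetime`, `decaying_stream`, and `forget_prob` are hypothetical. The sketch only models the stream; it does not implement PNDCIT, PDCIT, or PDCIT+.

```python
import random


def sample_lifetime(rng, forget_prob):
    """Sample how many time steps an item stays alive (geometric, at least 1)."""
    lifetime = 1
    while rng.random() >= forget_prob:
        lifetime += 1
    return lifetime


def decaying_stream(items, forget_prob, seed=0):
    """Yield (time, alive_items) after each arrival.

    Assumed model (not taken from the paper): an item arriving at time t is
    given a lifetime L and counts as alive at times t, t+1, ..., t+L-1.
    Items are expected to be distinct and hashable.
    """
    if not 0.0 < forget_prob <= 1.0:
        raise ValueError("forget_prob must be in (0, 1]")
    rng = random.Random(seed)
    expiry = {}  # item -> first time step at which it is forgotten
    for t, item in enumerate(items):
        expiry[item] = t + sample_lifetime(rng, forget_prob)
        # Drop every item whose lifetime has run out by time t.
        expiry = {x: e for x, e in expiry.items() if e > t}
        yield t, sorted(expiry)


if __name__ == "__main__":
    for t, alive in decaying_stream(list(range(10)), forget_prob=0.3):
        print(t, alive)
```

Under this assumed model, a larger `forget_prob` shortens expected lifetimes and so favors recency, while a smaller value lets older items linger. Because items leave one at a time rather than in a block, the set of live items changes gradually between time steps.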

Original language: English
Title of host publication: Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
Publisher: IEEE Computer Society
Pages: 769-780
Number of pages: 12
ISBN (Electronic): 9781728129037
State: Published - Apr 2020
Event: 36th IEEE International Conference on Data Engineering, ICDE 2020 - Dallas, United States
Duration: 20 Apr 2020 to 24 Apr 2020

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2020-April
ISSN (Print): 1084-4627

Conference

Conference: 36th IEEE International Conference on Data Engineering, ICDE 2020
Country/Territory: United States
City: Dallas
Period: 20/04/20 to 24/04/20
