TY - JOUR
T1 - Well-balanced successive simple-9 for inverted lists compression
AU - Jiang, Kun
AU - Yang, Yuexiang
AU - Zheng, Qinghua
PY - 2017/7
Y1 - 2017/7
N2 - The growth in the amount of information available on the Internet and thousands of user queries per second brings huge challenges to the index update and query processing of search engines. Index compression is partially responsible for the current performance achievements of existing search engines. The selection of the index compression algorithms must weigh three factors, i.e., compression ratio, compression speed and decompression speed. In this paper, we study the well-known Simple-9 compression, in which exist many branch operations, table lookup and data transfer operations when processing each 32-bit machine word. To enhance the compression and decompression performance of Simple-9 algorithm, we propose a successive storage structure and processing metric to compress two successive Simple-9 encoded sequence of integers in a single data processing procedure, thus the name Successive Simple-9 (SSimple-9). In essence, the algorithm shortens the process of branch operations, table lookup and data transfer operations when compressing the integer sequence. More precisely, we initially present the data storage format and mask table of SSimple-9 algorithm. Then, for each mode in the mask table, we design and hard-code the main steps of the compression and decompression processes. Finally, analysis and comparison on the experimental results of the simulation and TREC datasets show the compression and decompression efficiency speedup of the proposed SSimple-9 algorithm.
AB - The growth in the amount of information available on the Internet and thousands of user queries per second brings huge challenges to the index update and query processing of search engines. Index compression is partially responsible for the current performance achievements of existing search engines. The selection of the index compression algorithms must weigh three factors, i.e., compression ratio, compression speed and decompression speed. In this paper, we study the well-known Simple-9 compression, in which exist many branch operations, table lookup and data transfer operations when processing each 32-bit machine word. To enhance the compression and decompression performance of Simple-9 algorithm, we propose a successive storage structure and processing metric to compress two successive Simple-9 encoded sequence of integers in a single data processing procedure, thus the name Successive Simple-9 (SSimple-9). In essence, the algorithm shortens the process of branch operations, table lookup and data transfer operations when compressing the integer sequence. More precisely, we initially present the data storage format and mask table of SSimple-9 algorithm. Then, for each mode in the mask table, we design and hard-code the main steps of the compression and decompression processes. Finally, analysis and comparison on the experimental results of the simulation and TREC datasets show the compression and decompression efficiency speedup of the proposed SSimple-9 algorithm.
KW - Decompression performance
KW - Inverted index
KW - Successive Simple-9
KW - Successive storage
UR - https://www.scopus.com/pages/publications/85021771167
U2 - 10.1587/transinf.2016EDP7466
DO - 10.1587/transinf.2016EDP7466
M3 - 文章
AN - SCOPUS:85021771167
SN - 0916-8532
VL - E100D
SP - 1416
EP - 1424
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 7
ER -