Chmielowiec Andrzej, Litwin Paweł
The Faculty of Mechanics and Technology, Rzeszow University of Technology, Kwiatkowskiego 4, 37-450 Stalowa Wola, Poland.
The Faculty of Mechanical Engineering and Aeronautics, Rzeszow University of Technology, Powstańców Warszawy 8, 35-959 Rzeszow, Poland.
Entropy (Basel). 2021 Feb 28;23(3):296. doi: 10.3390/e23030296.
This article deals with compression of binary sequences with a given number of ones, which can also be considered as a list of indexes of a given length. The first part of the article shows that the entropy of random -element binary sequences with exactly elements equal one satisfies the inequalities klog2(0.48·n/k)<H<klog2(2.72·n/k). Based on this result, we propose a simple coding using fixed length words. Its main application is the compression of random binary sequences with a large disproportion between the number of zeros and the number of ones. Importantly, the proposed solution allows for a much faster decompression compared with the Golomb-Rice coding with a relatively small decrease in the efficiency of compression. The proposed algorithm can be particularly useful for database applications for which the speed of decompression is much more important than the degree of index list compression.
本文讨论具有给定数量“1”的二进制序列的压缩问题,这也可被视为给定长度的索引列表。文章的第一部分表明,恰好有(k)个元素等于“1”的随机元素二进制序列的熵满足不等式(k\log_2(0.48\cdot n/k) \lt H \lt k\log_2(2.72\cdot n/k))。基于这一结果,我们提出一种使用固定长度码字的简单编码方法。其主要应用是对“0”的数量与“1”的数量存在很大差异的随机二进制序列进行压缩。重要的是,与哥伦布-莱斯编码相比,所提出的解决方案允许更快地解压,而压缩效率的下降相对较小。所提出的算法对于解压速度比索引列表压缩程度重要得多的数据库应用可能特别有用。