Dai Qi, Yan Zhaofang, Shi Zhuoxing, Liu Xiaoqing, Yao Yuhua, He Pingan
College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China.
J Theor Biol. 2013 Nov 7;336:52-60. doi: 10.1016/j.jtbi.2013.07.008. Epub 2013 Jul 19.
Lempel-Ziv complexity has been widely used for sequence comparison with promising results, but the distribution of components in the exhaustive history has not yet been studied. This paper investigated the whole distribution of LZ-words and presented a novel statistical method for sequence comparison. Taking the length of the components into account, we revised Lempel-Ziv complexity and obtained various sets of LZ-words. Instead of calculating the contents of the LZ-words, we defined a series of set operations on the LZ-word sets to compare biological sequences. To assess the effectiveness of the proposed method, we performed two sets of experiments and compared it with alignment-based methods.
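To make the idea of an LZ-word set concrete, the following is a minimal sketch of the standard Lempel-Ziv (1976) exhaustive history, followed by one possible set-based comparison. It is not the revised complexity or the set operations defined in the paper, which the abstract does not specify. The function names `lz_words`, `words_by_length`, `jaccard_distance`, and `lz_set_distance` are hypothetical, and the Jaccard distance is an assumed stand-in chosen only for illustration.

```python
from collections import defaultdict


def lz_words(seq):
    """Split seq into the components of its Lempel-Ziv (1976) exhaustive history.

    Each component is the shortest prefix of the remaining text that does not
    already occur in the text preceding its last symbol. The final component
    may be reproducible if the sequence ends first.
    """
    words = []
    i, n = 0, len(seq)
    while i < n:
        j = i + 1
        # Extend the candidate while seq[i:j] can be copied from seq[:j-1].
        while j <= n and seq[i:j] in seq[:j - 1]:
            j += 1
        words.append(seq[i:min(j, n)])
        i = j
    return words


def words_by_length(seq):
    """Group the LZ-words of seq into sets keyed by word length (illustrative)."""
    groups = defaultdict(set)
    for word in lz_words(seq):
        groups[len(word)].add(word)
    return dict(groups)


def jaccard_distance(a, b):
    """Jaccard distance between two sets; an assumed example of a set operation."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def lz_set_distance(s, t):
    """Compare two sequences through their LZ-word sets (sketch, not the paper's measure)."""
    return jaccard_distance(set(lz_words(s)), set(lz_words(t)))


if __name__ == "__main__":
    print(lz_words("0001101001000101"))
    print(words_by_length("0001101001000101"))
    print(lz_set_distance("ACGTACGTAC", "ACGTTTGCAC"))
```

Grouping the words by length, as in `words_by_length`, is one way to take component length into account. Any other set operation (intersection size, symmetric difference) could be substituted for the Jaccard distance in the same way.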