College of Sciences, Northeastern University, Shenyang 110004, China.
J Theor Biol. 2013 Oct 21;335:276-82. doi: 10.1016/j.jtbi.2013.06.037. Epub 2013 Jul 10.
Alignment-free sequence comparison is widely used in sequence analysis, especially for large-scale similarity comparison in computational biology. In this paper, we propose a word voting model for comparing biological sequences without alignment. Unlike many comparison methods based on k-words, this model does not use k-word frequencies or statistics, so the choice of k is not restricted. Instead, the model uses the information entropy of a gamma distribution to characterize the differences among biological sequences. Finally, we applied the model to similarity search and phylogenetic tree construction to further validate it.
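The abstract does not say how the word voting model fits or parameterizes the gamma distribution. The sketch below therefore covers only the standard quantity it names: the differential entropy of a gamma distribution with shape α and scale θ, H = α + ln θ + ln Γ(α) + (1 − α)ψ(α), where ψ is the digamma function. The function names `digamma` and `gamma_entropy` are hypothetical. This is an illustration under those assumptions, not the authors' implementation.

```python
import math


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def gamma_entropy(shape: float, scale: float) -> float:
    """Differential entropy (in nats) of a Gamma(shape, scale) distribution.

    Hypothetical helper: how the word voting model chooses shape and scale
    is not described in the abstract.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    return shape + math.log(scale) + math.lgamma(shape) + (1.0 - shape) * digamma(shape)


if __name__ == "__main__":
    # Shape 1 reduces to an exponential distribution, whose entropy is 1 + ln(scale).
    print(gamma_entropy(1.0, 2.0))
    print(gamma_entropy(2.0, 1.0))
```

With shape 1 the formula reduces to the exponential case, 1 + ln θ, which is a convenient sanity check. A larger entropy corresponds to a more spread-out distribution.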