School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, PR China.
J Biomol Struct Dyn. 2011 Feb;28(4):545-55. doi: 10.1080/07391102.2011.10508594.
In this paper, we introduce a probabilistic measure for computing the similarity between two biological sequences without alignment. The similarity measure is computed from the Kullback-Leibler divergence between two constructed Markov models. We first validate the method by clustering nine chromosomes from three species. Second, we report the results of a similarity search based on the new method. Finally, we apply the measure to the construction of a phylogenetic tree of 48 HEV genome sequences. Our results indicate that the weighted relative entropy is an efficient and powerful alignment-free measure for the analysis of sequences at the genomic scale.
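To make the idea concrete, the following is a minimal sketch of an alignment-free dissimilarity built from the Kullback-Leibler divergence of two Markov models. It is an illustration under stated assumptions, not the authors' exact definition: the abstract does not give the weighting scheme, so the sketch assumes each context's divergence is weighted by that context's relative frequency in the first sequence, uses add-one pseudocounts, and symmetrizes by averaging the two directions. The function names (`markov_model`, `weighted_relative_entropy`, `dissimilarity`) and the default order `k=2` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA sequences; other symbols are skipped


def markov_model(seq, k=2, pseudocount=1.0):
    """Estimate an order-k Markov model from seq.

    Returns (context_freq, transition): context_freq[w] is the relative
    frequency of the k-mer w as a context, transition[w][a] is P(a | w).
    """
    ctx_counts = Counter()
    pair_counts = Counter()
    for i in range(len(seq) - k):
        w, a = seq[i:i + k], seq[i + k]
        if set(w) <= set(ALPHABET) and a in ALPHABET:
            ctx_counts[w] += 1
            pair_counts[w, a] += 1
    total = sum(ctx_counts.values())
    context_freq, transition = {}, {}
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        context_freq[w] = ctx_counts[w] / total if total else 0.0
        # Pseudocounts keep every probability positive, so the log is defined.
        denom = ctx_counts[w] + pseudocount * len(ALPHABET)
        transition[w] = {
            a: (pair_counts[w, a] + pseudocount) / denom for a in ALPHABET
        }
    return context_freq, transition


def weighted_relative_entropy(p, q):
    """Sum over contexts of freq_p(w) * KL(P(. | w) || Q(. | w)).

    Assumed weighting: context frequencies taken from the first model.
    """
    p_freq, p_trans = p
    _, q_trans = q
    total = 0.0
    for w, weight in p_freq.items():
        for a in ALPHABET:
            pa, qa = p_trans[w][a], q_trans[w][a]
            total += weight * pa * math.log(pa / qa)
    return total


def dissimilarity(seq_a, seq_b, k=2):
    """Symmetrized weighted relative entropy (average of both directions)."""
    p, q = markov_model(seq_a, k), markov_model(seq_b, k)
    return 0.5 * (weighted_relative_entropy(p, q)
                  + weighted_relative_entropy(q, p))
```

Calling `dissimilarity(seq_a, seq_b)` on every pair of sequences yields a pairwise matrix that could be passed to a standard clustering or tree-building routine. Because only k-mer counts are needed, the cost grows linearly with sequence length, which is what makes measures of this kind practical at the genomic scale.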