Beerenwinkel Niko, Rahnenführer Jörg, Däumer Martin, Hoffmann Daniel, Kaiser Rolf, Selbig Joachim, Lengauer Thomas
Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany.
J Comput Biol. 2005 Jul-Aug;12(6):584-98. doi: 10.1089/cmb.2005.12.584.
We introduce a mixture model of trees to describe evolutionary processes that are characterized by the ordered accumulation of permanent genetic changes. The basic building block of the model is a directed weighted tree that generates a probability distribution on the set of all patterns of genetic events. We present an EM-like algorithm for learning a mixture model of K trees and show how to determine K with a maximum likelihood approach. As a case study, we consider the accumulation of mutations in the HIV-1 reverse transcriptase that are associated with drug resistance. The fitted model is statistically validated as a density estimator, and the stability of the model topology is analyzed. We obtain a generative probabilistic model for the development of drug resistance in HIV that agrees with biological knowledge. Further applications and extensions of the model are discussed.
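The abstract does not spell out how a directed weighted tree induces a distribution over patterns of genetic events, so the following is only a minimal sketch under stated assumptions. It assumes each edge weight is the conditional probability that the child event occurs given that its parent event has occurred, that the root stands for the initial state and is always present, and that a pattern containing an event without its parent has probability zero. The names `ROOT`, `tree_likelihood`, `mixture_likelihood`, and `responsibilities` are hypothetical and chosen for illustration. The last function shows the kind of per-pattern component posterior an EM-like procedure would compute; re-estimating the tree topologies and choosing K are not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

ROOT = 0  # hypothetical label for the initial state; assumed always present

# A tree maps each child event to (parent event, edge weight).
# Assumption: edge weight = P(child occurs | parent has occurred).
Tree = Dict[int, Tuple[int, float]]


def tree_likelihood(tree: Tree, pattern: FrozenSet[int]) -> float:
    """Probability of an event pattern under one weighted tree (assumed semantics)."""
    present = set(pattern) | {ROOT}
    prob = 1.0
    for child, (parent, weight) in tree.items():
        if parent in present:
            # Parent occurred: child occurs with probability `weight`.
            prob *= weight if child in present else 1.0 - weight
        elif child in present:
            # Child without its parent violates the ordering encoded by the tree.
            return 0.0
        # Neither parent nor child occurred: factor of 1.
    return prob


def mixture_likelihood(weights: List[float], trees: List[Tree],
                       pattern: FrozenSet[int]) -> float:
    """Weighted sum of the K tree likelihoods for one pattern."""
    return sum(a * tree_likelihood(t, pattern) for a, t in zip(weights, trees))


def responsibilities(weights: List[float], trees: List[Tree],
                     pattern: FrozenSet[int]) -> List[float]:
    """Posterior probability of each mixture component given one pattern."""
    joint = [a * tree_likelihood(t, pattern) for a, t in zip(weights, trees)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("pattern has zero probability under every component")
    return [j / total for j in joint]


# Illustrative (made-up) trees over two events, not fitted to any data.
chain = {1: (ROOT, 0.5), 2: (1, 0.4)}    # event 2 requires event 1
star = {1: (ROOT, 0.2), 2: (ROOT, 0.3)}  # events occur independently
mix_weights = [0.5, 0.5]

print(mixture_likelihood(mix_weights, [chain, star], frozenset({1})))
print(responsibilities(mix_weights, [chain, star], frozenset({2})))
```

Under these assumptions, a star-shaped component assigns nonzero probability to every pattern, so the pattern {2}, which the chain forbids, is attributed entirely to the star. This is one way a mixture can accommodate patterns that do not fit a single ordered pathway.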