Siepel Adam, Haussler David
Center for Biomolecular Science and Engineering, University of California, 1156 High Street, Santa Cruz, CA 95064, USA.
J Comput Biol. 2004;11(2-3):413-28. doi: 10.1089/1066527041410472.
A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow for changes from site to site. Besides improving the realism of ordinary phylogenetic models, they are potentially very powerful tools for inference and prediction, for example, for gene finding or prediction of secondary structure. In this paper, we review progress on combined phylogenetic and hidden Markov models and present some extensions to previous work. Our main result is a simple and efficient method for accommodating higher-order states in the HMM, which allows for context-dependent models of substitution, that is, models that consider the effects of neighboring bases on the pattern of substitution. We present experimental results indicating that higher-order states, autocorrelated rates, and multiple functional categories all lead to significant improvements in the fit of a combined phylogenetic and hidden Markov model, with the effect of higher-order states being particularly pronounced.
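To make the combination described in the abstract concrete, the following is a minimal sketch of a first-order phylogenetic hidden Markov model. It is not the authors' implementation and it does not cover their higher-order (context-dependent) states. It assumes a Jukes-Cantor substitution model, a fixed tree, and hidden states that differ only by a rate multiplier, which is one simple way to represent autocorrelated rates. The names `jc_prob`, `partial_likelihoods`, `column_likelihood`, and `forward_likelihood`, the tree, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t, rate):
    """Jukes-Cantor probability of base a becoming base b along a branch
    of length t, scaled by a rate multiplier (an illustrative assumption)."""
    e = math.exp(-4.0 / 3.0 * rate * t)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial_likelihoods(node, column, rate):
    """Felsenstein pruning: for each base x, the probability of the observed
    leaves below `node` given that `node` carries x.
    A leaf is a species name (str); an internal node is a list of
    (child, branch_length) pairs."""
    if isinstance(node, str):
        return [1.0 if column[node] == x else 0.0 for x in BASES]
    result = [1.0] * 4
    for child, t in node:
        below = partial_likelihoods(child, column, rate)
        for i, x in enumerate(BASES):
            result[i] *= sum(jc_prob(x, y, t, rate) * below[j]
                             for j, y in enumerate(BASES))
    return result

def column_likelihood(tree, column, rate):
    """Probability of one alignment column under one hidden state,
    using the uniform Jukes-Cantor equilibrium distribution at the root."""
    return sum(0.25 * p for p in partial_likelihoods(tree, column, rate))

def forward_likelihood(tree, columns, rates, init, trans):
    """Forward algorithm over alignment columns. Each hidden state k is a
    phylogenetic model that emits a whole column with rate rates[k]."""
    k_states = range(len(rates))
    f = [init[k] * column_likelihood(tree, columns[0], rates[k])
         for k in k_states]
    for col in columns[1:]:
        emit = [column_likelihood(tree, col, r) for r in rates]
        f = [emit[k] * sum(f[j] * trans[j][k] for j in k_states)
             for k in k_states]
    return sum(f)

# Hypothetical three-species tree and a short alignment.
tree = [("human", 0.1), ([("mouse", 0.3), ("rat", 0.3)], 0.2)]
alignment = [
    {"human": "A", "mouse": "A", "rat": "A"},
    {"human": "C", "mouse": "C", "rat": "T"},
    {"human": "G", "mouse": "A", "rat": "A"},
]
rates = [0.3, 2.0]                    # "slow" and "fast" hidden states
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]      # sticky transitions: neighboring sites share rates
print(forward_likelihood(tree, alignment, rates, init, trans))
```

The design point this sketch tries to show is the division of labor: the phylogenetic model supplies the emission probability of each column, while the HMM supplies the dependence between neighboring sites. A context-dependent extension of the kind the abstract describes would need emissions that condition on adjacent columns, which this sketch deliberately leaves out.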