Husmeier Dirk
Biomathematics and Statistics Scotland, Edinburgh, UK.
Bioinformatics. 2005 Sep 1;21 Suppl 2:ii166-72. doi: 10.1093/bioinformatics/bti1127.
A recently proposed method for detecting recombination in DNA sequence alignments is based on the combination of hidden Markov models (HMMs) with phylogenetic trees. Although this method was found to detect breakpoints of recombinant regions more accurately than most existing techniques, it inherently fails to distinguish between recombination and rate variation. In the present paper, we propose to marry the phylogenetic tree to a factorial HMM (FHMM). The states of the first hidden chain represent tree topologies, whereas the states of the second independent hidden chain represent different global scaling factors of the branch lengths. Inference is done in terms of a hierarchical Bayesian model, where parameters and hidden states are sampled from the posterior distribution with Gibbs sampling.
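The factorial structure described above can be illustrated with a minimal sketch. Because the two hidden chains are independent, the transition matrix over joint states (topology, scaling factor) is the Kronecker product of the two per-chain transition matrices. The sketch below is not the authors' MATLAB implementation: the function names, the number of states (3 topologies, 2 scaling factors) and the stay probabilities (0.9 and 0.8) are all hypothetical values chosen for illustration.

```python
def self_transition_matrix(n_states, stay_prob):
    """Row-stochastic matrix: stay with stay_prob, otherwise move
    uniformly to one of the other states (an illustrative assumption)."""
    if n_states == 1:
        return [[1.0]]
    move = (1.0 - stay_prob) / (n_states - 1)
    return [[stay_prob if i == j else move for j in range(n_states)]
            for i in range(n_states)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


# Hypothetical sizes: 3 tree topologies, 2 global branch-length scaling factors.
n_topologies, n_rates = 3, 2
topology_trans = self_transition_matrix(n_topologies, stay_prob=0.9)
rate_trans = self_transition_matrix(n_rates, stay_prob=0.8)

# Joint state (t, r) has index t * n_rates + r. Independence of the chains
# gives P((t, r) -> (t2, r2)) = topology_trans[t][t2] * rate_trans[r][r2].
joint = kron(topology_trans, rate_trans)


def joint_index(t, r):
    """Map a (topology, rate) pair to its row/column in the joint matrix."""
    return t * n_rates + r
```

In this representation, a change of topology (a putative recombination breakpoint) and a change of scaling factor (rate variation) are separate events in separate chains, which is what lets the model tell the two apart. A standard HMM over topologies alone has no state in which to absorb a change of rate.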
We have tested the proposed model on various synthetic and real-world DNA sequence alignments. The simulation results suggest that, as opposed to the standard phylogenetic HMM, the phylogenetic FHMM clearly distinguishes between recombination and rate heterogeneity and thereby avoids the prediction of spurious recombinant regions.
The proposed method has been implemented in a MATLAB package that extends Kevin Murphy's HMM toolbox. Software and data used in our study are available from http://www.bioss.sari.ac.uk/~dirk/Supplements