Jayaswal Vivek, Robinson John, Jermiin Lars
Sydney Bioinformatics, University of Sydney, NSW 2006, Australia.
Syst Biol. 2007 Apr;56(2):155-62. doi: 10.1080/10635150701247921.
The models of nucleotide substitution used by most maximum likelihood-based methods assume that the evolutionary process is stationary, reversible, and homogeneous. We present an extension of the Barry and Hartigan model, which can be used to estimate parameters by maximum likelihood (ML) when the data contain invariant sites and there are violations of the assumptions of stationarity, reversibility, and homogeneity. Unlike most ML methods for estimating invariant sites, we estimate the nucleotide composition of invariant sites separately from that of variable sites. We analyze a bacterial data set where problems due to lack of stationarity and homogeneity have been previously well noted and use the parametric bootstrap to show that the data are consistent with our general Markov model. We also show that estimates of invariant sites obtained using our method are fairly accurate when applied to data simulated under the general Markov model.
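The mixture of invariant and variable sites described above can be illustrated with a minimal sketch. It assumes the simplest possible case, two aligned sequences, rather than the tree-based model used in the paper. The names `pair_pattern_probs`, `log_likelihood`, `p_inv`, `pi_inv`, `pi_root`, and `P`, and all numerical values, are hypothetical and chosen only for illustration. Invariant sites get their own nucleotide composition (`pi_inv`), while variable sites follow a root composition (`pi_root`) and an unconstrained transition matrix (`P`) that need not be reversible or stationary.

```python
import math

NUCLEOTIDES = "ACGT"

def pair_pattern_probs(p_inv, pi_inv, pi_root, P):
    """Joint probabilities of the 16 site patterns for two sequences.

    p_inv   : proportion of invariant sites (assumed parameter name)
    pi_inv  : nucleotide composition of invariant sites
    pi_root : nucleotide composition of variable sites at the first sequence
    P       : 4x4 transition matrix for variable sites, rows sum to 1
    """
    probs = {}
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            variable = pi_root[i] * P[i][j]
            # Invariant sites can only produce patterns with identical states.
            invariant = pi_inv[i] if i == j else 0.0
            probs[a + b] = p_inv * invariant + (1.0 - p_inv) * variable
    return probs

def log_likelihood(counts, probs):
    """Multinomial log-likelihood (up to a constant) of observed pattern counts."""
    return sum(n * math.log(probs[pat]) for pat, n in counts.items() if n > 0)

# Illustrative values only; none of these come from the paper.
p_inv = 0.2
pi_inv = [0.1, 0.4, 0.4, 0.1]
pi_root = [0.25, 0.25, 0.25, 0.25]
P = [
    [0.85, 0.05, 0.07, 0.03],
    [0.02, 0.90, 0.03, 0.05],
    [0.10, 0.02, 0.80, 0.08],
    [0.04, 0.06, 0.05, 0.85],
]

probs = pair_pattern_probs(p_inv, pi_inv, pi_root, P)
counts = {"AA": 40, "CC": 55, "GG": 50, "TT": 35, "AC": 3, "GT": 5}
loglik = log_likelihood(counts, probs)
```

In a maximum likelihood analysis, a quantity like `loglik` would be maximized over the parameters. The sketch only evaluates it at fixed values and does not attempt the estimation procedure or the parametric bootstrap mentioned in the abstract.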