Waddell P J, Steel M A
School of Biological Sciences, Massey University, Palmerston North, New Zealand.
Mol Phylogenet Evol. 1997 Dec;8(3):398-414. doi: 10.1006/mpev.1997.0452.
A series of new results useful to the study of DNA sequences using Markov models of substitution is presented with proofs. General time-reversible distances can be extended to accommodate any fixed distribution of rates across sites by replacing the logarithmic function of a matrix with the inverse of a moment generating function. Estimators are presented assuming a gamma distribution, an inverse Gaussian distribution, or a mixture of either of these with invariant sites. Also considered are the different ways invariant sites may be removed and how these differences may affect estimated distances. Through collaboration, we implemented these distances in PAUP in 1994. The variance of these new distances is approximated via the delta method. It is also shown how to predict the divergence expected for a pair of sequences given a rate matrix and a distribution of rates across sites, allowing iterated ML estimates of distances under any reversible model. A simple test of whether a rate matrix is time reversible is also presented. These new methods are used to estimate the divergence time of humans and chimps from mtDNA sequence data. These analyses support suggestions that the human lineage has an enhanced transition rate relative to other hominoids. These studies also show that transversion distances differ substantially from the overall distances, which are dominated by transitions. Transversions alone apparently suggest a very recent divergence time for humans versus chimps and/or a very old (>16 myr) divergence time for humans versus orangutans. This work also illustrates graphically how to interpret the reliability of distance-based transformations, using the corrected transition-to-transversion ratio returned for pairs of sequences that are successively more diverged.
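The central substitution described in the abstract (the matrix logarithm replaced by the inverse of a moment generating function) can be illustrated with a minimal sketch. It assumes the distance is written as d = -tr(Π M⁻¹(Π⁻¹F)), where F is the symmetrized divergence matrix of joint nucleotide frequencies, Π is the diagonal matrix of its row sums, and M is the moment generating function of the rate distribution; for a gamma distribution with mean 1 and shape α, M⁻¹(y) = α(1 − y^(−1/α)), which tends to ln y as α grows. The function names (`gtr_distance`, `gamma_inv_mgf`, `jc_gamma_divergence`) are hypothetical and are not taken from the paper or from PAUP.

```python
import numpy as np

def gamma_inv_mgf(alpha):
    """Inverse MGF of a gamma rate distribution with mean 1 and shape alpha."""
    return lambda y: alpha * (1.0 - y ** (-1.0 / alpha))

def gtr_distance(F, inv_mgf=np.log):
    """Sketch of d = -tr(Pi * f(Pi^-1 F)), with f = inv_mgf applied as a matrix function.

    F       : 4x4 matrix of joint base counts or frequencies for two sequences
    inv_mgf : scalar function applied to the eigenvalues; np.log corresponds
              to equal rates across sites (the usual GTR distance)
    """
    F = np.asarray(F, dtype=float)
    F = (F + F.T) / 2.0          # symmetrize, as reversibility implies
    F = F / F.sum()              # convert counts to frequencies
    pi = F.sum(axis=1)           # estimated base frequencies
    s = 1.0 / np.sqrt(pi)
    # Pi^-1 F is similar to the symmetric matrix S = Pi^-1/2 F Pi^-1/2,
    # so a symmetric eigensolver can be used.
    S = F * np.outer(s, s)
    lam, V = np.linalg.eigh(S)
    if np.any(lam <= 0):
        raise ValueError("distance undefined: non-positive eigenvalue")
    fS = (V * inv_mgf(lam)) @ V.T
    # tr(Pi f(Pi^-1 F)) reduces to sum_i pi_i * f(S)_ii
    return -float(np.sum(pi * np.diag(fS)))

def jc_gamma_divergence(t, alpha):
    """Expected F for a Jukes-Cantor model with gamma rates (illustration only)."""
    e = (1.0 + 4.0 * t / (3.0 * alpha)) ** (-alpha)
    P = np.full((4, 4), 0.25 * (1.0 - e)) + np.eye(4) * e
    return P / 4.0

if __name__ == "__main__":
    F = jc_gamma_divergence(t=0.3, alpha=0.5)
    print(gtr_distance(F, gamma_inv_mgf(0.5)))  # recovers the simulated distance
    print(gtr_distance(F))                      # equal-rates formula underestimates
```

In this sketch the same divergence matrix yields a smaller distance when rate variation is ignored, which is the kind of bias the extended estimators are meant to correct. Invariant-site mixtures and the inverse Gaussian case would need their own inverse functions and are not covered here.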