Yang Z
Department of Zoology, Natural History Museum, London, United Kingdom.
Genetics. 1995 Feb;139(2):993-1005. doi: 10.1093/genetics/139.2.993.
We describe a model for the evolution of DNA sequences by nucleotide substitution, whereby nucleotide sites in the sequence evolve over time, whereas the rates of substitution are variable and correlated over sites. The temporal process used to describe substitutions between nucleotides is a continuous-time Markov process, with the four nucleotides as the states. The spatial process used to describe variation and dependence of substitution rates over sites is based on a serially correlated gamma distribution, i.e., an auto-gamma model assuming Markov-dependence of rates at adjacent sites. To achieve computational efficiency, we use several equal-probability categories to approximate the gamma distribution, and the result is an auto-discrete-gamma model for rates over sites. Correlation of rates at sites then is modeled by the Markov chain transition of rates at adjacent sites from one rate category to another, the states of the chain being the rate categories. Two versions of nonparametric models, which place no restrictions on the distributional forms of rates for sites, also are considered, assuming either independence or Markov dependence. The models are applied to data of a segment of mitochondrial genome from nine primate species. Model parameters are estimated by the maximum likelihood method, and models are compared by the likelihood ratio test. Tremendous variation of rates among sites in the sequence is revealed by the analyses, and when rate differences for different codon positions are appropriately accounted for in the models, substitution rates at adjacent sites are found to be strongly (positively) correlated. Robustness of the results to uncertainty of the phylogenetic tree linking the species is examined.
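The discretization step mentioned above, approximating a gamma distribution of rates by several equal-probability categories, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the gamma distribution is taken to have mean 1 (shape and rate both equal to alpha), each category is represented by its mean rate (the abstract does not say whether the mean or the median is used), and the function names `gamma_cdf`, `gamma_quantile`, and `discrete_gamma_rates` are hypothetical. The Markov-chain transition probabilities between categories at adjacent sites are not covered here.

```python
import math


def gamma_cdf(x, shape):
    """CDF of Gamma(shape, scale=1), via the power series for the
    regularized lower incomplete gamma function P(shape, x)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))


def gamma_quantile(p, shape):
    """Invert gamma_cdf by bisection (assumes 0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < p:  # grow the bracket until it contains p
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rate of each of k equal-probability categories of a
    mean-one gamma distribution with shape alpha (assumption: the
    category mean, not the median, represents each category)."""
    # Cut points are found on the scale-1 variable X; the rate is X / alpha.
    cuts = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # Integral of x * f(x) up to q equals alpha * P(alpha + 1, q), so the
    # mean rate in a category is k times a difference of P(alpha + 1, .).
    upper = [0.0] + [gamma_cdf(q, alpha + 1.0) for q in cuts] + [1.0]
    return [k * (upper[i + 1] - upper[i]) for i in range(k)]


if __name__ == "__main__":
    # alpha = 0.5 and four categories are illustrative values only.
    for i, r in enumerate(discrete_gamma_rates(0.5, 4), start=1):
        print(f"category {i}: rate {r:.4f}")
```

Because every category has probability 1/k, the category means average to exactly 1, so the discretized rates keep the mean of the continuous distribution. A small alpha produces strongly unequal rates across categories, which corresponds to the kind of extreme among-site rate variation the analysis reports.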