Tests of applicability of several substitution models for DNA sequence data.

Author information

Rzhetsky A, Nei M

Affiliation

Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park 16802.

Publication information

Mol Biol Evol. 1995 Jan;12(1):131-51. doi: 10.1093/oxfordjournals.molbev.a040182.

Abstract

Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
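For orientation, the sketch below computes the evolutionary distance (substitutions per site) between two aligned sequences under the two simplest models named in the abstract, using the widely known closed forms for the Jukes and Cantor (1969) and Kimura (1980) models. It is not the linear-invariant test statistics or the variance formulas developed in the paper, and it does not cover the eight-parameter model. The function names and the handling of gaps and ambiguous bases (sites are simply skipped) are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}  # C and T are pyrimidines


def difference_proportions(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional
    differences over the aligned sites where both bases are A, C, G or T."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor distance from the total proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(P, Q):
    """Kimura two-parameter distance from transition (P) and
    transversion (Q) proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


if __name__ == "__main__":
    # Toy aligned pair: one transition (A->G) and one transversion (A->C).
    P, Q = difference_proportions("AAAAAAAAAA", "GAAAAAAAAC")
    print("JC69 distance:", jukes_cantor_distance(P + Q))
    print("K2P distance: ", kimura_2p_distance(P, Q))
```

When transversions are exactly twice as frequent as transitions (Q = 2P), the two formulas give the same value, which reflects the Jukes-Cantor model being a special case of the Kimura model. Both logarithms are undefined once the observed differences reach saturation, so real data would need a guard for that case.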
