Tests of applicability of several substitution models for DNA sequence data.

Author information

Rzhetsky A, Nei M

Affiliation

Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park 16802.

Publication information

Mol Biol Evol. 1995 Jan;12(1):131-51. doi: 10.1093/oxfordjournals.molbev.a040182.

PMID:7877488

Abstract

Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
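The abstract's point about variance can be made concrete by computing pairwise distances under the two simplest models it names, Jukes and Cantor (1969) and Kimura (1980). The sketch below uses the standard textbook distance formulas and their large-sample (delta-method) variances. It is not the paper's invariant-based test statistic. The function names `count_differences`, `jc69` and `k80` are hypothetical. The sketch also assumes gap-free, equal-length sequences over A, C, G, T.

```python
import math

# Hypothetical helpers: textbook JC69 / K80 distances, not the paper's test statistics.
# Assumes aligned, gap-free sequences over A, C, G, T.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1: str, seq2: str) -> tuple[int, int, int]:
    """Return (sites, transitions, transversions) for two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        pair = {a, b}
        # A<->G and C<->T are transitions; every other mismatch is a transversion.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return len(seq1), transitions, transversions


def jc69(seq1: str, seq2: str) -> tuple[float, float]:
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) and its large-sample variance."""
    n, ts, tv = count_differences(seq1, seq2)
    p = (ts + tv) / n
    w = 1 - 4 * p / 3
    if w <= 0:
        raise ValueError("p >= 3/4: JC69 distance is undefined")
    return -0.75 * math.log(w), p * (1 - p) / (n * w * w)


def k80(seq1: str, seq2: str) -> tuple[float, float]:
    """Kimura two-parameter distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)
    and its large-sample variance; P and Q are transition/transversion proportions."""
    n, ts, tv = count_differences(seq1, seq2)
    P, Q = ts / n, tv / n
    w1, w2 = 1 - 2 * P - Q, 1 - 2 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K80 distance is undefined for these P and Q")
    d = -0.5 * math.log(w1) - 0.25 * math.log(w2)
    c1, c2 = 1 / w1, 1 / w2     # c1 = partial derivative of d with respect to P
    c3 = (c1 + c2) / 2          # c3 = partial derivative of d with respect to Q
    var = (c1 * c1 * P + c3 * c3 * Q - (c1 * P + c3 * Q) ** 2) / n
    return d, var
```

Take a toy pair with 10 transitions and 5 transversions in 100 sites, `"A" * 100` versus `"A" * 85 + "G" * 10 + "C" * 5`. For this pair, `k80` returns a slightly larger variance than `jc69`. This only illustrates the trade-off. The abstract's claim concerns data for which the simpler model is actually correct.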
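The homogeneity test described in the abstract accounts for phylogenetic correlation among the sequences. The sketch below is for comparison only. It implements the ordinary chi-square contingency test on a sequences-by-bases table, which treats every site of every sequence as an independent observation and so ignores that correlation. It is not the authors' statistic. `chi2_sf` and `base_homogeneity_test` are hypothetical names, and characters other than A, C, G, T are simply not counted.

```python
import math
from collections import Counter

BASES = "ACGT"


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df >= 1.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = e^(-x/2).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def base_homogeneity_test(seqs: list[str]) -> tuple[float, int, float]:
    """Naive chi-square test that all sequences share the same A/C/G/T frequencies.

    Hypothetical sketch: sites and sequences are treated as independent, so the
    phylogenetic correlation that the paper's test handles is ignored here.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    counts = [Counter(s.upper()) for s in seqs]
    table = [[c[b] for b in BASES] for c in counts]  # rows: sequences, cols: bases
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(len(BASES))]
    grand = sum(row_totals)
    if grand == 0:
        raise ValueError("no A/C/G/T characters to count")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    # Bases absent from every sequence contribute no degrees of freedom.
    n_bases_seen = sum(1 for total in col_totals if total > 0)
    df = (len(seqs) - 1) * (n_bases_seen - 1)
    return stat, df, chi2_sf(stat, df)
```

Because correlated sequences are not independent samples, this naive p-value should not be read as a substitute for the phylogeny-aware test.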

Similar articles

Tests of applicability of several substitution models for DNA sequence data.

Mol Biol Evol. 1995 Jan;12(1):131-51. doi: 10.1093/oxfordjournals.molbev.a040182.

Evolutionary distances between nucleotide sequences based on the distribution of substitution rates among sites as estimated by parsimony.

Mol Biol Evol. 1997 Mar;14(3):287-98. doi: 10.1093/oxfordjournals.molbev.a025764.

Theoretical foundation to estimate the relative efficiencies of the Jukes-Cantor+gamma model and the Jukes-Cantor model in obtaining the correct phylogenetic tree.

Gene. 2006 Dec 30;385:103-10. doi: 10.1016/j.gene.2006.03.027. Epub 2006 Aug 11.

When is it safe to use an oversimplified substitution model in tree-making?

Mol Biol Evol. 1996 Nov;13(9):1255-65. doi: 10.1093/oxfordjournals.molbev.a025691.

Classifying and counting linear phylogenetic invariants for the Jukes-Cantor model.

J Comput Biol. 1995 Spring;2(1):39-47. doi: 10.1089/cmb.1995.2.39.

Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree.

Mol Biol Evol. 1988 May;5(3):298-311. doi: 10.1093/oxfordjournals.molbev.a040497.

Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site.

Mol Biol Evol. 1994 Mar;11(2):261-77. doi: 10.1093/oxfordjournals.molbev.a040108.

A comparison of two methods for constructing evolutionary distances from a weighted contribution of transition and transversion differences.

Mol Biol Evol. 1995 Jul;12(4):713-7. doi: 10.1093/oxfordjournals.molbev.a040248.

Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used.

Mol Biol Evol. 2000 Aug;17(8):1251-8. doi: 10.1093/oxfordjournals.molbev.a026408.

Counting phylogenetic invariants in some simple cases.

J Theor Biol. 1991 Oct 7;152(3):357-76. doi: 10.1016/s0022-5193(05)80200-0.

Cited by

Complex bacterial diversity of Guaymas Basin hydrothermal sediments revealed by synthetic long-read sequencing (LoopSeq).

Front Microbiol. 2025 Jan 7;15:1491488. doi: 10.3389/fmicb.2024.1491488. eCollection 2024.

Effect of Different Types of Sequence Data on Palaeognath Phylogeny.

Genome Biol Evol. 2023 Jun 1;15(6). doi: 10.1093/gbe/evad092.

Evaluation of various distance computation methods for construction of haplotype-based phylogenies from large MLST datasets.

Mol Phylogenet Evol. 2022 Dec;177:107608. doi: 10.1016/j.ympev.2022.107608. Epub 2022 Aug 11.

A new phylogenetic protocol: dealing with model misspecification and confirmation bias in molecular phylogenetics.

NAR Genom Bioinform. 2020 Jun 23;2(2):lqaa041. doi: 10.1093/nargab/lqaa041. eCollection 2020 Jun.

Felsenstein Phylogenetic Likelihood.

J Mol Evol. 2021 Apr;89(3):134-145. doi: 10.1007/s00239-020-09982-w. Epub 2021 Jan 13.

The Prevalence and Impact of Model Violations in Phylogenetic Analysis.

Genome Biol Evol. 2019 Dec 1;11(12):3341-3352. doi: 10.1093/gbe/evz193.

Heterogeneous models place the root of the placental mammal phylogeny.

Mol Biol Evol. 2013 Sep;30(9):2145-56. doi: 10.1093/molbev/mst117. Epub 2013 Jun 29.

Discovery of a modified tetrapolar sexual cycle in Cryptococcus amylolentus and the evolution of MAT in the Cryptococcus species complex.

PLoS Genet. 2012;8(2):e1002528. doi: 10.1371/journal.pgen.1002528. Epub 2012 Feb 16.

BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments.

BMC Evol Biol. 2010 Jul 13;10:210. doi: 10.1186/1471-2148-10-210.

Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

J Mol Evol. 2009 Oct;69(4):289-99. doi: 10.1007/s00239-009-9268-8. Epub 2009 Oct 23.
