Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.

Authors

Matsumoto Tomotaka, Akashi Hiroshi, Yang Ziheng

Affiliations

Division of Evolutionary Genetics, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan.

Division of Evolutionary Genetics, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan; Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Mishima, Shizuoka 411-8540, Japan.

Publication information

Genetics. 2015 Jul;200(3):873-90. doi: 10.1534/genetics.115.177386. Epub 2015 May 6.

DOI:10.1534/genetics.115.177386

PMID:25948563

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4512549/

Abstract

Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.
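To illustrate why relying on the single best reconstruction can bias substitution counts, the following is a minimal Python sketch, not the authors' implementation. It assumes a single terminal branch where the descendant base is observed at every site and a posterior distribution over the ancestral base is already available. The function names `sbr_counts` and `awp_counts` and the toy posteriors are hypothetical, chosen only for illustration.

```python
from collections import Counter

def sbr_counts(posteriors, descendants):
    """Count substitutions using only the most probable ancestral base per site."""
    counts = Counter()
    for post, desc in zip(posteriors, descendants):
        best = max(post, key=post.get)  # single best reconstruction
        if best != desc:
            counts[(best, desc)] += 1
    return counts

def awp_counts(posteriors, descendants):
    """Count substitutions by weighting every ancestral base by its posterior."""
    counts = Counter()
    for post, desc in zip(posteriors, descendants):
        for anc, prob in post.items():
            if anc != desc:
                counts[(anc, desc)] += prob
    return counts

# Toy data (assumed): 10 sites, descendant base "A" at each site,
# ancestral posterior 0.7 for "A" and 0.3 for "G".
posteriors = [{"A": 0.7, "G": 0.3}] * 10
descendants = ["A"] * 10

sbr = sbr_counts(posteriors, descendants)
awp = awp_counts(posteriors, descendants)

print("SBR G->A count:", sbr[("G", "A")])
print("AWP G->A count:", round(awp[("G", "A")], 6))
```

In this toy case the best reconstruction never differs from the descendant, so SBR records no G to A changes, while the weighted average attributes about three expected changes across the ten sites. This is the kind of systematic undercounting that the abstract attributes to ignoring suboptimal reconstructions.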


Similar articles

Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.

Genetics. 2015 Jul;200(3):873-90. doi: 10.1534/genetics.115.177386. Epub 2015 May 6.

Ancestral inference and the study of codon bias evolution: implications for molecular evolutionary analyses of the Drosophila melanogaster subgroup.

PLoS One. 2007 Oct 24;2(10):e1065. doi: 10.1371/journal.pone.0001065.

Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

Mol Biol Evol. 2004 Oct;21(10):1871-83. doi: 10.1093/molbev/msh198. Epub 2004 Jun 30.

Distinguishing Among Evolutionary Forces Acting on Genome-Wide Base Composition: Computer Simulation Analysis of Approximate Methods for Inferring Site Frequency Spectra of Derived Mutations.

G3 (Bethesda). 2018 May 4;8(5):1755-1769. doi: 10.1534/g3.117.300512.

Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias.

Syst Biol. 2014 Nov;63(6):862-78. doi: 10.1093/sysbio/syu049. Epub 2014 Jul 28.

Phylogenetic analysis using parsimony and likelihood methods.

J Mol Evol. 1996 Feb;42(2):294-307. doi: 10.1007/BF02198856.

Reconstruction of ancestral nucleotide sequences and estimation of substitution frequencies in a star phylogeny.

Gene. 2007 Apr 1;390(1-2):75-83. doi: 10.1016/j.gene.2006.11.022. Epub 2006 Dec 14.

Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used.

Mol Biol Evol. 2000 Aug;17(8):1251-8. doi: 10.1093/oxfordjournals.molbev.a026408.

Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis.

Mol Biol Evol. 1998 Jul;15(7):871-9. doi: 10.1093/oxfordjournals.molbev.a025991.

A comparative study in ancestral range reconstruction methods: retracing the uncertain histories of insular lineages.

Syst Biol. 2008 Oct;57(5):693-707. doi: 10.1080/10635150802426473.

Cited by

Dinucleotide preferences underlie apparent codon preference reversals in the lineage.

Proc Natl Acad Sci U S A. 2025 May 27;122(21):e2419696122. doi: 10.1073/pnas.2419696122. Epub 2025 May 22.

Extant Sequence Reconstruction: The Accuracy of Ancestral Sequence Reconstructions Evaluated by Extant Sequence Cross-Validation.

J Mol Evol. 2024 Apr;92(2):181-206. doi: 10.1007/s00239-024-10162-3. Epub 2024 Mar 19.

DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies.

Syst Biol. 2023 Nov 1;72(5):1119-1135. doi: 10.1093/sysbio/syad036.

The diverse terminology of reptile eggshell microstructure and its effect on phylogenetic comparative analyses.

J Anat. 2022 Sep;241(3):641-666. doi: 10.1111/joa.13723. Epub 2022 Jun 27.

Evidence for a force favoring GC over AT at short intronic sites in Drosophila simulans and Drosophila melanogaster.

G3 (Bethesda). 2021 Sep 6;11(9). doi: 10.1093/g3journal/jkab240.

Investigation of ancestral alleles in the Bovinae subfamily.

BMC Genomics. 2021 Feb 8;22(1):108. doi: 10.1186/s12864-021-07412-9.

Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy.

Genome Biol Evol. 2020 Sep 1;12(9):1549-1565. doi: 10.1093/gbe/evaa164.

Impact of C-terminal amino acid composition on protein expression in bacteria.

Mol Syst Biol. 2020 May;16(5):e9208. doi: 10.15252/msb.20199208.

Noncoding regions underpin avian bill shape diversification at macroevolutionary scales.

Genome Res. 2020 Apr;30(4):553-565. doi: 10.1101/gr.255752.119. Epub 2020 Apr 8.

Allele-specific nonstationarity in evolution of influenza A virus surface proteins.

Proc Natl Acad Sci U S A. 2019 Oct 15;116(42):21104-21112. doi: 10.1073/pnas.1904246116. Epub 2019 Oct 2.

References

Nonadaptive Amino Acid Convergence Rates Decrease over Time.

Mol Biol Evol. 2015 Jun;32(6):1373-81. doi: 10.1093/molbev/msv041. Epub 2015 Mar 3.

Mixture models of nucleotide sequence evolution that account for heterogeneity in the substitution process across sites and across lineages.

Syst Biol. 2014 Sep;63(5):726-42. doi: 10.1093/sysbio/syu036. Epub 2014 Jun 12.

From β- to α-proteobacteria: the origin and evolution of rhizobial nodulation genes nodIJ.

Mol Biol Evol. 2013 Nov;30(11):2494-508. doi: 10.1093/molbev/mst153. Epub 2013 Sep 11.

Bio++: efficient extensible libraries and tools for computational molecular evolution.

Mol Biol Evol. 2013 Aug;30(8):1745-50. doi: 10.1093/molbev/mst097. Epub 2013 May 21.

Co-evolution of a broadly neutralizing HIV-1 antibody and founder virus.

Nature. 2013 Apr 25;496(7446):469-76. doi: 10.1038/nature12053. Epub 2013 Apr 3.

A branch-heterogeneous model of protein evolution for efficient inference of ancestral sequences.

Syst Biol. 2013 Jul;62(4):523-38. doi: 10.1093/sysbio/syt016. Epub 2013 Mar 7.

Prevalence of multinucleotide replacements in evolution of primates and Drosophila.

Mol Biol Evol. 2013 Jun;30(6):1315-25. doi: 10.1093/molbev/mst036. Epub 2013 Feb 27.

Codon usage bias and effective population sizes on the X chromosome versus the autosomes in Drosophila melanogaster.

Mol Biol Evol. 2013 Apr;30(4):811-23. doi: 10.1093/molbev/mss222. Epub 2012 Nov 29.

Population genomic analysis of base composition evolution in Drosophila melanogaster.

Genome Biol Evol. 2012;4(12):1245-55. doi: 10.1093/gbe/evs097.

Fitting nonstationary general-time-reversible models to obtain edge-lengths and frequencies for the Barry-Hartigan model.

Syst Biol. 2012 Dec 1;61(6):927-40. doi: 10.1093/sysbio/sys046. Epub 2012 Apr 16.
