Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.

Author information

Matsumoto Tomotaka, Akashi Hiroshi, Yang Ziheng

Affiliations

Division of Evolutionary Genetics, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan.

Division of Evolutionary Genetics, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan; Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Mishima, Shizuoka 411-8540, Japan.

Publication information

Genetics. 2015 Jul;200(3):873-90. doi: 10.1534/genetics.115.177386. Epub 2015 May 6.

Abstract

Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.
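The difference between counting substitutions from the single best reconstruction (SBR) and averaging over reconstructions weighted by posterior probabilities (AWP) can be seen on a toy example. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: it uses a hypothetical three-leaf star tree, the stationary Jukes-Cantor (JC69) model with equal base frequencies in place of the nonstationary GTR and HKY models studied in the paper, and hypothetical helper names (`jc69_prob`, `root_posterior`, `sbr_count`, `awp_count`). A change is counted on a branch whenever the ancestral and descendant bases differ, so multiple hits are ignored and EMC is not implemented here.

```python
import math

BASES = "TCAG"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 transition probability (assumed model) for branch length t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def root_posterior(leaves, branch_lengths):
    """Posterior of the internal-node base on a hypothetical three-leaf star tree."""
    weights = {}
    for x in BASES:
        w = 0.25  # stationary frequency of base x under JC69
        for leaf, t in zip(leaves, branch_lengths):
            w *= jc69_prob(x == leaf, t)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def mismatches(x, leaves):
    """Number of branches on which ancestral base x differs from the leaf base."""
    return sum(1 for leaf in leaves if leaf != x)


def sbr_count(leaves, branch_lengths):
    """SBR: count changes using only the most probable ancestral base."""
    post = root_posterior(leaves, branch_lengths)
    best = max(post, key=post.get)
    return mismatches(best, leaves)


def awp_count(leaves, branch_lengths):
    """AWP: average the change count over all ancestral bases, weighted by posterior probability."""
    post = root_posterior(leaves, branch_lengths)
    return sum(p * mismatches(x, leaves) for x, p in post.items())


if __name__ == "__main__":
    leaves = ("A", "A", "G")
    lengths = (0.1, 0.1, 0.1)
    print("posterior:", root_posterior(leaves, lengths))
    print("SBR count:", sbr_count(leaves, lengths))
    print("AWP count:", awp_count(leaves, lengths))
```

For the site pattern (A, A, G) with all branch lengths set to 0.1, the most probable ancestor is A, so SBR records exactly one substitution. AWP returns a value slightly above one, because the less probable ancestors G, C and T each imply additional changes that SBR discards. Summed over many sites, such discrepancies illustrate in a simple setting how ignoring suboptimal reconstructions can shift substitution counts.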
