An empirical examination of the utility of codon-substitution models in phylogeny reconstruction.

Author information

Ren Fengrong, Tanaka Hiroshi, Yang Ziheng

Affiliation

Advanced Biomedical Information, Center for Information Medicine, Tokyo Medical and Dental University, Japan.

Publication information

Syst Biol. 2005 Oct;54(5):808-18. doi: 10.1080/10635150500354688.

Abstract

Models of codon substitution have been commonly used to compare protein-coding DNA sequences and are particularly effective in detecting signals of natural selection acting on the protein. Their utility in reconstructing molecular phylogenies and in dating species divergences has not been explored. Codon models naturally accommodate synonymous and nonsynonymous substitutions, which occur at very different rates and may be informative for recent and ancient divergences, respectively. Thus codon models may be expected to make efficient use of phylogenetic information in protein-coding DNA sequences. Here we applied codon models to 106 protein-coding genes from eight yeast species to reconstruct phylogenies using the maximum likelihood method, in comparison with nucleotide- and amino acid-based analyses. The results appeared to confirm that expectation. Nucleotide-based analysis, under simplistic substitution models, was efficient in recovering recent divergences, whereas amino acid-based analysis performed better at recovering deep divergences. Codon models appeared to combine the advantages of amino acid and nucleotide data and performed well at recovering both recent and deep divergences. Estimation of relative species divergence times using amino acid and codon models suggested that translation of gene sequences into proteins led to information loss ranging from 30% for deep nodes to 66% for recent nodes. Although computational burden makes codon models unfeasible for tree search in large data sets, we suggest that they may be useful for comparing candidate trees. Nucleotide models that accommodate the differences in evolutionary dynamics at the three codon positions also performed well, at much less computational cost. We discuss the relationship between a model's fit to data and its utility in phylogeny reconstruction and caution against use of overly complex substitution models.
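The distinction between synonymous and nonsynonymous substitutions that codon models build on can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not fit any substitution model. It only classifies single-nucleotide changes between codons and tallies them by codon position. The use of the standard genetic code is an assumption of this sketch (the abstract does not say which code was used), and the names `CODON_TABLE`, `classify`, and `count_single_step_changes_by_position` are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering (assumption:
# the abstract does not state which genetic code applies to each species).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(codon_a, codon_b):
    """Classify the difference between two codons (hypothetical helper)."""
    diffs = sum(x != y for x, y in zip(codon_a, codon_b))
    if diffs == 0:
        return "identical"
    if diffs > 1:
        return "multiple"
    same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "synonymous" if same_aa else "nonsynonymous"


def count_single_step_changes_by_position():
    """Tally single-nucleotide changes between sense codons by codon position."""
    counts = {pos: {"synonymous": 0, "nonsynonymous": 0} for pos in range(3)}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    continue  # skip changes that create a stop codon
                counts[pos][classify(codon, mutant)] += 1
    return counts


if __name__ == "__main__":
    for pos, tally in count_single_step_changes_by_position().items():
        print(f"codon position {pos + 1}: {tally}")
```

Under this code table most synonymous changes fall at the third codon position and none at the second, which is one way to see why nucleotide models that treat the three codon positions separately can capture part of the signal that codon models use.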

