Synonymous substitutions substantially improve evolutionary inference from highly diverged proteins.

Author information

Seo Tae-Kun, Kishino Hirohisa

Affiliation

Professional Programme for Agricultural Bioinformatics, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan.

Publication information

Syst Biol. 2008 Jun;57(3):367-77. doi: 10.1080/10635150802158670.

Abstract

Codon- and amino acid-substitution models are widely used for the evolutionary analysis of protein-coding DNA sequences. Using codon models, the amounts of both nonsynonymous and synonymous DNA substitutions can be estimated. The ratio of these amounts represents the strength of selective pressure. Using amino acid models, the amount of nonsynonymous substitutions is estimated, but that of synonymous substitutions is ignored. Although amino acid models lose all information regarding synonymous substitutions, they explicitly incorporate information on amino acid replacement, which is empirically derived from databases. It is often presumed that when protein-coding sequences are highly divergent, synonymous substitutions might be saturated and the evolutionary analysis may be hampered by synonymous noise. However, there exists no quantitative procedure to verify whether synonymous substitutions can be ignored; therefore, amino acid models have been arbitrarily selected. In this study, we investigate the statistical comparison of codon- and amino acid-substitution models. For this purpose, we propose a new procedure to transform a 20-dimensional amino acid model into a 61-dimensional codon model. This transformation reveals that amino acid models belong to a subset of the codon models and enables us to test whether synonymous substitutions can be ignored by using the likelihood ratio. Our theoretical results and analyses of real data indicate that synonymous substitutions are very informative and substantially improve evolutionary inference, even when the sequences are highly divergent. Therefore, we note that amino acid models should be adopted only after carefully investigating and discarding the possibility that synonymous substitutions can reveal important evolutionary information.
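To make the two state spaces in the abstract concrete, the following minimal Python sketch enumerates the 61 sense codons and the 20 amino acids they encode, and counts how many single-nucleotide changes between sense codons are synonymous. It assumes the standard genetic code. The names `CODON_TO_AA`, `SENSE_CODONS`, and `single_step_pairs` are hypothetical, chosen for illustration only. This is not the authors' transformation procedure, which the abstract does not describe in detail; it only shows the part of the codon state space that an amino acid model cannot see.

```python
from itertools import product

# Assumption: the standard genetic code, written in TCAG order.
# '*' marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid (or '*').
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The codon state space excludes stop codons: 64 - 3 = 61 states.
SENSE_CODONS = [c for c, aa in CODON_TO_AA.items() if aa != "*"]

# The amino acid state space: 20 states.
AMINO_ACIDS = sorted({aa for aa in CODON_TO_AA.values() if aa != "*"})


def single_step_pairs():
    """Count unordered pairs of sense codons that differ at exactly one
    nucleotide, split into synonymous and nonsynonymous pairs."""
    syn = nonsyn = 0
    for i, a in enumerate(SENSE_CODONS):
        for b in SENSE_CODONS[i + 1:]:
            if sum(x != y for x, y in zip(a, b)) == 1:
                if CODON_TO_AA[a] == CODON_TO_AA[b]:
                    syn += 1      # invisible at the amino acid level
                else:
                    nonsyn += 1   # visible as an amino acid replacement
    return syn, nonsyn


if __name__ == "__main__":
    syn, nonsyn = single_step_pairs()
    print(len(SENSE_CODONS), "sense codons;", len(AMINO_ACIDS), "amino acids")
    print(syn, "synonymous and", nonsyn, "nonsynonymous single-step pairs")
```

Synonymous pairs are the changes that collapse to "no change" once codons are translated, which is the information an amino acid model discards and a codon model retains.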

