Statistical comparison of nucleotide, amino acid, and codon substitution models for evolutionary analysis of protein-coding sequences.

Author information

Professional Programme for Agricultural Bioinformatics, University of Tokyo, 1-1-1 Yayoi Bunkyo-Ku, Tokyo, Japan.

Publication information

Syst Biol. 2009 Apr;58(2):199-210. doi: 10.1093/sysbio/syp015. Epub 2009 Jun 29.

PMID:20525578

Abstract

Statistical models for the evolution of molecular sequences play an important role in the study of evolutionary processes. For the evolutionary analysis of protein-coding sequences, 3 types of evolutionary models are available: 1) nucleotide, 2) amino acid, and 3) codon substitution models. Selecting appropriate models can greatly improve the estimation of phylogenies and divergence times and the detection of positive selection. Although much attention has been paid to the comparisons among the same types of models, relatively little attention has been paid to the comparisons among the different types of models. Additionally, because such models have different data structures, comparison of those models using conventional model selection criteria such as Akaike information criterion (AIC) or Bayesian information criterion (BIC) is not straightforward. Here, we suggest new procedures to convert models of the above-mentioned 3 types to 64-dimensional models with nucleotide triplet substitution. These conversion procedures render it possible to statistically compare the models of these 3 types by using AIC or BIC. By analyzing divergent and conserved interspecific mammalian sequences and intraspecific human population data, we show the superiority of the codon substitution models and discuss the advantages and disadvantages of the models of the 3 types.
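The AIC/BIC comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes each candidate model has already been fitted on the same triplet-coded alignment, so that the log-likelihoods refer to the same data. The model names, log-likelihoods, parameter counts, and the number of triplet sites below are hypothetical placeholders, not values from the paper, and the conversion to 64-dimensional models is not reproduced here.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical fits: placeholder numbers for illustration only.
# lnL = maximized log-likelihood, k = number of free parameters.
fits = {
    "model_a": {"lnL": -10500.0, "k": 30},
    "model_b": {"lnL": -10800.0, "k": 25},
    "model_c": {"lnL": -10200.0, "k": 35},
}
n_triplets = 500  # assumed sample size: number of triplet sites

def rank_models(fits, n_obs):
    """Return (name, AIC, BIC) tuples sorted by AIC, lowest first."""
    rows = [
        (name, aic(f["lnL"], f["k"]), bic(f["lnL"], f["k"], n_obs))
        for name, f in fits.items()
    ]
    return sorted(rows, key=lambda row: row[1])

for name, a, b in rank_models(fits, n_triplets):
    print(f"{name:8s} AIC={a:.1f} BIC={b:.1f}")
```

Lower values indicate the preferred model under either criterion. The comparison is only meaningful when all log-likelihoods are computed on the same data representation, which is the issue the conversion procedures in the abstract address.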

Similar articles

Synonymous substitutions substantially improve evolutionary inference from highly diverged proteins.

Syst Biol. 2008 Jun;57(3):367-77. doi: 10.1080/10635150802158670.

The effect of branch length variation on the selection of models of molecular evolution.

J Mol Evol. 2001 May;52(5):434-44. doi: 10.1007/s002390010173.

Modelling the evolution of protein coding sequences sampled from Measurably Evolving Populations.

Genome Inform. 2008;21:150-64.

Nucleotide substitution rates for the full set of mitochondrial protein-coding genes in Coleoptera.

Mol Phylogenet Evol. 2010 Aug;56(2):796-807. doi: 10.1016/j.ympev.2010.02.007. Epub 2010 Feb 10.

Site-to-site variation of synonymous substitution rates.

Mol Biol Evol. 2005 Dec;22(12):2375-85. doi: 10.1093/molbev/msi232. Epub 2005 Aug 17.

A combined empirical and mechanistic codon model.

Mol Biol Evol. 2007 Feb;24(2):388-97. doi: 10.1093/molbev/msl175. Epub 2006 Nov 16.

An evolutionary model for protein-coding regions with conserved RNA structure.

Mol Biol Evol. 2004 Oct;21(10):1913-22. doi: 10.1093/molbev/msh199. Epub 2004 Jun 30.

A Bayesian model comparison approach to inferring positive selection.

Mol Biol Evol. 2005 Dec;22(12):2531-40. doi: 10.1093/molbev/msi250. Epub 2005 Aug 24.

Selection of models of DNA evolution with jModelTest.

Methods Mol Biol. 2009;537:93-112. doi: 10.1007/978-1-59745-251-9_5.

Cited by

Presence of the snout beetle in Ecuador and potential invasion risk in South America.

Ecol Evol. 2023 Sep 19;13(9):e10531. doi: 10.1002/ece3.10531. eCollection 2023 Sep.

DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies.

Syst Biol. 2023 Nov 1;72(5):1119-1135. doi: 10.1093/sysbio/syad036.

Next-generation development and application of codon model in evolution.

Front Genet. 2023 Jan 27;14:1091575. doi: 10.3389/fgene.2023.1091575. eCollection 2023.

Ancestral Sequence Reconstruction for Exploring Alkaloid Evolution.

Methods Mol Biol. 2022;2505:165-179. doi: 10.1007/978-1-0716-2349-7_12.

Measuring Phylogenetic Information of Incomplete Sequence Data.

Syst Biol. 2022 Apr 19;71(3):630-648. doi: 10.1093/sysbio/syab073.

Genomic analysis uncovers functional variation in the C-terminus of anthocyanin-activating MYB transcription factors.

Hortic Res. 2021 Apr 1;8(1):77. doi: 10.1038/s41438-021-00514-1.

Characterization of the melanopsin gene (Opn4x) of diurnal and nocturnal snakes.

BMC Evol Biol. 2019 Aug 28;19(1):174. doi: 10.1186/s12862-019-1500-6.

Big data analysis of human mitochondrial DNA substitution models: a regression approach.

BMC Genomics. 2018 Oct 19;19(1):759. doi: 10.1186/s12864-018-5123-x.

Daily activity patterns influence retinal morphology, signatures of selection, and spectral tuning of opsin genes in colubrid snakes.

BMC Evol Biol. 2017 Dec 11;17(1):249. doi: 10.1186/s12862-017-1110-0.

Molecular phylogeny of four homeobox genes from the purple sea star Pisaster ochraceus.

Dev Genes Evol. 2015 Nov;225(6):359-65. doi: 10.1007/s00427-015-0516-1. Epub 2015 Oct 2.
