Modeling compositional dynamics based on GC and purine contents of protein-coding sequences.

Author information

Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia.

Publication information

Biol Direct. 2010 Nov 8;5:63. doi: 10.1186/1745-6150-5-63.

DOI:10.1186/1745-6150-5-63

PMID:21059261

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2989939/

Abstract

BACKGROUND

Understanding the compositional dynamics of genomes and their coding sequences is of great significance for gaining clues into molecular evolution, and the large number of publicly available genome sequences now allows us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, quantifying theoretical compositional variations across a wide diversity of genomes remains a major challenge.

RESULTS

To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.
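As a rough illustration of how GC and purine contents at the three codon positions could determine nucleotide, codon, and amino acid compositions, the following minimal sketch may help. It rests on assumptions that are not stated in the abstract: GC status and purine status are treated as independent at each position, the three positions are treated as independent of one another, stop codons are excluded and the remainder renormalized, and the standard genetic code is used. The function names `base_freqs` and `expected_composition` are hypothetical, and the paper's two models may be formulated differently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TCAG x TCAG x TCAG; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def base_freqs(gc, purine):
    """Expected base frequencies at one codon position.

    Assumption (illustrative): GC status and purine status are independent.
    """
    return {
        "A": (1 - gc) * purine,        # AT base, purine
        "T": (1 - gc) * (1 - purine),  # AT base, pyrimidine
        "G": gc * purine,              # GC base, purine
        "C": gc * (1 - purine),        # GC base, pyrimidine
    }


def expected_composition(gc3, pur3):
    """Return (codon_freqs, amino_acid_freqs) for sense codons.

    gc3 and pur3 hold the GC and purine contents at codon positions 1, 2, 3.
    """
    pos = [base_freqs(g, r) for g, r in zip(gc3, pur3)]
    codons = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        codons[codon] = pos[0][codon[0]] * pos[1][codon[1]] * pos[2][codon[2]]
    total = sum(codons.values())
    codons = {c: f / total for c, f in codons.items()}  # renormalize over sense codons
    amino_acids = {}
    for c, f in codons.items():
        aa = CODON_TABLE[c]
        amino_acids[aa] = amino_acids.get(aa, 0.0) + f
    return codons, amino_acids


if __name__ == "__main__":
    codon_freqs, aa_freqs = expected_composition((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    print(round(aa_freqs["L"], 4))  # leucine has 6 of the 61 sense codons
```

With all six parameters set to 0.5, every sense codon is equally likely, so each amino acid's expected share is simply its number of codons divided by 61. Shifting any positional GC or purine content away from 0.5 skews the codon and amino acid frequencies accordingly.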

CONCLUSIONS

We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents. We also suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.
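To make the comparison of observed and expected compositions concrete, the sketch below measures GC and purine contents at each codon position of a coding sequence and computes a simple deviation between two frequency tables. The helper names `positional_contents` and `total_deviation` are hypothetical, the sequence is assumed to be non-empty, in frame, and free of ambiguous bases, and the sum of absolute differences is only an illustrative measure, not necessarily the one used in the paper.

```python
def positional_contents(cds):
    """Return (gc, purine) content lists for codon positions 1-3.

    Assumes a non-empty, in-frame coding sequence of A/C/G/T.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    gc, purine = [], []
    for i in range(3):
        column = cds[i:usable:3]  # every base at codon position i + 1
        n = len(column)
        gc.append(sum(b in "GC" for b in column) / n)
        purine.append(sum(b in "AG" for b in column) / n)
    return gc, purine


def total_deviation(observed, expected):
    """Sum of absolute differences between two frequency dictionaries."""
    keys = set(observed) | set(expected)
    return sum(abs(observed.get(k, 0.0) - expected.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    gc, purine = positional_contents("ATGGCC")
    print(gc, purine)
    print(total_deviation({"A": 0.5, "B": 0.5}, {"A": 0.25, "B": 0.75}))
```

The positional contents returned here are the kind of input a GC-and-purine-based model needs, and the deviation function can be applied to nucleotide, codon, or amino acid frequency tables alike.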
