A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank.

Author information

Bastolla Ugo, Porto Markus, Roman H Eduardo, Vendruscolo Michele

Affiliation

Centro de Biología Molecular Severo Ochoa, CSIC-UAM, Cantoblanco, 28049 Madrid, Spain.

Publication information

BMC Evol Biol. 2006 May 31;6:43. doi: 10.1186/1471-2148-6-43.

PMID:16737532

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1570368/

Abstract

BACKGROUND

Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account.

RESULTS

We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It quantitatively reproduces the results of Structurally Constrained Neutral (SCN) simulations of protein evolution, in which the stability of the native state is explicitly computed and conserved. We then compare the predicted site-specific amino acid distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation model, whose number varies between zero and five, are fitted to the data. The mean correlation coefficient between predicted and observed site-specific amino acid distributions is larger than 0.70 for a mutation model with no free parameters and no genetic code. In contrast, considering only the mutation process with no selection yields a mean correlation coefficient of 0.56 with three fitted parameters. The mutation model that best fits the data takes into account the increased mutation rate at CpG dinucleotides, yielding a mean correlation coefficient of 0.90 with five parameters.
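The results are reported as mean correlation coefficients between predicted and observed site-specific amino acid distributions. As an illustrative sketch only, the Python code below assumes a Pearson coefficient computed over the amino acid frequency vector of each site, followed by an unweighted average over sites; the abstract does not state which correlation measure or site weighting the authors used, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A flat distribution has zero variance, so r is undefined.
        raise ValueError("correlation is undefined for a constant distribution")
    return cov / math.sqrt(vx * vy)


def mean_site_correlation(predicted, observed):
    """Unweighted mean, over sites, of the correlation between the predicted
    and observed amino acid frequency vectors at each site (assumed metric)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must cover the same sites")
    rs = [pearson(p, o) for p, o in zip(predicted, observed)]
    return sum(rs) / len(rs)
```

Each argument is a list of per-site frequency vectors (20 entries per site for real data) listed in the same amino acid order; identical inputs give a mean correlation of 1.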

CONCLUSION

The effective selection process that we propose reproduces well the amino acid distributions observed in the protein sequences of the PDB. Its simplicity makes it very promising for likelihood calculations in phylogenetic studies. Interestingly, in this approach the mutation process influences the effective selection process, i.e. selection and mutation must be entangled in order to obtain effectively independent sites. This interdependence between mutation and selection reflects the deep influence that mutation has on the evolutionary process: the mutation bias influences the thermodynamic properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it also influences the rate of accepted mutations.
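Two points in the conclusion can be illustrated with a toy model: with independent sites the likelihood factorizes into per-site terms, and mutation bias and selection combine within each site's distribution. The sketch below is a hypothetical construction, not the paper's model. It assumes that mutation alone sets amino acid frequencies f(a) through the standard genetic code from four nucleotide frequencies (ignoring CpG effects), that selection at site i reweights them as P_i(a) ∝ f(a)·exp(λ_i h(a)) with a caller-supplied hydrophobicity scale h, and that the log-likelihood of a sequence is the sum of log P_i(a_i). The names mutation_frequencies, site_distribution, log_likelihood and the parameter lam are illustrative.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third position varying fastest.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def mutation_frequencies(nuc_freq):
    """Hypothetical mutation-only amino acid frequencies: independent codon
    positions with nucleotide frequencies nuc_freq, no CpG effect. Stop codons
    are dropped and the remaining weights renormalized."""
    weights = {}
    for (b1, b2, b3), aa in zip(product(BASES, repeat=3), CODE):
        if aa == "*":
            continue  # stop codons encode no amino acid
        weights[aa] = weights.get(aa, 0.0) + nuc_freq[b1] * nuc_freq[b2] * nuc_freq[b3]
    z = sum(weights.values())
    return {aa: w / z for aa, w in weights.items()}


def site_distribution(mut_freq, hydro, lam):
    """Assumed form P_i(a) proportional to mut_freq[a] * exp(lam * hydro[a]).
    lam > 0 favours hydrophobic residues (e.g. a buried site), lam < 0 polar ones."""
    weights = {a: f * math.exp(lam * hydro[a]) for a, f in mut_freq.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


def log_likelihood(sequence, lambdas, mut_freq, hydro):
    """Independent sites: the log-likelihood is a sum of per-site terms."""
    if len(sequence) != len(lambdas):
        raise ValueError("need one selection parameter per site")
    return sum(
        math.log(site_distribution(mut_freq, hydro, lam)[aa])
        for aa, lam in zip(sequence, lambdas)
    )
```

With equal nucleotide frequencies, mutation_frequencies gives leucine 6/61 and methionine 1/61, their codon counts among the 61 sense codons; raising the G and C frequencies shifts weight toward amino acids with GC-rich codons, such as alanine, and away from lysine. Because the sites are independent, changing one λ_i changes only that site's term in the sum.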

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d051/1570368/517e9ac2a3a5/1471-2148-6-43-1.jpg

Similar articles

Looking at structure, stability, and evolution of proteins through the principal eigenvector of contact matrices and hydrophobicity profiles.

Gene. 2005 Mar 14;347(2):219-30. doi: 10.1016/j.gene.2004.12.015.

Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences.

Mol Biol Evol. 2005 Mar;22(3):630-8. doi: 10.1093/molbev/msi048. Epub 2004 Nov 10.

Effective connectivity profile: a structural representation that evidences the relationship between protein structures and sequences.

Proteins. 2008 Dec;73(4):872-88. doi: 10.1002/prot.22113.

Natural selection for kinetic stability is a likely origin of correlations between mutational effects on protein energetics and frequencies of amino acid occurrences in sequence alignments.

J Mol Biol. 2006 Oct 6;362(5):966-78. doi: 10.1016/j.jmb.2006.07.065. Epub 2006 Jul 31.

Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability.

Mol Biol Evol. 2015 Aug;32(8):2195-207. doi: 10.1093/molbev/msv085. Epub 2015 Apr 2.

Generality of the structurally constrained protein evolution model: assessment on representatives of the four main fold classes.

Gene. 2005 Jan 17;345(1):45-53. doi: 10.1016/j.gene.2004.11.025. Epub 2004 Dec 24.

Local interactions in protein folding determined through an inverse folding model.

Proteins. 2008 Apr;71(1):278-99. doi: 10.1002/prot.21730.

A new formulation of protein evolutionary models that account for structural constraints.

Mol Biol Evol. 2014 Mar;31(3):736-49. doi: 10.1093/molbev/mst240. Epub 2013 Dec 3.

Effects of side-chain characteristics on stability and oligomerization state of a de novo-designed model coiled-coil: 20 amino acid substitutions in position "d".

J Mol Biol. 2000 Jul 7;300(2):377-402. doi: 10.1006/jmbi.2000.3866.

Cited by

SARS-CoV-2 biological clones are genetically heterogeneous and include clade-discordant residues.

J Virol. 2025 May 20;99(5):e0225024. doi: 10.1128/jvi.02250-24. Epub 2025 Apr 24.

Substitution Models of Protein Evolution with Selection on Enzymatic Activity.

Mol Biol Evol. 2024 Feb 1;41(2). doi: 10.1093/molbev/msae026.

Consequences of Genetic Recombination on Protein Folding Stability.

J Mol Evol. 2023 Feb;91(1):33-45. doi: 10.1007/s00239-022-10080-2. Epub 2022 Dec 3.

Methodologies for Microbial Ancestral Sequence Reconstruction.

Methods Mol Biol. 2022;2569:283-303. doi: 10.1007/978-1-0716-2691-7_14.

Modeling Structural Constraints on Protein Evolution via Side-Chain Conformational States.

Mol Biol Evol. 2019 Sep 1;36(9):2086-2103. doi: 10.1093/molbev/msz122.

Influence of mutation bias and hydrophobicity on the substitution rates and sequence entropies of protein evolution.

PeerJ. 2018 Oct 5;6:e5549. doi: 10.7717/peerj.5549. eCollection 2018.

Molecular and Functional Bases of Selection against a Mutation Bias in an RNA Virus.

Genome Biol Evol. 2017 May 1;9(5):1212-1228. doi: 10.1093/gbe/evx075.

Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence.

Annu Rev Biophys. 2017 May 22;46:85-103. doi: 10.1146/annurev-biophys-070816-033819. Epub 2017 Mar 15.

Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability.

Mol Biol Evol. 2015 Aug;32(8):2195-207. doi: 10.1093/molbev/msv085. Epub 2015 Apr 2.

Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently.

Chem Soc Rev. 2015 Mar 7;44(5):1172-239. doi: 10.1039/c4cs00351a.

References

Evolutionary information for specifying a protein fold.

Nature. 2005 Sep 22;437(7058):512-8. doi: 10.1038/nature03991.

Stability constraints and protein evolution: the role of chain length, composition and disulfide bonds.

Protein Eng Des Sel. 2005 Sep;18(9):405-15. doi: 10.1093/protein/gzi045. Epub 2005 Aug 5.

The application of statistical physics to evolutionary biology.

Proc Natl Acad Sci U S A. 2005 Jul 5;102(27):9541-6. doi: 10.1073/pnas.0501865102. Epub 2005 Jun 24.

Looking at structure, stability, and evolution of proteins through the principal eigenvector of contact matrices and hydrophobicity profiles.

Gene. 2005 Mar 14;347(2):219-30. doi: 10.1016/j.gene.2004.12.015.

Site interdependence attributed to tertiary structure in amino acid sequence evolution.

Gene. 2005 Mar 14;347(2):207-17. doi: 10.1016/j.gene.2004.12.011. Epub 2005 Feb 19.

A universal trend of amino acid gain and loss in protein evolution.

Nature. 2005 Feb 10;433(7026):633-8. doi: 10.1038/nature03306. Epub 2005 Jan 19.

Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences.

Mol Biol Evol. 2005 Mar;22(3):630-8. doi: 10.1093/molbev/msi048. Epub 2004 Nov 10.

Principal eigenvector of contact matrices and hydrophobicity profiles in proteins.

Proteins. 2005 Jan 1;58(1):22-30. doi: 10.1002/prot.20240.

Adaptive evolution of transcription factor binding sites.

BMC Evol Biol. 2004 Oct 28;4:42. doi: 10.1186/1471-2148-4-42.

The structurally constrained protein evolution model accounts for sequence patterns of the LbetaH superfamily.

BMC Evol Biol. 2004 Oct 22;4:41. doi: 10.1186/1471-2148-4-41.
