Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability.

Authors

Arenas Miguel, Sánchez-Cobos Agustin, Bastolla Ugo

Affiliations

Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain.

Publication information

Mol Biol Evol. 2015 Aug;32(8):2195-207. doi: 10.1093/molbev/msv085. Epub 2015 Apr 2.

PMID:25837579

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4833071/

Abstract

Despite intense work, incorporating constraints on protein native structures into mathematical models of molecular evolution remains difficult. Most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and on one selection parameter, which we fix by maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows two things: the main determinant of the site-specific distributions is the number of native contacts of the site, and the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained by taking misfolded conformations into account yield a larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic. They also produce, on average, stable sequences for most proteins. We evaluated the mean-field model against empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented a smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates by combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns a larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family.
We found that the mean-field model performs better than other structurally constrained models of similar or higher complexity. We also compared it with the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix. Our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented in the computer program Prot_Evol, which is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
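The abstract describes the model only qualitatively. The Python sketch below is a deliberately simplified toy, not the paper's formulation. The paper works with real amino acids and an explicit treatment of misfolded conformations. This sketch instead assumes a hypothetical three-class alphabet, hypothetical per-contact energies `CONTACT_ENERGY`, a uniform background `BACKGROUND`, and a hypothetical mean contact number for misfolded structures, `MEAN_MISFOLDED_CONTACTS`. Under those assumptions the sketch shows four things:

- how a single selection parameter λ (`lam`) can be fixed by maximizing the likelihood of one observed sequence;
- why site variability can peak at an intermediate contact number;
- how an observed site profile can be compared with a model distribution by Kullback-Leibler divergence;
- how site frequencies can be combined with a symmetric exchangeability matrix in the standard reversible form Q_ab = S_ab π_b. This is one common construction; the abstract does not state the exact combination rule used.

```python
import math

# Toy three-class alphabet standing in for the 20 amino acids (hypothetical).
ALPHABET = ["hydrophobic", "neutral", "polar"]
# Hypothetical per-contact energies eps(a); negative means a favourable contact.
CONTACT_ENERGY = [-1.0, -0.3, 0.4]
# Hypothetical background distribution f0(a); uniform to keep the toy simple.
BACKGROUND = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical mean number of contacts per site in misfolded structures.
MEAN_MISFOLDED_CONTACTS = 4.0


def site_field(n_contacts):
    """Toy field h_i(a) = (n_i - n_mis) * eps(a).

    The native term rewards favourable contacts in proportion to the native
    contact number n_i; the misfolding term subtracts the same residue's
    contribution in a typical misfolded structure with n_mis contacts.
    """
    return [(n_contacts - MEAN_MISFOLDED_CONTACTS) * e for e in CONTACT_ENERGY]


def site_distribution(n_contacts, lam):
    """P_i(a) proportional to f0(a) * exp(-lam * h_i(a)), normalised stably."""
    field = site_field(n_contacts)
    logw = [math.log(f0) - lam * h for f0, h in zip(BACKGROUND, field)]
    top = max(logw)
    w = [math.exp(x - top) for x in logw]
    z = sum(w)
    return [x / z for x in w]


def entropy(p):
    """Shannon entropy in nats, used here as a measure of site variability."""
    return -sum(x * math.log(x) for x in p if x > 0)


def log_likelihood(seq, contacts, lam):
    """log L(lam) = sum_i log P_i(a_i) for one observed sequence."""
    return sum(math.log(site_distribution(n, lam)[a]) for a, n in zip(seq, contacts))


def d_log_likelihood(seq, contacts, lam):
    """d log L / d lam = sum_i (<h_i>_P - h_i(a_i))."""
    total = 0.0
    for a, n in zip(seq, contacts):
        h = site_field(n)
        p = site_distribution(n, lam)
        total += sum(pb * hb for pb, hb in zip(p, h)) - h[a]
    return total


def fit_lambda(seq, contacts, lo=0.0, hi=10.0, tol=1e-9):
    """Maximise log L over lam in [lo, hi] by bisection on the derivative.

    log L is concave in lam (its second derivative is minus a sum of
    variances), so a sign change of the derivative brackets the maximum.
    """
    if d_log_likelihood(seq, contacts, lo) <= 0:
        return lo
    if d_log_likelihood(seq, contacts, hi) >= 0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_log_likelihood(seq, contacts, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kl_divergence(p, q):
    """D_KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rate_matrix(exchange, freqs):
    """Reversible rates Q_ab = S_ab * pi_b (a != b); each row sums to zero."""
    k = len(freqs)
    q = [[exchange[a][b] * freqs[b] if a != b else 0.0 for b in range(k)]
         for a in range(k)]
    for a in range(k):
        q[a][a] = -sum(q[a])
    return q


if __name__ == "__main__":
    contacts = [0, 2, 4, 6, 8, 10]  # hypothetical native contact numbers
    seq = [2, 1, 1, 0, 1, 0]        # hypothetical observed residues (ALPHABET indices)
    lam_hat = fit_lambda(seq, contacts)
    print(f"lambda_hat = {lam_hat:.3f}")
    for n in contacts:
        p = site_distribution(n, lam_hat)
        probs = ", ".join(f"{name}={x:.2f}" for name, x in zip(ALPHABET, p))
        print(f"n={n:2d}  {probs}  entropy={entropy(p):.3f}")
```

Running the script prints the fitted λ and, for each hypothetical site, the toy distribution and its entropy. In this toy, sites with few contacts concentrate on the polar class and sites with many contacts concentrate on the hydrophobic class. A site whose contact number equals the assumed misfolded average keeps the background distribution and has the highest entropy. This loosely mirrors the abstract's statement that sites with an intermediate number of native contacts are the most variable. The bisection relies only on the concavity of log L in λ, so any one-dimensional optimizer would serve equally well. In `rate_matrix`, the symmetric exchangeabilities S would come from an empirical model; any matrix passed to it in an example is made up.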

References

A mechanistic stress model of protein evolution accounts for site-specific evolutionary rates and their relationship with packing density and flexibility.

BMC Evol Biol. 2014 Apr 9;14:78. doi: 10.1186/1471-2148-14-78.

A new formulation of protein evolutionary models that account for structural constraints.

Mol Biol Evol. 2014 Mar;31(3):736-49. doi: 10.1093/molbev/mst240. Epub 2013 Dec 3.

Protein evolution along phylogenetic histories under structurally constrained substitution models.

Bioinformatics. 2013 Dec 1;29(23):3020-8. doi: 10.1093/bioinformatics/btt530. Epub 2013 Sep 12.

MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Mol Biol Evol. 2013 Apr;30(4):772-80. doi: 10.1093/molbev/mst010. Epub 2013 Jan 16.

Detecting selection for negative design in proteins through an improved model of the misfolded state.

Proteins. 2013 Jul;81(7):1102-12. doi: 10.1002/prot.24244. Epub 2013 Apr 10.

Bringing molecules back into molecular evolution.

PLoS Comput Biol. 2012;8(6):e1002572. doi: 10.1371/journal.pcbi.1002572. Epub 2012 Jun 28.

The interface of protein structure, protein biophysics, and molecular evolution.

Protein Sci. 2012 Jun;21(6):769-85. doi: 10.1002/pro.2071. Epub 2012 Apr 23.

Biophysical and structural considerations for protein sequence evolution.

BMC Evol Biol. 2011 Dec 16;11:361. doi: 10.1186/1471-2148-11-361.

The Pfam protein families database.

Nucleic Acids Res. 2012 Jan;40(Database issue):D290-301. doi: 10.1093/nar/gkr1065. Epub 2011 Nov 29.

Cooperativity, local-nonlocal coupling, and nonnative interactions: principles of protein folding from coarse-grained models.

Annu Rev Phys Chem. 2011;62:301-26. doi: 10.1146/annurev-physchem-032210-103405.
