Three Simple Properties Explain Protein Stability Change upon Mutation.

Author information

DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark.

Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, United Kingdom.

Publication information

J Chem Inf Model. 2021 Apr 26;61(4):1981-1988. doi: 10.1021/acs.jcim.1c00201. Epub 2021 Apr 13.

Abstract

Accurate prediction of protein stability upon mutation enables rational engineering of new proteins and insights into protein evolution and monogenetic diseases caused by single-point amino acid substitutions. Many tools have been developed to this aim, ranging from energy-based models to machine-learning methods that use large amounts of experimental data. However, as the methods become more complex, the interpretation of the chemistry underlying the protein stability effects becomes obscure. It is thus of interest to identify the simplest prediction model that retains complete amino acid specific interpretation; for a given number of input descriptors, we expect such a model to be almost universal. In this study, we identify such a limiting model, SimBa, a simple multilinear regression model trained on a substitution-type-balanced experimental data set. The model accounts only for the solvent accessibility of the site, volume difference, and polarity difference caused by mutation. Our results show that this very simple and directly applicable model performs comparably to other much more complex, widely used protein stability prediction methods. This suggests that a hard limit of ∼1 kcal/mol numerical accuracy and an R ∼ 0.5 trend accuracy exists and that new features, such as account of unfolded states, water colocalization, and amino acid correlations, are required to improve accuracy to, e.g., 1/2 kcal/mol.
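As an illustration of what a three-descriptor multilinear regression of this kind looks like in practice, the following is a minimal sketch in Python with NumPy. It is not the published SimBa implementation: the plain additive form, the function names `fit_multilinear` and `predict`, the descriptor scales, and all numbers (coefficients, noise level, data) are assumptions made up for the example, and the data are synthetic rather than experimental.

```python
import numpy as np


def fit_multilinear(X, y):
    """Ordinary least-squares fit of y = b0 + X @ b; returns (b0, b)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, X):
    """Evaluate the fitted linear model on a descriptor matrix X."""
    return intercept + X @ coefs


# Synthetic stand-in data (hypothetical, NOT experimental stability values).
rng = np.random.default_rng(0)
n = 200
rsa = rng.uniform(0.0, 1.0, n)   # relative solvent accessibility of the site
dvol = rng.normal(0.0, 40.0, n)  # side-chain volume difference (assumed scale)
dpol = rng.normal(0.0, 2.0, n)   # polarity difference (assumed scale)
X = np.column_stack([rsa, dvol, dpol])

# Arbitrary "true" parameters used only to generate the toy targets.
true_intercept = -1.0
true_coefs = np.array([1.5, 0.01, -0.3])
y = true_intercept + X @ true_coefs + rng.normal(0.0, 1.0, n)  # noisy targets

intercept, coefs = fit_multilinear(X, y)
pred = predict(intercept, coefs, X)

# The two accuracy measures the abstract refers to: numerical error and trend.
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
r = float(np.corrcoef(pred, y)[0, 1])
print("intercept:", intercept, "coefficients:", coefs)
print("RMSE:", rmse, "Pearson R:", r)
```

Because the model is linear in three named descriptors, each fitted coefficient can be read directly as the stability change per unit of that descriptor, which is the interpretability argument made in the abstract.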
