Toward Compilation of Balanced Protein Stability Data Sets: Flattening the ΔΔG Curve through Systematic Enrichment.

Authors

Kebabci Narod, Timucin Ahmet Can, Timucin Emel

Affiliations

Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Istanbul 34752, Turkey.

Department of Molecular Biology and Genetics, Faculty of Arts and Sciences, Acibadem University, Istanbul 34752, Turkey.

Publication information

J Chem Inf Model. 2022 Mar 14;62(5):1345-1355. doi: 10.1021/acs.jcim.2c00054. Epub 2022 Feb 24.

DOI:10.1021/acs.jcim.2c00054
PMID:35201762
Abstract

Often studies analyzing stability data sets and/or predictors ignore neutral mutations and use a binary classification scheme labeling only destabilizing and stabilizing mutations. Recognizing that highly concentrated neutral mutations interfere with data set quality, we have explored three protein stability data sets: S2648, PON-tstab, and the symmetric Ssym that differ in size and quality. A characteristic leptokurtic shape in the ΔΔG distributions of all three data sets including the curated and symmetric ones was reported due to concentrated neutral mutations. To further investigate the impact of neutral mutations on ΔΔG predictions, we have comprehensively assessed the performance of 11 predictors on the PON-tstab data set. Correlation and error analyses showed that all of the predictors performed the best on the neutral mutations, while their performance became gradually worse as the ΔΔG of the mutations departed further from the neutral zone regardless of the direction, implying a bias toward dense mutations. To this end, after unraveling the role of concentrated neutral mutations in biases of stability data sets, we described a systematic enrichment approach to balance the ΔΔG distributions. Before enrichment, mutations were clustered based on their biochemical and/or structural features, and then three mutations were selected from every 2 kcal/mol of each cluster. Upon implementation of this approach by distinct clustering schemes, we generated five subsets varying in size and ΔΔG distributions. All subsets showed improved ΔΔG and frequency distributions. We ultimately reported that the errors toward enriched subsets were higher than those toward the parent data sets, confirming the enrichment of difficult-to-predict mutations in the subsets. In summary, we elaborated the prediction bias toward a concentrated neutral zone and also implemented a rational strategy to tackle this and other forms of biases. Ultimately, this study equipping us with an extended view of shortcomings of stability data sets is a step taken toward development of an unbiased predictor.
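
The selection step described in the abstract (a fixed number of mutations from every 2 kcal/mol interval of each cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `enrich`, the tuple layout, the bin edges (multiples of the bin width, so 0 is an edge), and the use of random selection within a bin are all assumptions made for illustration; the abstract does not state how the three mutations per interval were chosen or where the intervals start. Cluster labels are taken as given, since the clustering schemes themselves are not detailed here.

```python
import math
import random
from collections import defaultdict


def enrich(mutations, bin_width=2.0, per_bin=3, seed=0):
    """Pick at most `per_bin` mutations from every `bin_width` kcal/mol
    interval of ddG within each cluster.

    `mutations` is an iterable of (mutation_id, cluster_label, ddg) tuples.
    Bin edges at multiples of `bin_width` and random picking are
    illustrative assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for mut_id, cluster, ddg in mutations:
        # floor() maps [0, 2) to bin 0, [-2, 0) to bin -1, and so on
        bin_index = math.floor(ddg / bin_width)
        groups[(cluster, bin_index)].append((mut_id, cluster, ddg))

    subset = []
    for key in sorted(groups):  # sorted for a reproducible output order
        members = groups[key]
        subset.extend(rng.sample(members, min(per_bin, len(members))))
    return subset


# Toy data (hypothetical): 50 near-neutral mutations and 3 far from neutral.
toy = [(f"m{i}", "hydrophobic", 0.1 * (i % 10)) for i in range(50)]
toy += [("x1", "hydrophobic", -3.2), ("x2", "hydrophobic", -5.1), ("x3", "polar", 2.4)]

subset = enrich(toy)
print(len(subset))  # 6: three near-neutral picks plus the three sparse ones
```

In this toy case the 50 near-neutral entries collapse to three while every sparse entry survives, which is the flattening effect the abstract describes: the dense neutral zone is capped and the tails are kept.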


Similar articles

1. Toward Compilation of Balanced Protein Stability Data Sets: Flattening the ΔΔG Curve through Systematic Enrichment.
   J Chem Inf Model. 2022 Mar 14;62(5):1345-1355. doi: 10.1021/acs.jcim.2c00054. Epub 2022 Feb 24.
2. A three-state prediction of single point mutations on protein stability changes.
   BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S6. doi: 10.1186/1471-2105-9-S2-S6.
3. Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models.
   J Comput Chem. 2022 Mar 30;43(8):504-518. doi: 10.1002/jcc.26810. Epub 2022 Jan 18.
4. Large scale analysis of protein stability in OMIM disease related human protein variants.
   BMC Genomics. 2016 Jun 23;17 Suppl 2(Suppl 2):397. doi: 10.1186/s12864-016-2726-y.
5. Performance of Web tools for predicting changes in protein stability caused by mutations.
   BMC Bioinformatics. 2021 Jul 5;22(Suppl 7):345. doi: 10.1186/s12859-021-04238-w.
6. Prediction of protein mutant stability using classification and regression tool.
   Biophys Chem. 2007 Feb;125(2-3):462-70. doi: 10.1016/j.bpc.2006.10.009. Epub 2006 Nov 20.
7. Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks.
   PLoS Comput Biol. 2020 Nov 30;16(11):e1008291. doi: 10.1371/journal.pcbi.1008291. eCollection 2020 Nov.
8. Selection maintaining protein stability at equilibrium.
   J Theor Biol. 2016 Feb 21;391:21-34. doi: 10.1016/j.jtbi.2015.12.001. Epub 2015 Dec 8.
9. Challenges in predicting stabilizing variations: An exploration.
   Front Mol Biosci. 2023 Jan 5;9:1075570. doi: 10.3389/fmolb.2022.1075570. eCollection 2022.
10. A natural upper bound to the accuracy of predicting protein stability changes upon mutations.
    Bioinformatics. 2019 May 1;35(9):1513-1517. doi: 10.1093/bioinformatics/bty880.

Cited by

1. Rational design of monomeric IL37 variants guided by stability and dynamical analyses of IL37 dimers.
   Comput Struct Biotechnol J. 2024 Apr 22;23:1854-1863. doi: 10.1016/j.csbj.2024.04.037. eCollection 2024 Dec.
2. Evaluation of AlphaFold structure-based protein stability prediction on missense variations in cancer.
   Front Genet. 2023 Feb 21;14:1052383. doi: 10.3389/fgene.2023.1052383. eCollection 2023.