Rigorous assessment and integration of the sequence and structure based features to predict hot spots.

Affiliation

College of Life Sciences, Graduate University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

BMC Bioinformatics. 2011 Jul 29;12:311. doi: 10.1186/1471-2105-12-311.

Abstract

BACKGROUND

Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction is therefore increasingly important for understanding the nature of protein interactions and for narrowing the search space in drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed.

RESULTS

In this study, we first comprehensively collect features that discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have a lower relative solvent accessible surface area (relASA) and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots make more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) these two features differ significantly between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, achieving a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting the hot spots of two protein complexes.
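For readers less familiar with the reported metrics, the following is a minimal sketch of the standard definitions of precision, recall, and F1. It is not the authors' evaluation code; the function names and the confusion counts in the usage note are hypothetical and chosen only for illustration. The second helper simply checks that the reported precision (0.69) and recall (0.68) are consistent with the reported F1 under the usual harmonic-mean formula.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts.

    tp: hot spots predicted as hot spots
    fp: non-hot spots predicted as hot spots
    fn: hot spots predicted as non-hot spots
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_precision_recall(precision, recall)
    return precision, recall, f1


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(precision_recall_f1(tp=9, fp=3, fn=3))
    # Consistency check on the values reported in the abstract.
    print(round(f1_from_precision_recall(0.69, 0.68), 2))
```

Under this formula, a precision of 0.69 and a recall of 0.68 give an F1 of about 0.68, which matches the value stated above.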

CONCLUSION

Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9df4/3176265/895d80b8d2c1/1471-2105-12-311-1.jpg
