Rigorous assessment and integration of the sequence and structure based features to predict hot spots.

Affiliations

College of Life Sciences, Graduate University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

BMC Bioinformatics. 2011 Jul 29;12:311. doi: 10.1186/1471-2105-12-311.

DOI: 10.1186/1471-2105-12-311

PMID: 21798070

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3176265/

Abstract

BACKGROUND

Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction is therefore increasingly important for understanding the nature of protein interactions and for narrowing the search space in drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features is still lacking, and more effective and accurate methods are urgently needed.

RESULTS

In this study, we first comprehensively collect features that discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have a lower relative accessible surface area (relASA) and a larger relative change in accessible surface area (ASA), suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots form more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, conservation score and sequence entropy do not differ significantly between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) both features differ significantly between hot spots and non-hot spots. Second, we use support vector machines (SVMs) to explore the predictive ability of each feature and of combinations of features. The results indicate that the sequence-based features outperform other combinations of features with reasonable accuracy: a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on the independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting the hot spots of two protein complexes.
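As a rough illustration of the two solvent-accessibility features mentioned above, the following Python sketch computes a relative ASA and a relative change in ASA for a single residue. The abstract does not define either quantity, so the formulas are assumptions based on common conventions: relASA is taken as the residue's ASA in the complex divided by a reference maximum ASA for that residue type, and the relative change is taken as the fraction of the unbound (monomer) ASA that becomes buried in the complex. The function names and all numeric values are hypothetical.

```python
def rel_asa(asa_complex: float, max_asa: float) -> float:
    """Assumed definition: ASA in the complex / reference maximum ASA."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa_complex / max_asa


def relative_asa_change(asa_monomer: float, asa_complex: float) -> float:
    """Assumed definition: fraction of monomer ASA buried on complex formation."""
    if asa_monomer <= 0:
        # A residue that is fully buried in the monomer cannot lose exposure.
        return 0.0
    return (asa_monomer - asa_complex) / asa_monomer


# Hypothetical residue: 120 A^2 exposed when unbound, 30 A^2 in the complex,
# with an assumed reference maximum of 240 A^2 for its residue type.
example_rel_asa = rel_asa(asa_complex=30.0, max_asa=240.0)
example_change = relative_asa_change(asa_monomer=120.0, asa_complex=30.0)
print(f"relASA = {example_rel_asa:.3f}, relative change in ASA = {example_change:.2f}")
```

Under these assumed definitions, a residue that matches the reported trend for hot spots would show a small relASA together with a large relative change, as in the hypothetical example.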

CONCLUSION

Experimental results show that support vector machine classifiers are quite effective in predicting hot spots from sequence features. Hot spots cannot be fully predicted through simple analysis of physicochemical characteristics, but there is reason to believe that integrating features with machine learning methods can markedly improve predictive performance for hot spots.
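To make the performance figures reported in the Results easier to read, the sketch below shows how precision, recall, and F1 are conventionally computed for a binary hot spot / non-hot spot classifier, and checks that the reported F1 of 0.68 is consistent with the harmonic mean of the reported precision (0.69) and recall (0.68). The label vectors are made-up toy data, not results from the paper, and the function names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Standard binary metrics; label 1 = hot spot, 0 = non-hot spot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy labels (hypothetical): 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
toy_precision, toy_recall, toy_f1 = precision_recall_f1(y_true, y_pred)

# Consistency check on the values quoted in the abstract.
reported_f1 = f1_from(0.69, 0.68)
print(f"toy: P={toy_precision:.2f} R={toy_recall:.2f} F1={toy_f1:.2f}")
print(f"harmonic mean of reported precision and recall: {reported_f1:.2f}")
```

The AUC is not reproduced here because it depends on the classifier's ranking scores for each residue, which the abstract does not provide.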
