Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information.

Affiliation

Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, 639798 Singapore.

Publication

BMC Bioinformatics. 2010 Jul 28;11:402. doi: 10.1186/1471-2105-11-402.

PMID: 20667087

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2921408/

Abstract

BACKGROUND

Protein-protein interactions play essential roles in determining protein function and in drug design. Numerous methods have been proposed to recognize their interaction sites; however, because structure determination is costly, only a small proportion of protein complexes have been resolved. It is therefore important to improve the prediction of protein interaction sites from the primary sequence alone.

RESULTS

We propose constructing an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, in which each SVM is trained on a different pair of positive (interface site) and negative (non-interface site) subsets. These subsets, of roughly equal size, are formed in order of the change in accessible surface area (ASA) upon complexation. A self-organizing map (SOM) is applied to group similar input vectors, making the identification of interface residues more accurate. An ensemble of ten SVMs improves the Matthews correlation coefficient (MCC) by around 8% and F1 by around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model built on integrative profiles outperforms models based on the sequence profile or the hydropathy scale alone. Because our method encodes the input vectors with a small number of features, our model is simpler, faster and more accurate than existing methods.
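
The abstract outlines the ensemble only at a high level. The following minimal sketch shows one way to realise it, under assumptions that are not in the source: residues are hypothetical `(feature_vector, delta_asa)` records; both interface and non-interface residues are ordered by `delta_asa` and cut into `k` near-equal subsets (the abstract does not say how negatives are ordered); a small linear SVM trained by Pegasos-style sub-gradient descent stands in for whatever SVM implementation and kernel the authors used; and a tied majority vote goes to the non-interface class. All function names are hypothetical. MCC and F1 use their standard definitions.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vector = List[float]
Residue = Tuple[Vector, float]  # hypothetical record: (feature vector, delta_asa)


def split_by_delta_asa(residues: Sequence[Residue], k: int) -> List[List[Residue]]:
    """Order residues by ASA change (largest first) and cut into k near-equal chunks."""
    ordered = sorted(residues, key=lambda r: r[1], reverse=True)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def train_linear_svm(data: List[Tuple[Vector, int]], lam: float = 0.01,
                     epochs: int = 50) -> Vector:
    """Pegasos-style sub-gradient descent on the L2-regularised hinge loss.

    Labels are +1 (interface) / -1 (non-interface). A constant 1.0 feature is
    appended so the bias is learned as an ordinary (regularised) weight.
    Samples are visited in a fixed order to keep the sketch deterministic.
    """
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            xb = x + [1.0]
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss is active: move towards y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def decide(w: Vector, x: Vector) -> int:
    """Sign of the linear decision function (with the appended bias feature)."""
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score > 0 else -1


def train_ensemble(pos: Sequence[Residue], neg: Sequence[Residue], k: int) -> List[Vector]:
    """Train one SVM per (interface subset i, non-interface subset i) pair."""
    models = []
    for p_sub, n_sub in zip(split_by_delta_asa(pos, k), split_by_delta_asa(neg, k)):
        data = [(x, 1) for x, _ in p_sub] + [(x, -1) for x, _ in n_sub]
        models.append(train_linear_svm(data))
    return models


def predict(models: List[Vector], x: Vector) -> int:
    """Majority vote over the ensemble; a tie is resolved as non-interface (-1)."""
    votes = sum(decide(w, x) for w in models)
    return 1 if votes > 0 else -1


def mcc_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Matthews correlation coefficient and F1 score for +1/-1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == -1 and p == -1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return mcc, f1
```

Pairing subset i of the interface residues with subset i of the non-interface residues is the simplest reading of "different pairs"; other pairings are possible, and `k=10` would correspond to the ten-SVM ensemble mentioned above.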

CONCLUSIONS

The integrative profile, which combines hydrophobic and evolutionary information, contributes most to the prediction of protein-protein interaction sites. The results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves prediction performance.
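
The abstract does not give the formula for the integrative profile. One plausible reading, offered only as an assumption, is to weight a hydropathy scale by the residue frequencies of the sequence profile, so that each position receives the hydropathy expected under its evolutionary variation, and then to take a sliding window of these values as the input vector. The sketch below uses the Kyte-Doolittle scale as an example; the choice of scale, the window size and the helper names (`profile_hydropathy`, `window_features`) are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy values: one example scale (an assumption; the
# abstract does not name the scale the authors used).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def profile_hydropathy(profile: Sequence[Dict[str, float]]) -> List[float]:
    """Expected hydropathy at each position under the evolutionary profile.

    profile[i] maps amino acids to their count or frequency at position i
    (for example, derived from a multiple alignment); values are renormalised
    to sum to 1. Only the 20 standard residues are supported.
    """
    values = []
    for column in profile:
        total = sum(column.values())
        values.append(sum(f * KYTE_DOOLITTLE[aa] for aa, f in column.items()) / total)
    return values


def window_features(values: Sequence[float], i: int, half_width: int) -> List[float]:
    """Values in a window centred on residue i, zero-padded beyond the termini."""
    return [values[j] if 0 <= j < len(values) else 0.0
            for j in range(i - half_width, i + half_width + 1)]
```

A conserved hydrophobic position keeps its full hydropathy, while a position that varies between hydrophobic and charged residues is pulled towards zero, which is one way to express "evolutionary context with respect to hydrophobicity".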

AVAILABILITY

Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.

Similar articles

APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility.

BMC Bioinformatics. 2010 Apr 8;11:174. doi: 10.1186/1471-2105-11-174.

HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information.

BMC Bioinformatics. 2011 May 26;12:207. doi: 10.1186/1471-2105-12-207.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation.

BMC Bioinformatics. 2017 Aug 29;18(1):379. doi: 10.1186/s12859-017-1792-8.

Inferring protein-protein interacting sites using residue conservation and evolutionary information.

Protein Pept Lett. 2006;13(10):999-1005. doi: 10.2174/092986606778777498.

Prediction of protein-protein binding site by using core interface residue and support vector machine.

BMC Bioinformatics. 2008 Dec 22;9:553. doi: 10.1186/1471-2105-9-553.

Hot spot prediction in protein-protein interactions by an ensemble system.

BMC Syst Biol. 2018 Dec 31;12(Suppl 9):132. doi: 10.1186/s12918-018-0665-8.

BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features.

BMC Syst Biol. 2010 May 28;4 Suppl 1(Suppl 1):S3. doi: 10.1186/1752-0509-4-S1-S3.

Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs.

PLoS One. 2017 Feb 6;12(2):e0169356. doi: 10.1371/journal.pone.0169356. eCollection 2017.

Cited by

Unravelling the human taste receptor interactome: machine learning and molecular modelling insights into protein-protein interactions.

NPJ Sci Food. 2025 Jul 1;9(1):113. doi: 10.1038/s41538-025-00478-9.

DeepBSRPred: deep learning-based binding site residue prediction for proteins.

Amino Acids. 2023 Oct;55(10):1305-1316. doi: 10.1007/s00726-022-03228-3. Epub 2022 Dec 27.

Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets.

Int J Mol Sci. 2020 Jan 11;21(2):467. doi: 10.3390/ijms21020467.

Predicting drug-target interactions from drug structure and protein sequence using novel convolutional neural networks.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):689. doi: 10.1186/s12859-019-3263-x.

SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences.

Bioinformatics. 2019 Jul 15;35(14):i343-i353. doi: 10.1093/bioinformatics/btz324.

Special Protein Molecules Computational Identification.

Int J Mol Sci. 2018 Feb 10;19(2):536. doi: 10.3390/ijms19020536.

DrugECs: An Ensemble System with Feature Subspaces for Accurate Drug-Target Interaction Prediction.

Biomed Res Int. 2017;2017:6340316. doi: 10.1155/2017/6340316. Epub 2017 Jul 4.

Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System.

Int J Mol Sci. 2017 Jul 18;18(7):1543. doi: 10.3390/ijms18071543.

Progress and challenges in predicting protein interfaces.

Brief Bioinform. 2016 Jan;17(1):117-31. doi: 10.1093/bib/bbv027. Epub 2015 May 13.

LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone.

BMC Bioinformatics. 2014;15 Suppl 15(Suppl 15):S4. doi: 10.1186/1471-2105-15-S15-S4. Epub 2014 Dec 3.
