Protein-protein interface hot spots prediction based on a hybrid feature selection strategy.

Affiliations

School of Life Sciences, Anhui University, Hefei, Anhui, 230601, China.

State Key Laboratory of Microbial Metabolism, Shanghai JiaoTong University, Shanghai, 200240, China.

Publication information

BMC Bioinformatics. 2018 Jan 15;19(1):14. doi: 10.1186/s12859-018-2009-5.

DOI:10.1186/s12859-018-2009-5
PMID:29334889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5769548/
Abstract

BACKGROUND

Hot spots are interface residues that contribute most of the binding affinity in protein-protein interactions. A compact and relevant feature subset is important for building machine learning models that predict hot spots on protein-protein interfaces. Although various methods have been used to identify relevant feature subsets from the many features associated with interface residues, finding the optimal feature subset for the final model remains a challenge.

RESULTS

In this study, three feature selection methods were compared in order to propose a new hybrid feature selection strategy. The strategy was shown to effectively reduce the feature space when building prediction models for identifying hot spot residues. It was tested on eighty-two features, both conventional and newly proposed. Following the strategy, we combined the feature subsets selected individually by a decision tree and by mRMR (maximum Relevance Minimum Redundancy), then applied a PSFS (Pseudo Sequential Forward Selection) process to build a model with 6 features. On the independent test set, our model showed better or comparable predictive performance relative to other state-of-the-art methods (F-measure 0.622, recall 0.821). Analysis of the 6 features confirmed that our newly proposed feature CNSV_REL1 was important for the model. The analysis also showed that complementarity between features should be treated as an important consideration in feature selection.
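The abstract does not spell out how the decision tree, mRMR, and PSFS steps are implemented, so the following is only a minimal sketch of the general idea: pool the candidates from two selectors, then grow the final subset greedily. Everything in it is an assumption made for illustration: a depth-1 threshold split stands in for the decision tree, |Pearson r| stands in for mutual information in mRMR, plain greedy forward selection stands in for PSFS, and a leave-one-out nearest-centroid classifier scored by F-measure stands in for the actual model. All function names and the toy data are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def mrmr_select(X, y, k):
    """Greedy mRMR: relevance to y minus mean redundancy with chosen features.
    Assumption: |Pearson r| replaces mutual information."""
    chosen, remaining = [], sorted(X)
    while remaining and len(chosen) < k:
        def gain(f):
            relevance = abs(pearson(X[f], y))
            if not chosen:
                return relevance
            redundancy = sum(abs(pearson(X[f], X[c])) for c in chosen) / len(chosen)
            return relevance - redundancy
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen

def stump_accuracy(values, y):
    """Best accuracy of a single-threshold split (a depth-1 decision tree)."""
    best, n = 0.0, len(y)
    for t in sorted(set(values)):
        hits = sum((v >= t) == bool(label) for v, label in zip(values, y))
        best = max(best, hits / n, 1 - hits / n)
    return best

def stump_select(X, y, k):
    """Top-k features by single-split accuracy (stand-in for tree-based selection)."""
    return sorted(sorted(X), key=lambda f: -stump_accuracy(X[f], y))[:k]

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def loo_f_measure(X, y, features):
    """Leave-one-out F-measure of a nearest-centroid classifier.
    Needs at least two samples per class."""
    n, preds = len(y), []
    for i in range(n):
        point = [X[f][i] for f in features]
        dist = {}
        for label in (0, 1):
            rows = [j for j in range(n) if j != i and y[j] == label]
            centroid = [sum(X[f][j] for j in rows) / len(rows) for f in features]
            dist[label] = sum((p - c) ** 2 for p, c in zip(point, centroid))
        preds.append(1 if dist[1] < dist[0] else 0)
    return f_measure(y, preds)

def forward_select(X, y, candidates, max_features):
    """Add the candidate that most improves the score; stop when nothing helps."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        best_feat, best_trial = None, best_score
        for f in candidates:
            if f in selected:
                continue
            s = loo_f_measure(X, y, selected + [f])
            if s > best_trial:
                best_feat, best_trial = f, s
        if best_feat is None:
            break
        selected.append(best_feat)
        best_score = best_trial
    return selected, best_score

def hybrid_select(X, y, k, max_features):
    """Union of the two candidate subsets, refined by forward selection."""
    pool = sorted(set(stump_select(X, y, k)) | set(mrmr_select(X, y, k)))
    return forward_select(X, y, pool, max_features)

# Toy data (hypothetical): "a" separates the classes, "b" duplicates it, "noise" does not.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = {
    "a": [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2],
    "b": [0.2, 0.4, 0.6, 0.8, 1.8, 2.0, 2.2, 2.4],
    "noise": [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6],
}
selected, score = hybrid_select(X, y, k=2, max_features=6)
print(selected, score)
```

On this toy data the redundant copy "b" adds nothing once "a" is chosen, so forward selection stops after one feature. That mirrors, in a very simplified way, the abstract's point that a feature's value depends on what the other selected features already provide.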

CONCLUSION

Most importantly, this study proposed a new feature selection strategy and showed it to be effective in selecting the optimal feature subset for building prediction models of hot spot residues on protein-protein interfaces. Moreover, two aspects, the generalization of a single feature and the complementarity between features, were shown to be of great importance and should be considered in feature selection methods. Finally, our newly proposed feature CNSV_REL1 was shown to be an effective alternative feature for predicting hot spots. Our model is available through a web server: http://zhulab.ahu.edu.cn/iPPHOT/ .

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/bfd4d53e7641/12859_2018_2009_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/da9789d4eefe/12859_2018_2009_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/1b6cb8187992/12859_2018_2009_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/cac1a9f50bc1/12859_2018_2009_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/b582a0932944/12859_2018_2009_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/fe0615e9964f/12859_2018_2009_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/15a3a61562f8/12859_2018_2009_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/44779728a99d/12859_2018_2009_Fig8_HTML.jpg

Similar articles

1
Protein-protein interface hot spots prediction based on a hybrid feature selection strategy.
BMC Bioinformatics. 2018 Jan 15;19(1):14. doi: 10.1186/s12859-018-2009-5.
2
APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility.
BMC Bioinformatics. 2010 Apr 8;11:174. doi: 10.1186/1471-2105-11-174.
3
iPNHOT: a knowledge-based approach for identifying protein-nucleic acid interaction hot spots.
BMC Bioinformatics. 2020 Jul 6;21(1):289. doi: 10.1186/s12859-020-03636-w.
4
An improved DNA-binding hot spot residues prediction method by exploring interfacial neighbor properties.
BMC Bioinformatics. 2021 May 17;22(Suppl 3):253. doi: 10.1186/s12859-020-03871-1.
5
A feature-based approach to predict hot spots in protein-DNA binding interfaces.
Brief Bioinform. 2020 May 21;21(3):1038-1046. doi: 10.1093/bib/bbz037.
6
Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting.
Sci Rep. 2018 Sep 24;8(1):14285. doi: 10.1038/s41598-018-32511-1.
7
Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features.
Oncotarget. 2016 Apr 5;7(14):18065-75. doi: 10.18632/oncotarget.7695.
8
Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach.
BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):384. doi: 10.1186/s12859-020-03675-3.
9
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Comb Chem High Throughput Screen. 2017;20(7):612-621. doi: 10.2174/1386207320666170314103147.
10
Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.
Proteins. 2013 Aug;81(8):1351-62. doi: 10.1002/prot.24278.

Cited by

1
PPI-hotspot for detecting protein-protein interaction hot spots from the free protein structure.
Elife. 2024 Sep 16;13:RP96643. doi: 10.7554/eLife.96643.
2
PCSK9 inhibitor effectively alleviated cognitive dysfunction in a type 2 diabetes mellitus rat model.
PeerJ. 2024 Aug 14;12:e17676. doi: 10.7717/peerj.17676. eCollection 2024.
3
Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network.
PLoS One. 2023 Sep 18;18(9):e0290899. doi: 10.1371/journal.pone.0290899. eCollection 2023.
4
Interface engineering of cellobiose dehydrogenase improves interdomain electron transfer.
Protein Sci. 2023 Aug;32(8):e4702. doi: 10.1002/pro.4702.
5
A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins.
Front Genet. 2022 Nov 23;13:935717. doi: 10.3389/fgene.2022.935717. eCollection 2022.
6
Insights into Rational Design of a New Class of Allosteric Effectors with Molecular Dynamics Markov State Models and Network Theory.
ACS Omega. 2022 Jan 13;7(3):2831-2841. doi: 10.1021/acsomega.1c05624. eCollection 2022 Jan 25.
7
Data analysis methods for defining biomarkers from omics data.
Anal Bioanal Chem. 2022 Jan;414(1):235-250. doi: 10.1007/s00216-021-03813-7. Epub 2021 Dec 24.
8
A Coarse-Grained Methodology Identifies Intrinsic Mechanisms That Dissociate Interacting Protein Pairs.
Front Mol Biosci. 2020 Aug 25;7:210. doi: 10.3389/fmolb.2020.00210. eCollection 2020.
9
m5CPred-SVM: a novel method for predicting m5C sites of RNA.
BMC Bioinformatics. 2020 Oct 30;21(1):489. doi: 10.1186/s12859-020-03828-4.
10
SPOTONE: Hot Spots on Protein Complexes with Extremely Randomized Trees via Sequence-Only Features.
Int J Mol Sci. 2020 Oct 1;21(19):7281. doi: 10.3390/ijms21197281.

References

1
The RCSB protein data bank: integrative view of protein, gene and 3D structural information.
Nucleic Acids Res. 2017 Jan 4;45(D1):D271-D281. doi: 10.1093/nar/gkw1000. Epub 2016 Oct 27.
2
ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules.
Nucleic Acids Res. 2016 Jul 8;44(W1):W344-50. doi: 10.1093/nar/gkw408. Epub 2016 May 10.
3
Solvent accessible surface area-based hot-spot detection methods for protein-protein and protein-nucleic acid interfaces.
J Chem Inf Model. 2015 May 26;55(5):1077-86. doi: 10.1021/ci500760m. Epub 2015 Apr 17.
4
KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features.
Proteins. 2011 Sep;79(9):2671-83. doi: 10.1002/prot.23094. Epub 2011 Jul 6.
5
HotPoint: hot spot prediction server for protein interfaces.
Nucleic Acids Res. 2010 Jul;38(Web Server issue):W402-6. doi: 10.1093/nar/gkq323. Epub 2010 May 5.
6
APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility.
BMC Bioinformatics. 2010 Apr 8;11:174. doi: 10.1186/1471-2105-11-174.
7
Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods.
BMC Bioinformatics. 2009 Oct 30;10:365. doi: 10.1186/1471-2105-10-365.
8
Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy.
Bioinformatics. 2009 Jun 15;25(12):1513-20. doi: 10.1093/bioinformatics/btp240. Epub 2009 Apr 8.
9
A feature-based approach to modeling protein-protein interaction hot spots.
Nucleic Acids Res. 2009 May;37(8):2672-87. doi: 10.1093/nar/gkp132. Epub 2009 Mar 9.
10
Sequence-based prediction of protein interaction sites with an integrative method.
Bioinformatics. 2009 Mar 1;25(5):585-91. doi: 10.1093/bioinformatics/btp039. Epub 2009 Jan 19.