

Protein-protein interface hot spots prediction based on a hybrid feature selection strategy.

Affiliations

School of Life Sciences, Anhui University, Hefei, Anhui, 230601, China.

State Key Laboratory of Microbial Metabolism, Shanghai JiaoTong University, Shanghai, 200240, China.

Publication information

BMC Bioinformatics. 2018 Jan 15;19(1):14. doi: 10.1186/s12859-018-2009-5.

Abstract

BACKGROUND

Hot spots are the interface residues that contribute most of the binding affinity in protein-protein interactions. A compact and relevant feature subset is important for building machine learning models that predict hot spots on protein-protein interfaces. Although various methods have been used to identify relevant feature subsets from the many features that describe interface residues, finding the optimal feature subset for the final model remains a challenge.

RESULTS

In this study, three feature selection methods were compared, and a new hybrid feature selection strategy was proposed on the basis of that comparison. The strategy effectively reduced the feature space when building prediction models for identifying hot spot residues. It was tested on 82 features, both conventional and newly proposed. Following the strategy, we combined the feature subsets selected individually by a decision tree and by mRMR (maximum Relevance Minimum Redundancy), then applied a PSFS (Pseudo Sequential Forward Selection) process to build a model with 6 features. On the independent test set, our model showed better or comparable predictive performance relative to other state-of-the-art methods (F-measure 0.622, recall 0.821). Analysis of the 6 features confirmed that our newly proposed feature, CNSV_REL1, was important for the model. The analysis also showed that complementarity between features should be treated as an important consideration in feature selection.
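The abstract does not spell out the PSFS procedure, so the following is only a rough sketch of the general idea it describes: pool the subsets chosen by two selectors, then run a greedy forward selection over the pool. Plain sequential forward selection is used here as a stand-in for PSFS. The function name, the toy scoring function, and every feature name other than CNSV_REL1 are hypothetical, and the scores are invented for illustration rather than taken from the study.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def hybrid_forward_selection(
    subset_a: Iterable[str],
    subset_b: Iterable[str],
    score: Callable[[Sequence[str]], float],
    max_features: int = 6,
) -> Tuple[List[str], float]:
    """Greedy forward selection over the union of two candidate subsets."""
    # Order-preserving union of the two candidate lists.
    candidates = list(dict.fromkeys(list(subset_a) + list(subset_b)))
    selected: List[str] = []
    best = float("-inf")
    while candidates and len(selected) < max_features:
        # Score every remaining candidate when added to the current subset.
        trial_score, trial_feature = max(
            (score(selected + [f]), f) for f in candidates
        )
        if trial_score <= best:
            break  # no remaining candidate improves the score
        selected.append(trial_feature)
        candidates.remove(trial_feature)
        best = trial_score
    return selected, best


# Hypothetical per-feature gains; feat_a and feat_b are assumed to be redundant.
GAIN = {"CNSV_REL1": 0.30, "feat_a": 0.25, "feat_b": 0.24, "feat_c": 0.05}
REDUNDANT_PAIRS = [frozenset({"feat_a", "feat_b"})]


def toy_score(subset: Sequence[str]) -> float:
    """Invented score: summed gains, minus redundancy and per-feature costs."""
    total = sum(GAIN[f] for f in subset)
    for pair in REDUNDANT_PAIRS:
        if pair <= set(subset):
            total -= 0.24  # the second feature of a redundant pair adds nothing
    return total - 0.06 * len(subset)


tree_subset = ["feat_a", "feat_b", "CNSV_REL1"]  # stand-in for decision-tree picks
mrmr_subset = ["CNSV_REL1", "feat_c"]            # stand-in for mRMR picks

selected, best = hybrid_forward_selection(tree_subset, mrmr_subset, toy_score)
print(selected)  # ['CNSV_REL1', 'feat_a']
```

In a real setting, `score` would more plausibly be a cross-validated metric such as the F-measure of a classifier trained on the candidate subset; the toy function above only keeps the example self-contained.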

CONCLUSION

Most importantly, this study proposed a new feature selection strategy and showed that it is effective in selecting the optimal feature subset for building models that predict hot spot residues on protein-protein interfaces. Moreover, two aspects, the generalization of single features and the complementarity between features, proved to be of great importance and should be considered in feature selection methods. Finally, our newly proposed feature, CNSV_REL1, proved to be an effective alternative feature for predicting hot spots. Our model is available to users through a web server: http://zhulab.ahu.edu.cn/iPPHOT/ .
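The point about complementarity can be illustrated with a small toy case, which is an assumption made here for illustration and not data from the study. Mutual information is the quantity mRMR uses to measure relevance and redundancy. In the sketch below, two hypothetical binary features each share zero mutual information with the label, yet together they determine it completely, so a ranking based on single-feature relevance alone would discard both.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information I(X; Y) in bits, estimated from paired samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: the label is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(f1, f2)]

print(mutual_information(f1, label))                 # 0.0 bits: f1 alone is uninformative
print(mutual_information(f2, label))                 # 0.0 bits: f2 alone is uninformative
print(mutual_information(list(zip(f1, f2)), label))  # 1.0 bit: jointly they fix the label
```

XOR is an extreme case chosen for clarity; real features are rarely this clean, but the same effect in milder form is one reason to evaluate features in combination rather than only one at a time.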

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6a0/5769548/bfd4d53e7641/12859_2018_2009_Fig1_HTML.jpg
