Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier.

Author information

Chen Cheng, Zhang Qingmei, Yu Bin, Yu Zhaomin, Lawrence Patrick J, Ma Qin, Zhang Yan

Affiliations

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Life Sciences, University of Science and Technology of China, Hefei, 230027, China.

Publication information

Comput Biol Med. 2020 Aug;123:103899. doi: 10.1016/j.compbiomed.2020.103899. Epub 2020 Jul 15.

Abstract

Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, making the study of PPIs necessary for comprehending any biological process. Machine learning approaches have been explored, leading to more accurate and generalized PPI predictions. In this paper, we propose a predictive framework called StackPPI. First, we encode biologically relevant features using pseudo amino acid composition; Moreau-Broto, Moran and Geary autocorrelation descriptors; amino acid composition position-specific scoring matrix; bi-gram position-specific scoring matrix; and composition, transition and distribution. Second, we employ XGBoost to reduce feature noise and perform dimensionality reduction through gradient boosting and average gain. Third, the resulting optimized features are analyzed by StackPPI, a PPI predictor we have developed from a stacked ensemble classifier consisting of random forest, extremely randomized trees and logistic regression algorithms. Five-fold cross-validation shows that StackPPI can successfully predict PPIs, with an ACC of 89.27%, MCC of 0.7859 and AUC of 0.9561 on Helicobacter pylori, and an ACC of 94.64%, MCC of 0.8934 and AUC of 0.9810 on Saccharomyces cerevisiae. We find that StackPPI improves protein interaction prediction accuracy on independent test sets compared to state-of-the-art models. Finally, we highlight StackPPI's ability to infer biologically significant PPI networks. StackPPI's accurate prediction of functional pathways makes it the logical choice for studying the underlying mechanism of PPIs, especially as it applies to drug design. The datasets and source code used to create StackPPI are available at https://github.com/QUST-AIBBDRC/StackPPI/.
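
The three reported metrics (ACC, MCC, AUC) can be illustrated with a minimal sketch using their standard textbook definitions for binary classification. This is not code from the StackPPI repository: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions chosen for illustration only and do not reproduce any result from the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = interacting pair."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """ACC: fraction of all pairs classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This pairwise form is O(n^2), fine for a toy example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical, not from the paper): 4 interacting, 4 non-interacting pairs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print(counts)               # (3, 3, 1, 1)
print(accuracy(*counts))    # 0.75
print(mcc(*counts))         # 0.5
print(auc(y_true, scores))  # 0.9375
```

ACC and MCC depend on the chosen decision threshold, whereas AUC is computed from the ranking of the scores alone, which is why the two kinds of metric are usually reported together.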
