Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier.

Authors

Chen Cheng, Zhang Qingmei, Yu Bin, Yu Zhaomin, Lawrence Patrick J, Ma Qin, Zhang Yan

Affiliations

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Life Sciences, University of Science and Technology of China, Hefei, 230027, China.

Publication details

Comput Biol Med. 2020 Aug;123:103899. doi: 10.1016/j.compbiomed.2020.103899. Epub 2020 Jul 15.

DOI:10.1016/j.compbiomed.2020.103899
PMID:32768046
Abstract

Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, making the study of PPIs necessary for understanding any biological process. Machine learning approaches have been explored, leading to more accurate and more general PPI predictions. In this paper, we propose a predictive framework called StackPPI. First, we use pseudo amino acid composition; Moreau-Broto, Moran and Geary autocorrelation descriptors; amino acid composition position-specific scoring matrix; bi-gram position-specific scoring matrix; and composition, transition and distribution to encode biologically relevant features. Second, we employ XGBoost to reduce feature noise and perform dimensionality reduction through gradient boosting and average gain. Third, the resulting optimized features are analyzed by StackPPI, a PPI predictor we have developed from a stacked ensemble classifier consisting of random forest, extremely randomized trees and logistic regression algorithms. Five-fold cross-validation shows that StackPPI can successfully predict PPIs with an ACC of 89.27%, MCC of 0.7859 and AUC of 0.9561 on Helicobacter pylori, and with an ACC of 94.64%, MCC of 0.8934 and AUC of 0.9810 on Saccharomyces cerevisiae. We find that StackPPI improves protein interaction prediction accuracy on independent test sets compared to the state-of-the-art models. Finally, we highlight StackPPI's ability to infer biologically significant PPI networks. StackPPI's accurate prediction of functional pathways makes it a logical choice for studying the underlying mechanism of PPIs, especially as it applies to drug design. The datasets and source code used to create StackPPI are available here: https://github.com/QUST-AIBBDRC/StackPPI/.
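
The abstract reports accuracy (ACC) and the Matthews correlation coefficient (MCC) without defining them. The sketch below shows how the two metrics are conventionally computed from a binary confusion matrix, using the standard textbook definitions. The function names and the toy labels are illustrative assumptions and are not taken from the StackPPI source code or its datasets.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = interacting, 0 = non-interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC: fraction of all protein pairs that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC: correlation between true and predicted labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
mcc = matthews_corrcoef(*counts)
print(f"TP, TN, FP, FN = {counts}; ACC = {acc:.4f}; MCC = {mcc:.4f}")
```

MCC uses all four cells of the confusion matrix, so it remains informative when the two classes are imbalanced, which may be why it is commonly reported next to ACC in PPI studies.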
