Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm.

Affiliations

Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma'anshan 243002, China.

School of Metallurgical Engineering, Anhui University of Technology, Ma'anshan 243032, China.

Publication information

Int J Mol Sci. 2020 Mar 25;21(7):2274. doi: 10.3390/ijms21072274.

DOI: 10.3390/ijms21072274
PMID: 32218345
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7178137/
Abstract

The study of protein-protein interactions is of great biological significance: predicting protein-protein interaction sites can advance the understanding of cellular biological activity and aid drug development. However, because only a small number of protein interactions have been confirmed by experimental techniques, interaction sites are typically outnumbered by non-interaction sites, and this uneven distribution greatly affects the predictive capability of computational methods. In this work, two imbalanced-data processing strategies based on the XGBoost algorithm were proposed to re-balance the original dataset, using the inherent relationship between positive and negative samples, for the prediction of protein-protein interaction sites. A feature extraction method based on the evolutionary conservation of proteins was applied to represent the interaction sites, and the influence of regions where positive and negative samples overlap on prediction performance was considered. Our method showed good prediction performance, with an accuracy of 0.807 and an MCC of 0.614, on an original dataset with 10,455 surface residues but only 2,297 interface residues. These experimental results demonstrate the effectiveness of our XGBoost-based method.
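The abstract does not describe the two rebalancing strategies in detail, so the Python sketch below only illustrates the general ideas it relies on; it is not the authors' method. It shows plain random undersampling (a generic rebalancing technique swapped in here, not either of the paper's strategies) and the accuracy and Matthews correlation coefficient (MCC) metrics the abstract reports. The function names `random_undersample` and `accuracy_and_mcc` are hypothetical, and the usage assumes the 2,297 interface residues are a subset of the 10,455 surface residues, leaving 8,158 non-interface residues; the abstract does not state this explicitly.

```python
import math
import random
from typing import List, Sequence, Tuple


def random_undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return sorted indices of a class-balanced subset: every positive (1)
    plus an equal number of randomly drawn negatives (0).

    Generic random undersampling, used only as a stand-in; it is NOT one of
    the paper's XGBoost-based rebalancing strategies.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)  # local RNG so results are reproducible
    kept_neg = rng.sample(neg, k=min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


def accuracy_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any confusion-matrix margin is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Hypothetical label vector built from the abstract's counts, ASSUMING the
    # interface residues (1) are a subset of the surface residues.
    labels = [1] * 2297 + [0] * (10455 - 2297)
    balanced = random_undersample(labels, seed=42)
    print(len(balanced))  # 4594 = 2297 positives + 2297 sampled negatives
    # Baseline that predicts "non-interface" (0) for every residue.
    acc, mcc = accuracy_and_mcc(labels, [0] * len(labels))
    print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")  # accuracy=0.780, MCC=0.000
```

Under the same subset assumption, a predictor that labels every residue as non-interface already reaches an accuracy of about 0.780 but an MCC of 0, which is why MCC is the more informative of the two reported figures on imbalanced data.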

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1146/7178137/ae88fa0f4ebb/ijms-21-02274-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1146/7178137/34d9ebc531a6/ijms-21-02274-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1146/7178137/c314dc58e9fa/ijms-21-02274-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1146/7178137/d28b1e262e22/ijms-21-02274-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1146/7178137/d126b1f138e8/ijms-21-02274-g005.jpg

Similar articles

1
Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm.
Int J Mol Sci. 2020 Mar 25;21(7):2274. doi: 10.3390/ijms21072274.
2
Imbalance Data Processing Strategy for Protein Interaction Sites Prediction.
IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):985-994. doi: 10.1109/TCBB.2019.2953908. Epub 2021 Jun 3.
3
PPIevo: protein-protein interaction prediction from PSSM based evolutionary information.
Genomics. 2013 Oct;102(4):237-42. doi: 10.1016/j.ygeno.2013.05.006. Epub 2013 Jun 6.
4
Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting.
Anal Biochem. 2020 Jan 15;589:113507. doi: 10.1016/j.ab.2019.113507. Epub 2019 Nov 15.
5
Sequence-based prediction of protein interaction sites with an integrative method.
Bioinformatics. 2009 Mar 1;25(5):585-91. doi: 10.1093/bioinformatics/btp039. Epub 2009 Jan 19.
6
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Comb Chem High Throughput Screen. 2017;20(7):612-621. doi: 10.2174/1386207320666170314103147.
7
EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites.
Molecules. 2017 Sep 5;22(9):1463. doi: 10.3390/molecules22091463.
8
Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.
J Membr Biol. 2016 Apr;249(1-2):141-53. doi: 10.1007/s00232-015-9856-z. Epub 2015 Nov 12.
9
Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS.
PLoS One. 2012;7(8):e43927. doi: 10.1371/journal.pone.0043927. Epub 2012 Aug 28.
10
Protein-Protein Interaction Interface Residue Pair Prediction Based on Deep Learning Architecture.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1753-1759. doi: 10.1109/TCBB.2017.2706682. Epub 2017 May 19.

Cited by

1
ASCE-PPIS: a protein-protein interaction sites predictor based on equivariant graph neural network with fusion of structure-aware pooling and graph collapse.
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf423.
2
Gated-GPS: enhancing protein-protein interaction site prediction with scalable learning and imbalance-aware optimization.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf248.
3
Predicting hospital outpatient volume using XGBoost: a machine learning approach.
Sci Rep. 2025 May 16;15(1):17028. doi: 10.1038/s41598-025-01265-y.
4
A review of machine learning methods for imbalanced data challenges in chemistry.
Chem Sci. 2025 Apr 22;16(18):7637-7658. doi: 10.1039/d5sc00270b. eCollection 2025 May 7.
5
Revolutionizing oncology: the role of Artificial Intelligence (AI) as an antibody design, and optimization tools.
Biomark Res. 2025 Mar 29;13(1):52. doi: 10.1186/s40364-025-00764-4.
6
Prediction of drug target interaction based on under sampling strategy and random forest algorithm.
PLoS One. 2025 Mar 6;20(3):e0318420. doi: 10.1371/journal.pone.0318420. eCollection 2025.
7
Prediction of influenza A virus-human protein-protein interactions using XGBoost with continuous and discontinuous amino acids information.
PeerJ. 2025 Jan 30;13:e18863. doi: 10.7717/peerj.18863. eCollection 2025.
8
Discovery of Antimicrobial Lysins from the "Dark Matter" of Uncharacterized Phages Using Artificial Intelligence.
Adv Sci (Weinh). 2024 Aug;11(32):e2404049. doi: 10.1002/advs.202404049. Epub 2024 Jun 20.
9
A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae162.
10
MEG-PPIS: a fast protein-protein interaction site prediction method based on multi-scale graph information and equivariant graph neural network.
Bioinformatics. 2024 Jan 5;40(5). doi: 10.1093/bioinformatics/btae269.

References

1
DELPHI: accurate deep ensemble model for protein interaction sites prediction.
Bioinformatics. 2021 May 17;37(7):896-904. doi: 10.1093/bioinformatics/btaa750.
2
Semi-supervised prediction of protein interaction sites from unlabeled sample information.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):699. doi: 10.1186/s12859-019-3274-7.
3
Imbalance Data Processing Strategy for Protein Interaction Sites Prediction.
IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):985-994. doi: 10.1109/TCBB.2019.2953908. Epub 2021 Jun 3.
4
Protein-protein interaction site prediction through combining local and global features with deep neural networks.
Bioinformatics. 2020 Feb 15;36(4):1114-1120. doi: 10.1093/bioinformatics/btz699.
5
A Convolutional Neural Network System to Discriminate Drug-Target Interactions.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Jul-Aug;18(4):1315-1324. doi: 10.1109/TCBB.2019.2940187. Epub 2021 Aug 6.
6
SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences.
Bioinformatics. 2019 Jul 15;35(14):i343-i353. doi: 10.1093/bioinformatics/btz324.
7
Hot spot prediction in protein-protein interactions by an ensemble system.
BMC Syst Biol. 2018 Dec 31;12(Suppl 9):132. doi: 10.1186/s12918-018-0665-8.
8
Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.
BMC Bioinformatics. 2018 Feb 6;19(1):35. doi: 10.1186/s12859-018-2043-3.
9
Protein binding hot spots prediction from sequence only by a new ensemble learning method.
Amino Acids. 2017 Oct;49(10):1773-1785. doi: 10.1007/s00726-017-2474-6. Epub 2017 Aug 1.
10
LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network.
BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):467. doi: 10.1186/s12859-016-1332-y.