PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models.

Authors

Poretsky Elly, Andorf Carson M, Sen Taner Z

Affiliations

Agricultural Research Service, Crop Improvement and Genetics Research Unit, U.S. Department of Agriculture, Albany, CA, United States.

Agricultural Research Service, Corn Insects and Crop Genetics Research, U.S. Department of Agriculture, Ames, IA, United States.

Publication information

Plant Direct. 2023 Dec 20;7(12):e554. doi: 10.1002/pld3.554. eCollection 2023 Dec.

DOI:10.1002/pld3.554

PMID:38124705

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10732782/

Abstract

Protein phosphorylation is a dynamic and reversible post-translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein-protein interactions, and enzymatic activities has motivated extensive research efforts to understand its functional implications. Experimental protein phosphorylation data in plants remains limited to a few species, necessitating a scalable and accurate prediction method. Here, we present PhosBoost, a machine-learning approach that leverages protein language models and gradient-boosting trees to predict protein phosphorylation from experimentally derived data. Trained on data obtained from a comprehensive plant phosphorylation database, qPTMplants, we compared the performance of PhosBoost to existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (.78, .56, and .14, respectively) while maintaining a competitive area under the precision-recall curve (.54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, while PhosBoost achieved a recall score of .6. Despite the precision-recall tradeoff, PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores. A sequence-based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence to show that PhosBoost models are transferable across species and scalable for genome-wide protein phosphorylation predictions. PhosBoost is freely and publicly available on GitHub.
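The abstract compares classifiers by recall and by area under the precision-recall curve. As a rough illustration of how those two metrics relate, the following is a minimal sketch using only the Python standard library. The function names, the toy labels, and the toy scores are all hypothetical and are not taken from PhosBoost or its data; the area is approximated with the step-wise average-precision formula, which is one common convention and may not match the exact computation used in the paper.

```python
from typing import Sequence, Tuple


def precision_recall(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Return (precision, recall) when sites scoring >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Candidates are ranked by descending score; precision is accumulated at
    each rank where a true positive is found. Tied scores are not treated
    specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


# Hypothetical toy data: 1 = phosphosite, 0 = non-phosphosite.
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    for t in (0.5, 0.15):
        p, r = precision_recall(toy_labels, toy_scores, t)
        print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
    print(f"average precision={average_precision(toy_labels, toy_scores):.3f}")
```

On this toy data, lowering the threshold from 0.5 to 0.15 recovers the last true site and raises recall, but precision drops. The ranking-based area is unchanged by the threshold, which is why two classifiers can report similar areas while differing widely in recall at their default cutoffs.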

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4b0/10732782/d33fe2640ad2/PLD3-7-e554-g004.jpg
