PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models.

Author information

Poretsky Elly, Andorf Carson M, Sen Taner Z

Affiliations

Agricultural Research Service, Crop Improvement and Genetics Research Unit, U.S. Department of Agriculture, Albany, CA, United States.

Agricultural Research Service, Corn Insects and Crop Genetics Research, U.S. Department of Agriculture, Ames, IA, United States.

Publication information

Plant Direct. 2023 Dec 20;7(12):e554. doi: 10.1002/pld3.554. eCollection 2023 Dec.

Abstract

Protein phosphorylation is a dynamic and reversible post-translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein-protein interactions, and enzymatic activities has motivated extensive research efforts to understand its functional implications. Experimental protein phosphorylation data in plants remains limited to a few species, necessitating a scalable and accurate prediction method. Here, we present PhosBoost, a machine-learning approach that leverages protein language models and gradient-boosting trees to predict protein phosphorylation from experimentally derived data. Trained on data obtained from a comprehensive plant phosphorylation database, qPTMplants, we compared the performance of PhosBoost to existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (.78, .56, and .14, respectively) while maintaining a competitive area under the precision-recall curve (.54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, while PhosBoost achieved a recall score of .6. Despite the precision-recall tradeoff, PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores. A sequence-based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence to show that PhosBoost models are transferable across species and scalable for genome-wide protein phosphorylation predictions. PhosBoost is freely and publicly available on GitHub.
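
The recall and area-under-the-precision-recall-curve figures quoted in the abstract depend on how predicted probabilities are thresholded and ranked. The following is a minimal sketch of how these two metrics can be computed, assuming binary labels (1 = phosphosite) and per-site probability scores. The toy labels and scores are purely illustrative and are not taken from PhosBoost or qPTMplants, and the function names are hypothetical rather than part of any published tool.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall when sites scoring >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Sites are ranked by descending score; precision is accumulated at the
    rank of each true positive. Tied scores are not handled specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    tp = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank
    return ap / total_pos


# Toy data (illustrative only): 1 = phosphorylated site, 0 = unmodified site.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Lowering the threshold raises recall at the cost of precision.
for threshold in (0.5, 0.15):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")

print(f"average precision={average_precision(labels, scores):.3f}")
```

Running the sketch at two thresholds shows the tradeoff the abstract refers to: a lower threshold recovers more true sites (higher recall) while admitting more false positives (lower precision), whereas average precision summarizes the ranking across all thresholds.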

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4b0/10732782/d33fe2640ad2/PLD3-7-e554-g004.jpg
