Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model.

Affiliations

Department of Computer Science and Engineering Technology, University of Houston-Downtown, Houston, TX 77002, United States.

School of Computing, Wichita State University, Wichita, KS 67260, United States.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae643.

DOI:10.1093/bioinformatics/btae643
PMID:39447059
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11552629/
Abstract

MOTIVATION

O-linked glycosylation, an essential post-translational modification process in Homo sapiens, involves attaching sugar moieties to the oxygen atoms of serine and/or threonine residues. It influences various biological and cellular functions. While threonine or serine residues within protein sequences are potential sites for O-linked glycosylation, not all serine and/or threonine residues undergo this modification, underscoring the importance of characterizing its occurrence. This study presents a novel approach for predicting intracellular and extracellular O-linked glycosylation events on proteins, which are crucial for comprehending cellular processes. Two base multi-layer perceptron models were trained by leveraging a stacked generalization framework. These base models respectively use ProtT5 and Ankh O-linked glycosylation site-specific embeddings whose combined predictions are used to train the meta-multi-layer perceptron model. Trained on extensive O-linked glycosylation datasets, the stacked-generalization model demonstrated high predictive performance on independent test datasets. Furthermore, the study emphasizes the distinction between nucleocytoplasmic and extracellular O-linked glycosylation, offering insights into their functional implications that were overlooked in previous studies. By integrating the protein language model's embedding with stacked generalization techniques, this approach enhances predictive accuracy of O-linked glycosylation events and illuminates the intricate roles of O-linked glycosylation in proteomics, potentially accelerating the discovery of novel glycosylation sites.
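The stacked generalization scheme described above (two base models, one per embedding type, feeding a meta-model) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation, which is available in their repository. Several things here are assumptions: small logistic regressions are swapped in for the multi-layer perceptrons, random synthetic vectors stand in for the ProtT5 and Ankh embeddings, and the meta-model is trained on out-of-fold base predictions, a common stacking convention that the abstract does not spell out. All function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg(X, y, epochs=200, lr=0.5):
    """Full-batch gradient descent for logistic regression; returns (weights, bias).
    Assumption: stands in for the MLPs named in the abstract."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def out_of_fold(X, y, k=5):
    """Each sample's probability comes from a model that never saw that sample."""
    oof = [0.0] * len(X)
    for fold in range(k):
        held = [i for i in range(len(X)) if i % k == fold]
        rest = [i for i in range(len(X)) if i % k != fold]
        model = fit_logreg([X[i] for i in rest], [y[i] for i in rest])
        for i, p in zip(held, predict_proba(model, [X[i] for i in held])):
            oof[i] = p
    return oof

# Synthetic stand-ins for two embedding views of the same candidate sites.
rng = random.Random(0)
labels = [i % 2 for i in range(300)]
view_a = [[rng.gauss(y, 1.0) for _ in range(4)] for y in labels]  # "ProtT5-like"
view_b = [[rng.gauss(y, 1.0) for _ in range(4)] for y in labels]  # "Ankh-like"
train, test = range(200), range(200, 300)
y_train = [labels[i] for i in train]
y_test = [labels[i] for i in test]
Xa_train = [view_a[i] for i in train]
Xb_train = [view_b[i] for i in train]

# Level 0: one base model per view, producing out-of-fold probabilities.
oof_a = out_of_fold(Xa_train, y_train)
oof_b = out_of_fold(Xb_train, y_train)

# Level 1: the meta-model learns from the pair of base probabilities.
meta = fit_logreg([[a, b] for a, b in zip(oof_a, oof_b)], y_train, epochs=500)

# Inference: refit the base models on all training data, then stack.
base_a = fit_logreg(Xa_train, y_train)
base_b = fit_logreg(Xb_train, y_train)
pa = predict_proba(base_a, [view_a[i] for i in test])
pb = predict_proba(base_b, [view_b[i] for i in test])
stacked = predict_proba(meta, [[a, b] for a, b in zip(pa, pb)])
accuracy = sum((p >= 0.5) == bool(t) for p, t in zip(stacked, y_test)) / len(y_test)
print(f"held-out accuracy of the stacked model: {accuracy:.2f}")
```

Training the meta-model on out-of-fold predictions keeps it from learning on base-model outputs that have already seen the corresponding labels, which is the usual motivation for this design.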

RESULTS

Stack-OglyPred-PLM produces Sensitivity, Specificity, Matthews Correlation Coefficient, and Accuracy of 90.50%, 89.60%, 0.464, and 89.70%, respectively on a benchmark NetOGlyc-4.0 independent test dataset. These results demonstrate that Stack-OglyPred-PLM is a robust computational tool to predict O-linked glycosylation sites in proteins.

AVAILABILITY AND IMPLEMENTATION

The developed tool, programs, training, and test dataset are available at https://github.com/PakhrinLab/Stack-OglyPred-PLM.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7da3/11552629/3dc74b42eace/btae643f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7da3/11552629/798aee41f907/btae643f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7da3/11552629/df7284263988/btae643f2.jpg
