Suppr 超能文献

Improving protein succinylation sites prediction using embeddings from protein language model.

Affiliations

Department of Computer Science, Michigan Technological University, Houghton, MI, USA.

Department of Informatics, Bioinformatics and Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.

Publication

Sci Rep. 2022 Oct 8;12(1):16933. doi: 10.1038/s41598-022-21366-2.

DOI: 10.1038/s41598-022-21366-2
PMID: 36209286
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9547369/
Abstract

Protein succinylation is an important post-translational modification (PTM) responsible for many vital metabolic activities in cells, including cellular respiration, regulation, and repair. Here, we present a novel approach that combines features from supervised word embedding with embedding from a protein language model called ProtT5-XL-UniRef50 (hereafter termed, ProtT5) in a deep learning framework to predict protein succinylation sites. To our knowledge, this is one of the first attempts to employ embedding from a pre-trained protein language model to predict protein succinylation sites. The proposed model, dubbed LMSuccSite, achieves state-of-the-art results compared to existing methods, with performance scores of 0.36, 0.79, 0.79 for MCC, sensitivity, and specificity, respectively. LMSuccSite is likely to serve as a valuable resource for exploration of succinylation and its role in cellular physiology and disease.
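As background for the setup described above: succinylation occurs on lysine residues, so site predictors of this kind typically classify a fixed-length sequence window centered on each lysine, padding positions that fall outside the protein. A minimal sketch of that candidate-extraction step (the window size and pad token here are illustrative assumptions, not the paper's exact values):

```python
def lysine_windows(seq, half_window=15, pad="X"):
    """Extract a fixed-length window around every lysine (K) in `seq`.

    Each K becomes one candidate succinylation site; positions beyond the
    sequence ends are filled with `pad` so all windows share one length.
    Returns a list of (position, window) pairs.
    """
    windows = []
    for i, aa in enumerate(seq):
        if aa != "K":
            continue
        left = seq[max(0, i - half_window):i]
        right = seq[i + 1:i + 1 + half_window]
        left = pad * (half_window - len(left)) + left
        right = right + pad * (half_window - len(right))
        windows.append((i, left + aa + right))
    return windows
```

In the approach described in the abstract, each such window would then be encoded twice, once through the supervised word-embedding branch and once through per-residue ProtT5 embeddings, before classification.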

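The reported scores (MCC 0.36, sensitivity 0.79, specificity 0.79) follow the standard confusion-matrix definitions, written out here for reference:

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = succinylation site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, tn, fp, fn):
    # Fraction of true sites recovered (recall on the positive class).
    return tp / (tp + fn)

def specificity(tp, tn, fp, fn):
    # Fraction of non-sites correctly rejected.
    return tn / (tn + fp)

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; 0.0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC is commonly the headline metric for PTM-site benchmarks because the datasets are typically imbalanced toward negative (non-modified) lysines, and MCC accounts for all four confusion-matrix cells rather than only the positive class.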

Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/eb0850ed5045/41598_2022_21366_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/bf000ccb283c/41598_2022_21366_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/b0ce03cf7345/41598_2022_21366_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/17b646ad6b56/41598_2022_21366_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/cbad372cd796/41598_2022_21366_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/1f3e8df2e7f3/41598_2022_21366_Fig6_HTML.jpg

Similar Articles

1. Improving protein succinylation sites prediction using embeddings from protein language model.
Sci Rep. 2022 Oct 8;12(1):16933. doi: 10.1038/s41598-022-21366-2.
2. DeepSuccinylSite: a deep learning based approach for protein succinylation site prediction.
BMC Bioinformatics. 2020 Apr 23;21(Suppl 3):63. doi: 10.1186/s12859-020-3342-z.
3. pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model.
BMC Bioinformatics. 2023 Feb 8;24(1):41. doi: 10.1186/s12859-023-05164-9.
4. LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae290.
5. A systematic identification of species-specific protein succinylation sites using joint element features information.
Int J Nanomedicine. 2017 Aug 28;12:6303-6315. doi: 10.2147/IJN.S140875.
6. Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method.
Sci Rep. 2019 Nov 7;9(1):16175. doi: 10.1038/s41598-019-52552-4.
7. SuccinSite: a computational tool for the prediction of protein succinylation sites by exploiting the amino acid patterns and properties.
Mol Biosyst. 2016 Mar;12(3):786-95. doi: 10.1039/c5mb00853k.
8. A Comprehensive Comparative Review of Protein Sequence-Based Computational Prediction Models of Lysine Succinylation Sites.
Curr Protein Pept Sci. 2022;23(11):744-756. doi: 10.2174/1389203723666220628121817.
9. Prediction of Protein Lysine Acylation by Integrating Primary Sequence Information with Multiple Functional Features.
J Proteome Res. 2016 Dec 2;15(12):4234-4244. doi: 10.1021/acs.jproteome.6b00240.
10. LSTMCNNsucc: A Bidirectional LSTM and CNN-Based Deep Learning Method for Predicting Lysine Succinylation Sites.
Biomed Res Int. 2021 May 28;2021:9923112. doi: 10.1155/2021/9923112.

Cited By

1. ResLysEmbed: a ResNet-based framework for succinylated lysine residue prediction using sequence and language model embeddings.
Bioinform Adv. 2025 Aug 22;5(1):vbaf198. doi: 10.1093/bioadv/vbaf198.
2. Bag-of-words is competitive with sum-of-embeddings language-inspired representations on protein inference.
PLoS One. 2025 Aug 6;20(8):e0325531. doi: 10.1371/journal.pone.0325531.
3. Hybrid protein-ligand binding residue prediction with protein language models: does the structure matter?
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf431.
4. TCR-epiDiff: solving dual challenges of TCR generation and binding prediction.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i125-i132. doi: 10.1093/bioinformatics/btaf202.
5. Large Language Model (LLM)-Based Advances in Prediction of Post-translational Modification Sites in Proteins.
Methods Mol Biol. 2025;2941:313-355. doi: 10.1007/978-1-0716-4623-6_19.
6. Large Context, Deeper Insights: Harnessing Large Language Models for Advancing Protein-Protein Interaction Analysis.
Methods Mol Biol. 2025;2941:243-267. doi: 10.1007/978-1-0716-4623-6_15.
7. A Survey of Biological Function Prediction Methods with Focus on Natural Language Processing (NLP) and Large Language Models (LLM).
Methods Mol Biol. 2025;2941:201-225. doi: 10.1007/978-1-0716-4623-6_13.
8. CNN-Meth: A Tool to Accurately Predict Lysine Methylation Sites Using Evolutionary Information-Based Protein Modeling.
Methods Mol Biol. 2025;2941:177-187. doi: 10.1007/978-1-0716-4623-6_11.
9. A Survey of Pretrained Protein Language Models.
Methods Mol Biol. 2025;2941:1-29. doi: 10.1007/978-1-0716-4623-6_1.
10. KD_MultiSucc: incorporating multi-teacher knowledge distillation and word embeddings for cross-species prediction of protein succinylation sites.
Biol Methods Protoc. 2025 May 28;10(1):bpaf041. doi: 10.1093/biomethods/bpaf041.

References

1. The global succinylation of SARS-CoV-2-infected host cells reveals drug targets.
Proc Natl Acad Sci U S A. 2022 Jul 26;119(30):e2123065119. doi: 10.1073/pnas.2123065119.
2. Contrastive learning on protein embeddings enlightens midnight zone.
NAR Genom Bioinform. 2022 Jun 11;4(2):lqac043. doi: 10.1093/nargab/lqac043.
3. Deep Learning-Based Advances in Protein Posttranslational Modification Site and Protein Cleavage Prediction.
Methods Mol Biol. 2022;2499:285-322. doi: 10.1007/978-1-0716-2317-6_15.
4. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction.
Structure. 2022 Aug 4;30(8):1169-1177.e4. doi: 10.1016/j.str.2022.05.001.
5. Protein embeddings and deep learning predict binding residues for various ligand classes.
Sci Rep. 2021 Dec 13;11(1):23916. doi: 10.1038/s41598-021-03431-4.
6. ECNet is an evolutionary context-integrated deep learning framework for protein engineering.
Nat Commun. 2021 Sep 30;12(1):5743. doi: 10.1038/s41467-021-25976-8.
7. MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network.
Biomolecules. 2021 Jun 11;11(6):872. doi: 10.3390/biom11060872.
8. Learning the protein language: Evolution, structure, and function.
Cell Syst. 2021 Jun 16;12(6):654-669.e3. doi: 10.1016/j.cels.2021.05.017.
9. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.
Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2016239118.
10. Posttranslational modifications in proteins: resources, tools and prediction methods.
Database (Oxford). 2021 Apr 7;2021. doi: 10.1093/database/baab012.