ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites.

Authors

Indriani Fatma, Mahmudah Kunti Robiatul, Purnama Bedy, Satou Kenji

Affiliations

Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa, Japan.

Department of Computer Science, Lambung Mangkurat University, Banjarmasin, Indonesia.

Publication information

Front Genet. 2022 May 31;13:885929. doi: 10.3389/fgene.2022.885929. eCollection 2022.

DOI:10.3389/fgene.2022.885929

PMID:35711929

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9194472/

Abstract

Lysine glutarylation is a post-translational modification (PTM) that plays a regulatory role in various physiological and biological processes. Identifying glutarylated peptides using proteomic techniques is expensive and time-consuming, so computational predictors can be useful for rapid identification of glutarylation sites. In this study, we propose a model called ProtTrans-Glutar that classifies a protein sequence as a positive or negative glutarylation site by combining traditional sequence-based features with features derived from a pre-trained transformer-based protein model. The model's features were constructed by combining several feature sets: the distribution feature (from composition/transition/distribution encoding), enhanced amino acid composition (EAAC), and features derived from the ProtT5-XL-UniRef50 model. Combined with random under-sampling and the XGBoost classification method, our model obtained recall, specificity, and AUC scores of 0.7864, 0.6286, and 0.7075, respectively, on an independent test set. The recall and AUC scores were notably higher than those of previous glutarylation prediction models evaluated on the same dataset. This high recall suggests that our method has the potential to identify new glutarylation sites and facilitate further research on the glutarylation process.
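
To make three of the ingredients named in the abstract concrete, here is a minimal, standard-library-only sketch of EAAC encoding, random under-sampling, and the recall and specificity metrics. It is an illustration under stated assumptions, not the authors' implementation: the window size of 5, the gap character `-`, and the function names `eaac`, `random_undersample`, and `recall_specificity` are all hypothetical choices, since the abstract does not give these details. The ProtT5-XL-UniRef50 embeddings, the distribution feature, and the XGBoost classifier are deliberately left out.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def eaac(peptide, window=5):
    """Concatenate amino acid frequencies from each sliding window.

    Assumption: window=5 is an illustrative default, not a value from the paper.
    """
    features = []
    for start in range(len(peptide) - window + 1):
        counts = Counter(peptide[start:start + window])
        # Normalise by the number of standard residues in the window, so that
        # gap characters ('-') used as padding do not dilute the frequencies.
        n_residues = sum(counts[aa] for aa in AMINO_ACIDS) or 1
        features.extend(counts[aa] / n_residues for aa in AMINO_ACIDS)
    return features


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples from the larger class until both classes match."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [samples[i] for i in keep], [labels[i] for i in keep]


def recall_specificity(y_true, y_pred):
    """Return (recall, specificity) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, specificity
```

In a full pipeline, the EAAC vector of each lysine-centred peptide would be concatenated with the other feature sets, the training set balanced with something like `random_undersample`, and a classifier trained on the result. As a side observation, the mean of the reported recall and specificity, (0.7864 + 0.6286) / 2 = 0.7075, equals the reported AUC. That is the value a ROC curve built from a single decision threshold would give, although the abstract does not say how the AUC was computed.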

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe0d/9194472/98a9b07e6a47/fgene-13-885929-g001.jpg
