Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan.
Comput Biol Med. 2021 Apr;131:104259. doi: 10.1016/j.compbiomed.2021.104259. Epub 2021 Feb 7.
Recently, language representation models have attracted considerable attention in natural language processing (NLP) due to their remarkable results. Among them, BERT (Bidirectional Encoder Representations from Transformers) has proven to be a simple yet powerful language model that achieves new state-of-the-art performance. BERT adopts contextualized word embeddings to capture the semantics and the context in which words appear. We utilized pre-trained BERT models to extract features from protein sequences for discriminating three families of glucose transporters: the major facilitator superfamily of glucose transporters (GLUTs), the sodium-glucose linked transporters (SGLTs), and the sugars will eventually be exported transporters (SWEETs). We treated protein sequences as sentences and transformed them into fixed-length meaningful vectors in which a 768- or 1024-dimensional vector represents each amino acid. We observed that the BERT-Base and BERT-Large models improved performance by more than 4% in terms of average sensitivity and Matthews correlation coefficient (MCC), demonstrating the effectiveness of this approach. We also developed a bidirectional transformer-based protein model (TransportersBERT) for comparison with existing pre-trained BERT models.
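To make the feature-extraction idea concrete, the following is a minimal sketch of treating a protein sequence as a "sentence" and extracting one 768-dimensional vector per amino acid with a pre-trained BERT model, using the HuggingFace transformers library. The checkpoint name (bert-base-uncased), the space-separated per-residue tokenization, the example sequence, and the mean-pooling step are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import BertModel, BertTokenizer

# Illustrative checkpoint: BERT-Base gives 768-dim hidden states
# (BERT-Large would give 1024-dim). Not necessarily the paper's checkpoint.
MODEL_NAME = "bert-base-uncased"

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)
model.eval()

# Treat the protein sequence as a sentence: one space-separated
# token per amino acid residue. The sequence itself is hypothetical.
sequence = "MKTLLLTLVVVTIVCLDLGYT"
sentence = " ".join(sequence)

with torch.no_grad():
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model(**inputs)

# Last-layer hidden states: one vector per token; positions 0 and -1
# are the [CLS] and [SEP] special tokens, so drop them.
per_residue = outputs.last_hidden_state[0, 1:-1]
print(per_residue.shape)  # torch.Size([21, 768]) for this 21-residue sequence

# One simple way to obtain a fixed-length sequence representation
# (assumption: mean pooling over residues).
seq_vector = per_residue.mean(dim=0)  # shape: (768,)
```

The resulting per-residue or pooled vectors could then serve as input features to a downstream classifier for the GLUT/SGLT/SWEET discrimination task described above.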