Suppr 超能文献

Core technology patent: CN118964589B (infringement will be prosecuted)
粤ICP备2023148730号-1 · Suppr © 2026


Enhanced identification of membrane transport proteins: a hybrid approach combining ProtBERT-BFD and convolutional neural networks.

Affiliation

Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada.

Publication

J Integr Bioinform. 2023 Jul 28;20(2). doi: 10.1515/jib-2022-0055. eCollection 2023 Jun 1.

DOI: 10.1515/jib-2022-0055
PMID: 37497772
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10389051/
Abstract

Transmembrane transport proteins (transporters) play a crucial role in the fundamental cellular processes of all organisms by facilitating the transport of hydrophilic substrates across hydrophobic membranes. Despite the availability of numerous membrane protein sequences, their structures and functions remain largely elusive. Recently, natural language processing (NLP) techniques have shown promise in the analysis of protein sequences. Bidirectional Encoder Representations from Transformers (BERT) is an NLP technique adapted for proteins to learn contextual embeddings of individual amino acids within a protein sequence. Our previous strategy, TooT-BERT-T, differentiated transporters from non-transporters by employing a logistic regression classifier with fine-tuned representations from ProtBERT-BFD. In this study, we expand upon this approach by utilizing representations from ProtBERT, ProtBERT-BFD, and MembraneBERT in combination with classical classifiers. Additionally, we introduce TooT-BERT-CNN-T, a novel method that fine-tunes ProtBERT-BFD and discriminates transporters using a Convolutional Neural Network (CNN). Our experimental results reveal that CNN surpasses traditional classifiers in discriminating transporters from non-transporters, achieving an MCC of 0.89 and an accuracy of 95.1 % on the independent test set. This represents an improvement of 0.03 and 1.11 percentage points compared to TooT-BERT-T, respectively.
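The classifier the abstract describes (a CNN head over fine-tuned ProtBERT-BFD residue embeddings) can be sketched as follows. This is a minimal PyTorch illustration, assuming pre-computed per-residue embeddings of dimension 1024 (ProtBERT-BFD's hidden size); the class name, filter counts, and kernel sizes are hypothetical, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TransporterCNNHead(nn.Module):
    """Illustrative CNN classification head over per-residue embeddings.

    Hyperparameters (filter counts, kernel sizes) are placeholders,
    not the configuration used in the paper.
    """

    def __init__(self, embed_dim=1024, num_filters=64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One Conv1d per kernel size, scanning along the sequence axis.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k, padding=k // 2)
            for k in kernel_sizes
        )
        # Single logit: transporter vs. non-transporter.
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, embed_dim) -> (batch, embed_dim, seq_len)
        x = embeddings.transpose(1, 2)
        # Max-pool each convolution's activations over the sequence dimension.
        pooled = [conv(x).relu().amax(dim=2) for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1)).squeeze(-1)

# Random tensors stand in for ProtBERT-BFD output here.
head = TransporterCNNHead()
logits = head(torch.randn(2, 100, 1024))  # two sequences of length 100
print(logits.shape)  # torch.Size([2])
```

Max-pooling over the sequence axis makes the head length-invariant, which is why CNN heads of this kind are a common fit for variable-length protein sequences.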

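The abstract reports MCC alongside accuracy; MCC is computed directly from confusion-matrix counts and is more robust to class imbalance (see the Chicco & Jurman reference below). A small self-contained sketch, with made-up counts:

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Hypothetical counts for a transporter/non-transporter test set.
mcc = matthews_corrcoef(tp=90, fp=5, fn=5, tn=100)
print(round(mcc, 2))  # 0.9
```

Unlike accuracy, MCC stays near zero for a classifier that simply predicts the majority class, which is why it is the headline metric here.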

Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/41855737688a/j_jib-2022-0055_fig_001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/c50454bc7d78/j_jib-2022-0055_fig_002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/cb80c7c5b53d/j_jib-2022-0055_fig_003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/b9737732ba8c/j_jib-2022-0055_fig_004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/9bdd2f746d6e/j_jib-2022-0055_fig_005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/4ce13b0fb1de/j_jib-2022-0055_fig_006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/f2c1dd8e911b/j_jib-2022-0055_fig_007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f5/10389051/9371cce2c718/j_jib-2022-0055_fig_008.jpg

Similar articles

1
Enhanced identification of membrane transport proteins: a hybrid approach combining ProtBERT-BFD and convolutional neural networks.
J Integr Bioinform. 2023 Jul 28;20(2). doi: 10.1515/jib-2022-0055. eCollection 2023 Jun 1.
2
TRP-BERT: Discrimination of transient receptor potential (TRP) channels using contextual representations from deep bidirectional transformer based on BERT.
Comput Biol Med. 2021 Oct;137:104821. doi: 10.1016/j.compbiomed.2021.104821. Epub 2021 Sep 1.
3
Exploiting protein language models for the precise classification of ion channels and ion transporters.
Proteins. 2024 Aug;92(8):998-1055. doi: 10.1002/prot.26694. Epub 2024 Apr 24.
4
GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models.
Comput Biol Med. 2021 Apr;131:104259. doi: 10.1016/j.compbiomed.2021.104259. Epub 2021 Feb 7.
5
Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.
JMIR Med Inform. 2022 Apr 21;10(4):e35606. doi: 10.2196/35606.
6
ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations.
Comput Biol Chem. 2021 Aug;93:107537. doi: 10.1016/j.compbiolchem.2021.107537. Epub 2021 Jun 29.
7
A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab005.
8
TooT-T: discrimination of transport proteins from non-transport proteins.
BMC Bioinformatics. 2020 Apr 23;21(Suppl 3):25. doi: 10.1186/s12859-019-3311-6.
9
Integrating Pre-Trained protein language model and multiple window scanning deep learning networks for accurate identification of secondary active transporters in membrane proteins.
Methods. 2023 Dec;220:11-20. doi: 10.1016/j.ymeth.2023.10.008. Epub 2023 Oct 21.
10
CollagenTransformer: End-to-End Transformer Model to Predict Thermal Stability of Collagen Triple Helices Using an NLP Approach.
ACS Biomater Sci Eng. 2022 Oct 10;8(10):4301-4310. doi: 10.1021/acsbiomaterials.2c00737. Epub 2022 Sep 23.

Cited by

1
Identifying the DNA methylation preference of transcription factors using ProtBERT and SVM.
PLoS Comput Biol. 2025 May 13;21(5):e1012513. doi: 10.1371/journal.pcbi.1012513. eCollection 2025 May.
2
NA_mCNN: Classification of Sodium Transporters in Membrane Proteins by Integrating Multi-Window Deep Learning and ProtTrans for Their Therapeutic Potential.
J Proteome Res. 2025 May 2;24(5):2324-2335. doi: 10.1021/acs.jproteome.4c00884. Epub 2025 Apr 7.
3
Ion channel classification through machine learning and protein language model embeddings.
J Integr Bioinform. 2024 Nov 25;21(4). doi: 10.1515/jib-2023-0047. eCollection 2024 Dec 1.

References

1
De novo protein design with a language model.
Nat Biotechnol. 2022 Oct;40(10):1433. doi: 10.1038/s41587-022-01518-5.
2
DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts.
Bioinformatics. 2022 Sep 16;38(Suppl_2):ii95-ii98. doi: 10.1093/bioinformatics/btac474.
3
Highly accurate protein structure prediction with AlphaFold.
Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15.
4
ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7112-7127. doi: 10.1109/TPAMI.2021.3095381. Epub 2022 Sep 14.
5
Integrative approach for detecting membrane proteins.
BMC Bioinformatics. 2020 Dec 21;21(Suppl 19):575. doi: 10.1186/s12859-020-03891-x.
6
TooT-T: discrimination of transport proteins from non-transport proteins.
BMC Bioinformatics. 2020 Apr 23;21(Suppl 3):25. doi: 10.1186/s12859-019-3311-6.
7
The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.
BMC Genomics. 2020 Jan 2;21(1):6. doi: 10.1186/s12864-019-6413-7.
8
Logistic regression.
Transfusion. 2019 Jul;59(7):2197-2198. doi: 10.1111/trf.15406. Epub 2019 Jun 18.
9
Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters.
Anal Biochem. 2019 Jul 15;577:73-81. doi: 10.1016/j.ab.2019.04.011. Epub 2019 Apr 22.
10
Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.
J Theor Biol. 2017 Dec 21;435:208-217. doi: 10.1016/j.jtbi.2017.09.018. Epub 2017 Sep 20.