
PTSP-BERT: Predict the thermal stability of proteins using sequence-based bidirectional representations from transformer-embedded features.

Authors

Lv Zhibin, Wei Mingxuan, Pei Hongdi, Peng Shiyu, Li Mingxin, Jiang Liangzhen

Affiliations

College of Biomedical Engineering, Sichuan University, Chengdu, 610065, China.

Publication

Comput Biol Med. 2025 Feb;185:109598. doi: 10.1016/j.compbiomed.2024.109598. Epub 2024 Dec 20.

DOI: 10.1016/j.compbiomed.2024.109598
PMID: 39708499
Abstract

Thermophilic, mesophilic, and psychrophilic proteins all have wide industrial applications, as enzymes with different optimal temperatures are needed for different purposes. Convenient methods are needed to determine the optimal temperatures of proteins; however, laboratory methods for this purpose are time-consuming and laborious, and existing machine learning methods can only perform binary classification of thermophilic versus non-thermophilic proteins, or psychrophilic versus non-psychrophilic proteins. Here, we developed a deep learning model, PSTP-BERT, based on protein sequences that can directly perform three-class identification of thermophilic, mesophilic, and psychrophilic proteins. Comparing BERT-bfd with other deep learning models using five-fold cross-validation, we found that BERT-bfd-extracted features achieved the highest accuracy under six classifiers. Furthermore, to improve the model's accuracy, we used SMOTE (the synthetic minority oversampling technique) to balance the dataset and a light gradient-boosting machine (LightGBM) to rank the BERT-bfd-extracted features by weight. The best-performing model achieved a five-fold cross-validation accuracy of 89.59% and an independent-test accuracy of 85.42%. On the three-class identification task, PSTP-BERT performs significantly better than existing models. To compare with previous binary classification models, we used PSTP-BERT to perform the binary classification tasks of thermophilic versus non-thermophilic proteins and psychrophilic versus non-psychrophilic proteins on an independent test set. PSTP-BERT achieved the highest accuracy on both tasks: 93.33% for thermophilic protein classification and 88.33% for psychrophilic protein classification. After training and optimization on training sets with different sequence similarities, the model's independent-test accuracy reaches 89.8%-92.9%, and its prediction accuracy on new data exceeds 97%. For the convenience of future researchers, we have uploaded the source code of PSTP-BERT to GitHub.
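The pipeline the abstract describes can be sketched in a few lines: balance the embedding features with SMOTE, rank them by gradient-boosting feature importance, keep the top-ranked features, and score a classifier with five-fold cross-validation. This is a minimal illustration on synthetic data, not the authors' code: the random toy matrix stands in for BERT-bfd embeddings, `smote_oversample` is a simplified SMOTE (interpolation between random minority pairs, without the k-nearest-neighbour step), and scikit-learn's `GradientBoostingClassifier` stands in for LightGBM.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def smote_oversample(X, y, minority_label, n_new):
    """Simplified SMOTE: synthesize points on segments between
    random pairs of minority-class samples (no k-NN refinement)."""
    Xm = X[y == minority_label]
    i = rng.integers(0, len(Xm), n_new)
    j = rng.integers(0, len(Xm), n_new)
    lam = rng.random((n_new, 1))
    X_new = Xm[i] + lam * (Xm[j] - Xm[i])
    y_new = np.full(n_new, minority_label)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

# Toy stand-in for BERT-bfd embeddings: three classes, class 2 under-represented
X = rng.normal(size=(300, 32))
y = np.concatenate([np.zeros(140), np.ones(140), np.full(20, 2)]).astype(int)
X[y == 2] += 1.5  # shift the minority class so it is learnable

# Balance the dataset before training
X_bal, y_bal = smote_oversample(X, y, minority_label=2, n_new=120)

# Rank features by gradient-boosting importance and keep the top half
gbm = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
top = np.argsort(gbm.feature_importances_)[::-1][:16]

# Five-fold cross-validation on the selected features
acc = cross_val_score(GradientBoostingClassifier(random_state=0),
                      X_bal[:, top], y_bal, cv=5).mean()
print(f"balanced set: {X_bal.shape}, 5-fold accuracy: {acc:.2f}")
```

In the paper, the oversampling ratio, the number of retained features, and the downstream classifier are all tuned; the sketch fixes them arbitrarily to keep the example short.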


Similar articles

1. PTSP-BERT: Predict the thermal stability of proteins using sequence-based bidirectional representations from transformer-embedded features.
Comput Biol Med. 2025 Feb;185:109598. doi: 10.1016/j.compbiomed.2024.109598. Epub 2024 Dec 20.
2. Positional embeddings and zero-shot learning using BERT for molecular-property prediction.
J Cheminform. 2025 Feb 5;17(1):17. doi: 10.1186/s13321-025-00959-9.
3. IUP-BERT: Identification of Umami Peptides Based on BERT Features.
Foods. 2022 Nov 21;11(22):3742. doi: 10.3390/foods11223742.
4. Comparison of an Ensemble of Machine Learning Models and the BERT Language Model for Analysis of Text Descriptions of Brain CT Reports to Determine the Presence of Intracranial Hemorrhage.
Sovrem Tekhnologii Med. 2024;16(1):27-34. doi: 10.17691/stm2024.16.1.03. Epub 2024 Feb 28.
5. Deep-ProBind: binding protein prediction with transformer-based deep learning model.
BMC Bioinformatics. 2025 Mar 22;26(1):88. doi: 10.1186/s12859-025-06101-8.
6. BERT-AmPEP60: A BERT-Based Transfer Learning Approach to Predict the Minimum Inhibitory Concentrations of Antimicrobial Peptides for … and ….
J Chem Inf Model. 2025 Apr 14;65(7):3186-3202. doi: 10.1021/acs.jcim.4c01749. Epub 2025 Mar 14.
7. BERT-Kcr: prediction of lysine crotonylation sites by a transfer learning method with pre-trained BERT models.
Bioinformatics. 2022 Jan 12;38(3):648-654. doi: 10.1093/bioinformatics/btab712.
8. A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.
BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.
9. al-BERT: a semi-supervised denoising technique for disease prediction.
BMC Med Inform Decis Mak. 2024 May 16;24(1):127. doi: 10.1186/s12911-024-02528-w.
10. Comparing Pre-trained and Feature-Based Models for Prediction of Alzheimer's Disease Based on Speech.
Front Aging Neurosci. 2021 Apr 27;13:635945. doi: 10.3389/fnagi.2021.635945. eCollection 2021.