
TemStaPro: protein thermostability prediction using sequence representations from protein language models.

Affiliations

Institute of Biotechnology, Life Sciences Center, Vilnius University, LT-10257 Vilnius, Lithuania.

Institute of Computer Science, Faculty of Mathematics and Informatics, Vilnius University, LT-08303 Vilnius, Lithuania.

Publication information

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae157.

DOI: 10.1093/bioinformatics/btae157
PMID: 38507682
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11001493/
Abstract

MOTIVATION

Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. These methods can facilitate training on more data and, possibly, enable the development of more versatile thermostability predictors for multiple ranges of temperatures.

RESULTS

We applied the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over one million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data.
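The transfer-learning setup described above can be illustrated with a minimal sketch. It is not the TemStaPro implementation: the per-residue embeddings here are random stand-ins for pLM output, and the temperature thresholds, the embedding dimension, the mean pooling step, and the logistic-regression classifier are all assumptions chosen for illustration. The sketch pools per-residue vectors into one vector per protein and trains one binary classifier per temperature threshold on labels derived from growth temperature.

```python
import numpy as np

# Hypothetical temperature cut-offs in degrees Celsius (an assumption).
THRESHOLDS_C = (40, 50, 60)


def mean_pool(per_residue):
    """Collapse an (L, D) per-residue embedding into one D-dim protein vector."""
    return per_residue.mean(axis=0)


def fake_plm_embedding(growth_temp_c, rng, dim=16):
    """Stand-in for a pLM: Gaussian noise plus a temperature-dependent shift.

    A real pipeline would run a pre-trained protein language model here.
    """
    length = rng.integers(50, 200)  # random sequence length
    signal = np.zeros(dim)
    signal[0] = 3.0 * (growth_temp_c - 55.0) / 35.0
    return rng.normal(size=(length, dim)) + signal


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(X, w, b):
    return sigmoid(X @ w + b)


def demo(n_proteins=600, seed=0):
    """Train one classifier per threshold; return held-out accuracy per threshold."""
    rng = np.random.default_rng(seed)
    temps = rng.uniform(20.0, 90.0, size=n_proteins)
    X = np.stack([mean_pool(fake_plm_embedding(t, rng)) for t in temps])
    split = int(0.8 * n_proteins)  # first 80% for training, rest held out
    accuracies = {}
    for thr in THRESHOLDS_C:
        y = (temps >= thr).astype(float)
        w, b = train_logistic(X[:split], y[:split])
        pred = predict_proba(X[split:], w, b) >= 0.5
        accuracies[thr] = float((pred == y[split:].astype(bool)).mean())
    return accuracies


if __name__ == "__main__":
    for thr, acc in demo().items():
        print(f"threshold {thr} C: held-out accuracy {acc:.2f}")
```

Because the embedding model stays frozen, only the small classifier on top is trained, which is what makes training on a very large sequence set affordable. The random split used here is a simplification; on real sequences, related proteins would need to be kept out of the held-out set.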

AVAILABILITY AND IMPLEMENTATION

TemStaPro software and the related data are freely available from https://github.com/ievapudz/TemStaPro and https://doi.org/10.5281/zenodo.7743637.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15d4/11001493/9b6452b4a3af/btae157f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15d4/11001493/addd774ea9f6/btae157f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15d4/11001493/942e63f0aa71/btae157f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15d4/11001493/3830ed1fb2c5/btae157f4.jpg
