Approaching Optimal pH Enzyme Prediction with Large Language Models.

Affiliations

Tetra D AG, Schaffhausen 8200, Switzerland.

Constructor University Bremen gGmbH, Bremen 28759, Germany.

Publication information

ACS Synth Biol. 2024 Sep 20;13(9):3013-3021. doi: 10.1021/acssynbio.4c00465. Epub 2024 Aug 28.

DOI:10.1021/acssynbio.4c00465

PMID:39197156

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11421216/

Abstract

Enzymes are widely used in biotechnology because of their ability to catalyze chemical reactions: food production, laundry, pharmaceuticals, textiles, and brewing all benefit from various enzymes. pH, a measure of proton concentration, is one of the key factors that determine enzyme function and efficiency. An enzyme is usually active only within a narrow pH range, so designing an enzyme with optimal activity in a given pH range is a common problem in biotechnology. A large part of this task can be completed in silico by predicting the optimal pH of designed candidates. The success of such computational methods depends critically on the available data. In this study, we developed a language-model-based approach to predict the optimal pH range from the enzyme sequence. To validate the robustness of the proposed approach, we used different data-splitting strategies based on sequence similarity, protein family annotation, and enzyme classification. The derived machine-learning models demonstrated high accuracy across proteins from different protein families and on proteins with low sequence similarity to the training set. The proposed method is fast enough for high-throughput virtual exploration of protein space in search of sequences with the desired optimal pH.
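The abstract names three splitting strategies (sequence similarity, protein family annotation, enzyme classification) but does not spell out how they were implemented. As a rough illustration only, the Python sketch below shows a group-aware split: every record carries a grouping label, and whole groups are assigned to either the training or the test set so that closely related enzymes cannot leak between them. The names `Enzyme`, `kmers`, `cluster_by_similarity`, and `group_split` are hypothetical, and the k-mer Jaccard clustering is a cheap stand-in for alignment-based identity clustering, not the authors' procedure; dedicated tools such as CD-HIT or MMseqs2 are the usual choice for real data.

```python
import random
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Enzyme(NamedTuple):
    seq: str       # amino-acid sequence
    ph_opt: float  # measured optimal pH (the regression target)
    group: str     # hypothetical grouping key: EC class, Pfam ID, or cluster ID


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_similarity(seqs: List[str], threshold: float = 0.5, k: int = 3) -> List[int]:
    """Greedy clustering on k-mer Jaccard similarity (a cheap stand-in for
    alignment-based sequence identity). Each sequence joins the first cluster
    whose representative is similar enough; otherwise it founds a new cluster."""
    reps: List[Set[str]] = []
    labels: List[int] = []
    for s in seqs:
        km = kmers(s, k)
        for idx, rep in enumerate(reps):
            union = km | rep
            if union and len(km & rep) / len(union) >= threshold:
                labels.append(idx)
                break
        else:  # no existing cluster matched
            reps.append(km)
            labels.append(len(reps) - 1)
    return labels


def group_split(records: List[Enzyme], test_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[Enzyme], List[Enzyme]]:
    """Move whole groups into the test set until it holds at least
    test_fraction of the records; no group is ever split across sets."""
    by_group: Dict[str, List[Enzyme]] = defaultdict(list)
    for rec in records:
        by_group[rec.group].append(rec)
    groups = sorted(by_group)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(groups)
    target = test_fraction * len(records)
    train: List[Enzyme] = []
    test: List[Enzyme] = []
    for g in groups:
        if len(test) < target:
            test.extend(by_group[g])
        else:
            train.extend(by_group[g])
    return train, test


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GSHMLEDPVA", "PPGAWLLSTE", "PPGAWLLSTQ"]
    ph = [7.5, 7.0, 4.5, 9.0, 8.5]
    # Similarity-based grouping: near-identical sequences share a cluster ID.
    labels = cluster_by_similarity(seqs)
    records = [Enzyme(s, p, f"cluster{c}") for s, p, c in zip(seqs, ph, labels)]
    train, test = group_split(records, test_fraction=0.4, seed=1)
    print(labels)
    print(sorted(r.group for r in test))
```

On the five toy sequences, the clustering step returns `[0, 0, 1, 2, 2]`, pairing each set of near-identical sequences. For a family- or EC-based split, the `group` field would simply hold a Pfam identifier or the first EC digit instead of a cluster ID; the splitting logic stays the same.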


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2904/11421216/2f21c2fd5086/sb4c00465_0001.jpg

Similar articles

Approaching Optimal pH Enzyme Prediction with Large Language Models.

ACS Synth Biol. 2024 Sep 20;13(9):3013-3021. doi: 10.1021/acssynbio.4c00465. Epub 2024 Aug 28.

A Deep Retrieval-Enhanced Meta-Learning Framework for Enzyme Optimum pH Prediction.

J Chem Inf Model. 2025 Apr 14;65(7):3761-3770. doi: 10.1021/acs.jcim.4c02291. Epub 2025 Mar 24.

deepNEC: a novel alignment-free tool for the identification and classification of nitrogen biochemical network-related enzymes using deep learning.

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac071.

Seq2Topt: a sequence-based deep learning predictor of enzyme optimal temperature.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf114.

Comparative Assessment of Protein Large Language Models for Enzyme Commission Number Prediction.

BMC Bioinformatics. 2025 Feb 27;26(1):68. doi: 10.1186/s12859-025-06081-9.

In silico prediction of potential chemical reactions mediated by human enzymes.

BMC Bioinformatics. 2018 Jun 13;19(Suppl 8):207. doi: 10.1186/s12859-018-2194-2.

[AcidBasePred: a protein acid-base tolerance prediction platform based on deep learning].

Sheng Wu Gong Cheng Xue Bao. 2024 Dec 25;40(12):4670-4681. doi: 10.13345/j.cjb.240255.

Accurately predicting enzyme functions through geometric graph learning on ESMFold-predicted structures.

Nat Commun. 2024 Sep 18;15(1):8180. doi: 10.1038/s41467-024-52533-w.

EnzDP: improved enzyme annotation for metabolic network reconstruction based on domain composition profiles.

J Bioinform Comput Biol. 2015 Oct;13(5):1543003. doi: 10.1142/S0219720015430039.

AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes.

PLoS One. 2013 Oct 9;8(10):e75726. doi: 10.1371/journal.pone.0075726. eCollection 2013.

Cited by

A Deep Retrieval-Enhanced Meta-Learning Framework for Enzyme Optimum pH Prediction.

J Chem Inf Model. 2025 Apr 14;65(7):3761-3770. doi: 10.1021/acs.jcim.4c02291. Epub 2025 Mar 24.
