

Approaching Optimal pH Enzyme Prediction with Large Language Models.

Affiliations

Tetra D AG, Schaffhausen 8200, Switzerland.

Constructor University Bremen gGmbH, Bremen 28759, Germany.

Publication information

ACS Synth Biol. 2024 Sep 20;13(9):3013-3021. doi: 10.1021/acssynbio.4c00465. Epub 2024 Aug 28.

Abstract

Enzymes are widely used in biotechnology because of their ability to catalyze chemical reactions: food production, laundry, pharmaceuticals, textiles, and brewing all benefit from the use of various enzymes. Proton concentration (pH) is one of the key factors that determine enzyme function and efficiency. An enzyme is usually active only within a narrow range of pH values. Designing an enzyme with optimal activity in a given pH range is therefore a common problem in biotechnology. A large part of this task can be completed computationally by predicting the optimal pH of designed candidates. The success of such computational methods depends critically on the available data. In this study, we developed a language-model-based approach to predict the optimal pH range from the enzyme sequence. We used different splitting strategies based on sequence similarity, protein family annotation, and enzyme classification to validate the robustness of the proposed approach. The derived machine-learning models demonstrated high accuracy on proteins from different protein families and on proteins with lower sequence similarity to the training set. The proposed method is fast enough for high-throughput virtual exploration of protein space in search of sequences with a desired optimal pH.
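
The abstract does not describe how the splits were built. As a rough illustration of what a family-based split means, the sketch below keeps every protein family entirely on one side of the train/test boundary, so test proteins come from families the model has not seen. The function name `group_split`, the record fields, and the toy data are all hypothetical and are not taken from the paper; a split by sequence-similarity cluster or enzyme class would work the same way with a different grouping key.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Hypothetical helper for illustration only, not the authors' code.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle the group labels reproducibly, then assign whole groups.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(records)
    train, test = [], []
    for key in keys:
        # Fill the test set group by group until it reaches the target size.
        bucket = test if len(test) < target else train
        bucket.extend(groups[key])
    return train, test


# Made-up toy records: sequences, family labels, and pH values are placeholders.
records = [
    {"seq": "MKTAYIAK", "family": "FAM_A", "ph_opt": 7.0},
    {"seq": "MKTAYLAK", "family": "FAM_A", "ph_opt": 7.2},
    {"seq": "GSHMLEDP", "family": "FAM_B", "ph_opt": 5.5},
    {"seq": "GSHMLQDP", "family": "FAM_B", "ph_opt": 5.0},
    {"seq": "AVLIPFWM", "family": "FAM_C", "ph_opt": 9.0},
    {"seq": "AVLIPYWM", "family": "FAM_C", "ph_opt": 8.5},
]

train, test = group_split(records, "family", test_fraction=0.3)
train_families = {r["family"] for r in train}
test_families = {r["family"] for r in test}
print("train families:", sorted(train_families))
print("test families:", sorted(test_families))
```

Splitting by group rather than by individual sequence is what makes the held-out score a measure of generalization to unseen families; a random per-sequence split would let near-identical relatives of test proteins leak into training.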


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2904/11421216/2f21c2fd5086/sb4c00465_0001.jpg
