Integrating genetic algorithms and language models for enhanced enzyme design.

Authors

Nana Teukam Yves Gaetan, Zipoli Federico, Laino Teodoro, Criscuolo Emanuele, Grisoni Francesca, Manica Matteo

Affiliations

IBM Research Europe, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland.

Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, the Netherlands.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae675.

DOI:10.1093/bib/bbae675
PMID:39780486
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11711099/
Abstract

Enzymes are molecular machines optimized by nature to allow otherwise impossible chemical processes to occur. Their design is a challenging task due to the complexity of the protein space and the intricate relationships between sequence, structure, and function. Recently, large language models (LLMs) have emerged as powerful tools for modeling and analyzing biological sequences, but their application to protein design is limited by the high cardinality of the protein space. This study introduces a framework that combines LLMs with genetic algorithms (GAs) to optimize enzymes. LLMs are trained on a large dataset of protein sequences to learn relationships between amino acid residues linked to structure and function. This knowledge is then leveraged by GAs to efficiently search for sequences with improved catalytic performance. We focused on two optimization tasks: improving the feasibility of biochemical reactions and increasing their turnover rate. Systematic evaluations on 105 biocatalytic reactions demonstrated that the LLM-GA framework generated mutants outperforming the wild-type enzymes in terms of feasibility in 90% of the instances. Further in-depth evaluation of seven reactions reveals the power of this methodology to make "the best of both worlds" and create mutants with structural features and flexibility comparable with the wild types. Our approach advances the state-of-the-art computational design of biocatalysts, ultimately opening opportunities for more sustainable chemical processes.
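
The search loop described in the abstract can be illustrated with a minimal genetic algorithm over amino acid sequences. This is only a sketch under stated assumptions, not the authors' implementation: in the paper the fitness signal comes from language-model-based predictors of reaction feasibility and turnover rate, whereas here `toy_fitness` is a hypothetical placeholder that scores similarity to an arbitrary target string. The function names, the example sequences, and all hyperparameters (`generations`, `pop_size`, `rate`) are illustrative choices.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Placeholder score (assumption): fraction of positions matching a target.

    A real pipeline would call a learned model here instead.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rate, rng):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(wild_type, fitness, generations=50, pop_size=40, rate=0.05, seed=0):
    """Evolve mutants of `wild_type`; returns the best sequence found.

    The top quarter of each generation is carried over unchanged (elitism),
    so the best score never drops below that of the wild type.
    """
    rng = random.Random(seed)
    population = [wild_type] + [
        mutate(wild_type, rate, rng) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 4]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    wild_type = "MKTAYIAKQR"   # hypothetical sequence
    target = "MKTWYIAHQR"      # hypothetical optimum for the toy score
    best = evolve(wild_type, lambda s: toy_fitness(s, target))
    print(best, toy_fitness(best, target))
```

Passing the fitness function as an argument keeps the search loop independent of the scoring model, so the placeholder could be swapped for any sequence-to-score predictor without changing `evolve`.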

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4745/11711099/5008fc9ec50c/bbae675f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4745/11711099/f397d23ec256/bbae675f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4745/11711099/72338ceb5f44/bbae675f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4745/11711099/72bc63c1d695/bbae675f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4745/11711099/e16b97f1a6c8/bbae675f5.jpg
