Evolutionary Probability and Stacked Regressions Enable Data-Driven Protein Engineering with Minimized Experimental Effort.

Affiliations

Institute of Biotechnology, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany.

Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany.

Publication information

J Chem Inf Model. 2024 Aug 26;64(16):6350-6360. doi: 10.1021/acs.jcim.4c00704. Epub 2024 Aug 1.

DOI: 10.1021/acs.jcim.4c00704

PMID: 39088689

Abstract

Protein engineering through directed evolution and (semi)rational approaches is routinely applied to optimize protein properties for a broad range of applications in industry and academia. The multitude of possible variants, combined with limited screening throughput, hampers efficient protein engineering. Data-driven strategies have emerged as a powerful tool to model the protein fitness landscape, which can then be explored in silico, significantly accelerating protein engineering campaigns. However, generating a reliable model of the fitness landscape requires a certain amount of data, which often cannot be provided. Here, we introduce MERGE, a method that combines direct coupling analysis (DCA) and machine learning (ML). MERGE enables data-driven protein engineering when only limited data are available for training, typically 50 to 500 labeled sequences. Our method demonstrates remarkable performance in predicting a protein's fitness value and rank from its sequence across diverse proteins and properties. Notably, MERGE outperforms state-of-the-art methods when only small data sets are available for modeling while requiring fewer computational resources, which makes it particularly promising for protein engineers with access to limited amounts of data.
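The abstract describes MERGE only at a high level, as a combination of a DCA-derived evolutionary signal with a supervised ML model. The following is a minimal sketch of that general stacking idea in plain Python, not the authors' implementation. It assumes a Potts model (fields `h`, couplings `J`) has already been inferred by DCA, uses a toy 4-letter alphabet, swaps in one-hot ridge regression as the supervised component, and blends the two standardized scores with a weight grid-searched to maximize Spearman's rank correlation on held-out data. All function names (`statistical_energy`, `ridge_fit`, `stack_weight`, and so on) and the synthetic labels in the demo are hypothetical.

```python
# Hypothetical stacking sketch (stdlib only); NOT the MERGE implementation.
import itertools
import random

ALPHABET = "ACDE"  # toy 4-letter alphabet (assumption); real proteins use 20 residues

def statistical_energy(seq, h, J):
    """Potts score sum_i h_i(x_i) + sum_{i<j} J_ij(x_i, x_j); h, J assumed from a DCA fit."""
    energy = sum(h[i][a] for i, a in enumerate(seq))
    for i, j in itertools.combinations(range(len(seq)), 2):
        energy += J[(i, j)][(seq[i], seq[j])]
    return energy

def one_hot(seq):
    """Flattened one-hot encoding, prefixed by a constant 1.0 intercept term."""
    return [1.0] + [1.0 if a == b else 0.0 for a in seq for b in ALPHABET]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression (X^T X + lam*I) w = X^T y; intercept also penalized."""
    d = len(X[0])
    A = [[sum(row[p] * row[q] for row in X) + (lam if p == q else 0.0)
          for q in range(d)] for p in range(d)]
    b = [sum(row[p] * t for row, t in zip(X, y)) for p in range(d)]
    return solve(A, b)

def predict(w, X):
    """Linear prediction: dot product of weights with each feature row."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho as Pearson correlation of ranks (0.0 if either side is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def zscore(v):
    """Standardize to zero mean and unit variance (std of 0 treated as 1)."""
    m = sum(v) / len(v)
    s = (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5 or 1.0
    return [(x - m) / s for x in v]

def stack_weight(dca_scores, ml_scores, y, grid=11):
    """Grid-search w in [0, 1] maximizing Spearman rho of w*z(dca) + (1-w)*z(ml)."""
    zd, zm = zscore(dca_scores), zscore(ml_scores)
    best_w, best_rho = 0.0, -2.0
    for k in range(grid):
        w = k / (grid - 1)
        rho = spearman([w * a + (1 - w) * b for a, b in zip(zd, zm)], y)
        if rho > best_rho:
            best_w, best_rho = w, rho
    return best_w, best_rho

if __name__ == "__main__":
    rng = random.Random(0)
    L = 4
    h = [{a: rng.gauss(0, 1) for a in ALPHABET} for _ in range(L)]
    J = {(i, j): {(a, b): rng.gauss(0, 0.3) for a in ALPHABET for b in ALPHABET}
         for i, j in itertools.combinations(range(L), 2)}
    seqs = ["".join(rng.choice(ALPHABET) for _ in range(L)) for _ in range(60)]
    # Synthetic toy labels: Potts score plus noise, NOT real measurements.
    y = [statistical_energy(s, h, J) + rng.gauss(0, 0.5) for s in seqs]
    w = ridge_fit([one_hot(s) for s in seqs[:40]], y[:40], lam=1.0)
    ml_valid = predict(w, [one_hot(s) for s in seqs[40:]])
    dca_valid = [statistical_energy(s, h, J) for s in seqs[40:]]
    print(stack_weight(dca_valid, ml_valid, y[40:]))
```

Selecting the blending weight by Spearman's rank correlation rather than squared error mirrors the abstract's emphasis on predicting rank as well as fitness value. With real data, the held-out split used to pick the weight should be kept separate from any final test set.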

Similar articles

PyPEF-An Integrated Framework for Data-Driven Protein Engineering.

J Chem Inf Model. 2021 Jul 26;61(7):3463-3476. doi: 10.1021/acs.jcim.1c00099. Epub 2021 Jul 14.

Machine learning to navigate fitness landscapes for protein engineering.

Curr Opin Biotechnol. 2022 Jun;75:102713. doi: 10.1016/j.copbio.2022.102713. Epub 2022 Apr 9.

Machine-learning-guided directed evolution for protein engineering.

Nat Methods. 2019 Aug;16(8):687-694. doi: 10.1038/s41592-019-0496-6. Epub 2019 Jul 15.

Machine learning to predict continuous protein properties from binary cell sorting data and map unseen sequence space.

Proc Natl Acad Sci U S A. 2024 Mar 12;121(11):e2311726121. doi: 10.1073/pnas.2311726121. Epub 2024 Mar 7.

Machine learning-assisted enzyme engineering.

Methods Enzymol. 2020;643:281-315. doi: 10.1016/bs.mie.2020.05.005. Epub 2020 Jun 12.

Learning Strategies in Protein Directed Evolution.

Methods Mol Biol. 2022;2461:225-275. doi: 10.1007/978-1-0716-2152-3_15.

CLADE 2.0: Evolution-Driven Cluster Learning-Assisted Directed Evolution.

J Chem Inf Model. 2022 Oct 10;62(19):4629-4641. doi: 10.1021/acs.jcim.2c01046. Epub 2022 Sep 26.

Computational Protein Design - Where it goes?

Curr Med Chem. 2024;31(20):2841-2854. doi: 10.2174/0929867330666230602143700.

Data-Driven Protein Engineering for Improving Catalytic Activity and Selectivity.

Chembiochem. 2024 Feb 1;25(3):e202300754. doi: 10.1002/cbic.202300754. Epub 2023 Dec 11.

Cited by

Revolutionizing Molecular Design for Innovative Therapeutic Applications through Artificial Intelligence.

Molecules. 2024 Sep 29;29(19):4626. doi: 10.3390/molecules29194626.
