Cluster learning-assisted directed evolution.

Author information

Qiu Yuchi, Hu Jian, Wei Guo-Wei

Affiliations

Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.

Department of Chemistry, Michigan State University, MI 48824, USA.

Publication information

Nat Comput Sci. 2021 Dec;1(12):809-818. doi: 10.1038/s43588-021-00168-y. Epub 2021 Dec 9.

Abstract

Directed evolution, a strategy for protein engineering, optimizes protein properties (i.e., fitness) through expensive and time-consuming screening or selection over a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), that combines hierarchical unsupervised clustering sampling and supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions via supervised learning models improve final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library in five equal experimental batches, CLADE achieves a global maximal fitness hit rate of up to 91.0% and 34.0% on the GB1 and PhoQ datasets, respectively, improved from 18.6% and 7.2% obtained by random-sampling-based MLDE.
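
The batch arithmetic and the idea of cluster-guided screening in the abstract can be illustrated with a small sketch. This is not the authors' implementation: CLADE clusters encoded sequences hierarchically and then trains a supervised model, whereas the code below uses a single level of clusters, a hypothetical grouping by the first residue, and a made-up `toy_fitness` function. The names `cluster_guided_sampling` and `toy_fitness` and the target `"WYFH"` are assumptions for illustration only. What it does show is the library size (20^4 = 160,000), the batch size (480 / 5 = 96), and how later batches can favor clusters whose screened members scored well.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
N_SITES, TOTAL_SCREENED, N_BATCHES = 4, 480, 5

# Four-site combinatorial library: 20**4 = 160,000 variants.
library = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=N_SITES)]
batch_size = TOTAL_SCREENED // N_BATCHES  # 96 variants per batch


def toy_fitness(seq, target="WYFH"):
    """Hypothetical fitness in [0, 1]: fraction of sites matching an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def cluster_guided_sampling(clusters, fitness_fn, n_batches, batch_size, rng):
    """Screen batches, favoring clusters with higher mean screened fitness.

    Assumes disjoint clusters, non-negative fitness, and more unscreened
    variants than the total number of picks.
    """
    pools = [rng.sample(members, len(members)) for members in clusters]  # shuffled copies
    totals = [0.0] * len(clusters)  # summed fitness of screened members per cluster
    counts = [0] * len(clusters)    # number of screened members per cluster
    screened = {}
    for _ in range(n_batches):
        # Weights are set once per batch: fitness is only known after screening.
        weights = []
        for i, pool in enumerate(pools):
            mean = totals[i] / counts[i] if counts[i] else 1.0  # optimistic if unseen
            weights.append(mean + 1e-3 if pool else 0.0)
        batch = []
        for _ in range(batch_size):
            i = rng.choices(range(len(pools)), weights=weights)[0]
            batch.append((i, pools[i].pop()))
            if not pools[i]:
                weights[i] = 0.0  # exhausted cluster can no longer be drawn
        for i, seq in batch:
            screened[seq] = fitness_fn(seq)  # stands in for the wet-lab measurement
            totals[i] += screened[seq]
            counts[i] += 1
    return screened


# Hypothetical clustering: group variants by the residue at the first site.
clusters = [[s for s in library if s[0] == aa] for aa in AMINO_ACIDS]
screened = cluster_guided_sampling(
    clusters, toy_fitness, N_BATCHES, batch_size, random.Random(0)
)
print(len(library), batch_size, len(screened))
```

In the workflow the abstract describes, the screened variants would then serve as the training set for a supervised model that ranks the remaining sequences; that final stage is not shown here.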

Similar articles

1
Cluster learning-assisted directed evolution.
Nat Comput Sci. 2021 Dec;1(12):809-818. doi: 10.1038/s43588-021-00168-y. Epub 2021 Dec 9.
2
CLADE 2.0: Evolution-Driven Cluster Learning-Assisted Directed Evolution.
J Chem Inf Model. 2022 Oct 10;62(19):4629-4641. doi: 10.1021/acs.jcim.2c01046. Epub 2022 Sep 26.
4
Machine learning-assisted directed protein evolution with combinatorial libraries.
Proc Natl Acad Sci U S A. 2019 Apr 30;116(18):8852-8858. doi: 10.1073/pnas.1901979116. Epub 2019 Apr 12.
6
PyPEF-An Integrated Framework for Data-Driven Protein Engineering.
J Chem Inf Model. 2021 Jul 26;61(7):3463-3476. doi: 10.1021/acs.jcim.1c00099. Epub 2021 Jul 14.

Cited by

2
A review of transformer models in drug discovery and beyond.
J Pharm Anal. 2025 Jun;15(6):101081. doi: 10.1016/j.jpha.2024.101081. Epub 2024 Aug 30.
3
Do protein language models learn phylogeny?
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf047.
5
Active learning-assisted directed evolution.
Nat Commun. 2025 Jan 16;16(1):714. doi: 10.1038/s41467-025-55987-8.
7
Multi-Modal CLIP-Informed Protein Editing.
Health Data Sci. 2024 Dec 19;4:0211. doi: 10.34133/hds.0211. eCollection 2024.
8
A combinatorially complete epistatic fitness landscape in an enzyme active site.
Proc Natl Acad Sci U S A. 2024 Aug 6;121(32):e2400439121. doi: 10.1073/pnas.2400439121. Epub 2024 Jul 29.
10
Engineering Enzymes for Environmental Sustainability.
Angew Chem Weinheim Bergstr Ger. 2023 Dec 21;135(52):e202309305. doi: 10.1002/ange.202309305. Epub 2023 Oct 5.

References

1
Adaptive machine learning for protein engineering.
Curr Opin Struct Biol. 2022 Feb;72:145-152. doi: 10.1016/j.sbi.2021.11.002. Epub 2021 Dec 9.
5
Advances in machine learning for directed evolution.
Curr Opin Struct Biol. 2021 Aug;69:11-18. doi: 10.1016/j.sbi.2021.01.008. Epub 2021 Feb 26.
9
Machine learning-assisted enzyme engineering.
Methods Enzymol. 2020;643:281-315. doi: 10.1016/bs.mie.2020.05.005. Epub 2020 Jun 12.
