LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning.

Author information

Long Yueming, Mora Ariane, Li Francesca-Zhoufan, Gürsoy Emre, Johnston Kadina E, Arnold Frances H

Affiliations

Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.

Division of Biology and Bioengineering, California Institute of Technology, Pasadena, California 91125, United States.

Publication information

ACS Synth Biol. 2025 Jan 17;14(1):230-238. doi: 10.1021/acssynbio.4c00625. Epub 2024 Dec 24.

DOI:10.1021/acssynbio.4c00625

PMID:39719062

Abstract

Sequence-function data provides valuable information about the protein functional landscape but is rarely obtained during directed evolution campaigns. Here, we present Long-read every variant Sequencing (LevSeq), a pipeline that combines a dual barcoding strategy with nanopore sequencing to rapidly generate sequence-function data for entire protein-coding genes. LevSeq integrates into existing protein engineering workflows and comes with open-source software for data analysis and visualization. The pipeline facilitates data-driven protein engineering by consolidating sequence-function data to inform directed evolution and provide the requisite data for machine learning-guided protein engineering (MLPE). LevSeq enables quality control of mutagenesis libraries prior to screening, which reduces time and resource costs. Simulation studies demonstrate LevSeq's ability to accurately detect variants under various experimental conditions. Finally, we show LevSeq's utility in engineering protoglobins for new-to-nature chemistry. Widespread adoption of LevSeq and sharing of the data will enhance our understanding of protein sequence-function landscapes and empower data-driven directed evolution.

Similar articles

LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning.

ACS Synth Biol. 2025 Jan 17;14(1):230-238. doi: 10.1021/acssynbio.4c00625. Epub 2024 Dec 24.

Machine-learning-guided directed evolution for protein engineering.

Nat Methods. 2019 Aug;16(8):687-694. doi: 10.1038/s41592-019-0496-6. Epub 2019 Jul 15.

Engineering highly active nuclease enzymes with machine learning and high-throughput screening.

Cell Syst. 2025 Mar 19;16(3):101236. doi: 10.1016/j.cels.2025.101236. Epub 2025 Mar 12.

Learning Strategies in Protein Directed Evolution.

Methods Mol Biol. 2022;2461:225-275. doi: 10.1007/978-1-0716-2152-3_15.

Machine learning-assisted directed protein evolution with combinatorial libraries.

Proc Natl Acad Sci U S A. 2019 Apr 30;116(18):8852-8858. doi: 10.1073/pnas.1901979116. Epub 2019 Apr 12.

Facile Assembly of Combinatorial Mutagenesis Libraries Using Nicking Mutagenesis.

Methods Mol Biol. 2022;2461:85-109. doi: 10.1007/978-1-0716-2152-3_6.

Fast and Flexible Synthesis of Combinatorial Libraries for Directed Evolution.

Methods Enzymol. 2018;608:59-79. doi: 10.1016/bs.mie.2018.04.006. Epub 2018 May 24.

Meta learning addresses noisy and under-labeled data in machine learning-guided antibody engineering.

Cell Syst. 2024 Jan 17;15(1):4-18.e4. doi: 10.1016/j.cels.2023.12.003. Epub 2024 Jan 8.

Machine learning to navigate fitness landscapes for protein engineering.

Curr Opin Biotechnol. 2022 Jun;75:102713. doi: 10.1016/j.copbio.2022.102713. Epub 2022 Apr 9.

Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering.

Nat Commun. 2024 Jul 29;15(1):6392. doi: 10.1038/s41467-024-50698-y.

Cited by

Scaling DNA engineering.

Trends Biotechnol. 2025 May 28. doi: 10.1016/j.tibtech.2025.05.002.

Active learning-assisted directed evolution.

Nat Commun. 2025 Jan 16;16(1):714. doi: 10.1038/s41467-025-55987-8.
