Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States.
ACS Synth Biol. 2023 Aug 18;12(8):2444-2454. doi: 10.1021/acssynbio.3c00301. Epub 2023 Jul 31.
With advances in machine learning (ML)-assisted protein engineering, models based on data, biophysics, and natural evolution are being used to propose informed libraries of protein variants to explore. Synthesizing these libraries for experimental screens is a major bottleneck, as the cost of obtaining large numbers of exact gene sequences is often prohibitive. Degenerate codon (DC) libraries are a cost-effective alternative for generating combinatorial mutagenesis libraries where mutations are targeted to a handful of amino acid sites. However, existing computational methods that optimize DC libraries to include desired protein variants are not well suited to designing libraries for ML-assisted protein engineering. To address this limitation, we present DEgenerate Codon Optimization for Informed Libraries (DeCOIL), a generalized method that directly optimizes DC libraries to be useful for protein engineering: to sample protein variants that are likely to have both high fitness and high diversity in the sequence search space. Using computational simulations and wet-lab experiments, we demonstrate that DeCOIL is effective across two specific case studies, with the potential to be applied to many other use cases. DeCOIL offers several advantages over existing methods, as it is direct, easy to use, generalizable, and scalable. With accompanying software (https://github.com/jsunn-y/DeCOIL), DeCOIL can be readily implemented to generate desired informed libraries.
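To make the idea of a degenerate codon library concrete, here is a minimal sketch that is not taken from DeCOIL or its software. It expands IUPAC degenerate codons (for example the common NNK codon) through the standard genetic code. Then, assuming every nucleotide at a degenerate position is incorporated with equal frequency, it computes the probability of each protein variant that a combinatorial library would sample across the targeted sites. The function names `expand_degenerate_codon` and `library_distribution` are hypothetical. The equal-incorporation assumption is a simplification and does not reflect real oligo synthesis.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes -> the bases each code stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order ('*' = stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_degenerate_codon(dc: str) -> Counter:
    """Count how many of the codons encoded by a degenerate codon map to each amino acid."""
    pools = [IUPAC[base] for base in dc.upper()]
    return Counter(CODON_TABLE["".join(codon)] for codon in product(*pools))


def library_distribution(dcs: list[str]) -> dict[str, float]:
    """Probability of each variant (one residue per targeted site) in a DC library.

    Hypothetical helper; assumes equal incorporation of every base at each
    degenerate position and independence between sites.
    """
    site_probs = []
    for dc in dcs:
        counts = expand_degenerate_codon(dc)
        total = sum(counts.values())
        site_probs.append({aa: n / total for aa, n in counts.items()})

    dist: dict[str, float] = {}
    for combo in product(*(p.items() for p in site_probs)):
        variant = "".join(aa for aa, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        dist[variant] = prob
    return dist


if __name__ == "__main__":
    nnk = expand_degenerate_codon("NNK")
    print(sum(nnk.values()), len(nnk))  # 32 codons, 21 outcomes (20 amino acids + stop)
    dist = library_distribution(["NNK", "NNK"])
    print(len(dist))  # 441 two-site variants, including those with a stop
```

A tool that optimizes DC libraries searches over choices like these per-site codons so that the resulting distribution favors useful variants. The scoring objective DeCOIL actually uses is described in the paper and its software, and this sketch does not reproduce it.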