Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125.
Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA 91125.
Proc Natl Acad Sci U S A. 2019 Apr 30;116(18):8852-8858. doi: 10.1073/pnas.1901979116. Epub 2019 Apr 12.
To reduce the experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast way to evaluate sequence space computationally. We validated this approach on a large published empirical fitness landscape for the human IgG-binding domain B1 of protein G (GB1), demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application: evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution, identifying variants for selective catalysis with 93% and 79% enantiomeric excess (ee). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
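The core loop described above (measure a sample of variants, train a model on them, then score the full combinatorial space in silico and carry the top predictions forward) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy additive fitness function `toy_fitness`, the hidden optimum `TARGET`, the helper names (`one_hot`, `fit_ridge`, `ml_guided_round`), and the use of a single closed-form ridge regression on one-hot encodings are all hypothetical stand-ins chosen so the example is self-contained.

```python
import itertools

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids
TARGET = "WHY"  # hypothetical optimum at three mutated sites (toy assumption)


def one_hot(variant):
    """Encode the residues at the mutated positions as a flat one-hot vector."""
    vec = np.zeros(len(variant) * len(AMINO_ACIDS))
    for pos, aa in enumerate(variant):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept term."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(weights, X):
    """Apply the fitted intercept and per-feature weights."""
    return weights[0] + X @ weights[1:]


def ml_guided_round(measure, n_sites, n_train, n_top, rng):
    """One round: 'measure' a random sample, train, then rank every combination in silico."""
    space = ["".join(c) for c in itertools.product(AMINO_ACIDS, repeat=n_sites)]
    train_idx = rng.choice(len(space), size=n_train, replace=False)
    train = [space[i] for i in train_idx]
    X_train = np.array([one_hot(v) for v in train])
    y_train = np.array([measure(v) for v in train], dtype=float)
    weights = fit_ridge(X_train, y_train)
    scores = predict(weights, np.array([one_hot(v) for v in space]))
    best = np.argsort(scores)[::-1][:n_top]  # highest predicted fitness first
    return [space[i] for i in best]


def toy_fitness(variant):
    """Hypothetical stand-in for an assay: counts sites matching TARGET (purely additive)."""
    return sum(a == b for a, b in zip(variant, TARGET))


rng = np.random.default_rng(0)
# Only 400 of the 20**3 = 8000 variants are "tested"; the model scores the rest.
top = ml_guided_round(toy_fitness, n_sites=3, n_train=400, n_top=5, rng=rng)
print(top)
```

Because the toy landscape is purely additive, a linear model on one-hot features can represent it exactly, so the hidden optimum should rank first. Real fitness landscapes such as GB1 are epistatic, which is where richer models and multiple rounds of evolution matter.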