School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA.
Nat Commun. 2024 Jul 29;15(1):6392. doi: 10.1038/s41467-024-50698-y.
The effective design of combinatorial libraries to balance fitness and diversity facilitates the engineering of useful enzyme functions, particularly those that are poorly characterized or unknown in biology. We introduce MODIFY, a machine learning (ML) algorithm that learns from natural protein sequences to infer evolutionarily plausible mutations and predict enzyme fitness. MODIFY co-optimizes predicted fitness and sequence diversity of starting libraries, prioritizing high-fitness variants while ensuring broad sequence coverage. In silico evaluation shows that MODIFY outperforms state-of-the-art unsupervised methods in zero-shot fitness prediction and enables ML-guided directed evolution with enhanced efficiency. Using MODIFY, we engineer generalist biocatalysts derived from a thermostable cytochrome c to achieve enantioselective C-B and C-Si bond formation via a new-to-nature carbene transfer mechanism, leading to biocatalysts six mutations away from previously developed enzymes while exhibiting superior or comparable activities. These results demonstrate MODIFY's potential in solving challenging enzyme engineering problems beyond the reach of classic directed evolution.
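The trade-off the abstract describes, favoring high predicted fitness while keeping broad sequence coverage, can be illustrated with a toy calculation. The sketch below is not MODIFY's implementation. It assumes a single mutated site, hypothetical per-residue fitness scores (`toy_scores`), Shannon entropy as the diversity measure, and a hypothetical weight `lam` that sets how much diversity counts. Under those assumptions, the distribution that maximizes expected fitness plus `lam` times entropy has a closed form: each residue's probability is proportional to exp(score / lam).

```python
import math


def site_distribution(scores, lam):
    """Return the residue distribution p that maximizes
    sum_a p[a] * scores[a] + lam * entropy(p).

    The maximizer is a softmax: p[a] is proportional to exp(scores[a] / lam).
    `scores` and `lam` are illustrative assumptions, not values from the paper.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {a: math.exp((s - top) / lam) for a, s in scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def expected_fitness(p, scores):
    """Mean predicted fitness of a library sampled from p."""
    return sum(p[a] * scores[a] for a in p)


def entropy(p):
    """Shannon entropy (in nats) of the distribution p, used here as diversity."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)


# Hypothetical zero-shot scores for four candidate residues at one site.
toy_scores = {"A": 1.0, "V": 0.5, "L": 0.0, "G": -1.0}

for lam in (0.1, 1.0, 10.0):
    p = site_distribution(toy_scores, lam)
    print(f"lam={lam}: fitness={expected_fitness(p, toy_scores):.3f}, "
          f"diversity={entropy(p):.3f}")
```

A small `lam` concentrates the library on the top-scoring residue, while a large `lam` spreads it almost uniformly across residues. Sweeping `lam` traces out the fitness-diversity trade-off for this toy site.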