Vornholt Tobias, Mutný Mojmír, Schmidt Gregor W, Schellhaas Christian, Tachibana Ryo, Panke Sven, Ward Thomas R, Krause Andreas, Jeschek Markus
Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.
National Centre of Competence in Research (NCCR) Molecular Systems Engineering, 4056 Basel, Switzerland.
ACS Cent Sci. 2024 May 22;10(7):1357-1370. doi: 10.1021/acscentsci.4c00258. eCollection 2024 Jul 24.
Tailored enzymes are crucial for the transition to a sustainable bioeconomy. However, enzyme engineering is laborious and failure-prone due to its reliance on serendipity. The efficiency and success rates of engineering campaigns may be improved by applying machine learning to map the sequence-activity landscape based on small experimental data sets. Yet, it often proves challenging to reliably model large sequence spaces while keeping the experimental effort tractable. To address this challenge, we present an integrated pipeline combining large-scale screening with active machine learning, which we applied to engineer an artificial metalloenzyme (ArM) catalyzing a new-to-nature hydroamination reaction. Combining lab automation and next-generation sequencing, we acquired sequence-activity data for several thousand ArM variants. We then used Gaussian process regression to model the activity landscape and guide further screening rounds. Critical characteristics of our pipeline include the cost-effective generation of information-rich data sets, the integration of an explorative round to improve the model's performance, and the inclusion of experimental noise. Our approach led to an order-of-magnitude boost in the hit rate while making efficient use of experimental resources. Search strategies like this should find broad utility in enzyme engineering and accelerate the development of novel biocatalysts.
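The model-guided step described above (fit a Gaussian process to measured sequence-activity data, then use it to choose variants for the next screening round) could look roughly like the following minimal sketch. It is illustrative only and not the authors' implementation. The one-hot encoding, the RBF kernel, the `length_scale` and `noise_var` values, the upper-confidence-bound selection rule, and all function names are assumptions made for this example. The abstract states only that Gaussian process regression was used and that experimental noise was included.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode the residues at the mutated positions as a flat one-hot vector (assumed encoding)."""
    vec = np.zeros(len(seq) * len(AMINO_ACIDS))
    for i, aa in enumerate(seq):
        vec[i * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec

def rbf_kernel(A, B, length_scale=2.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(X_train, y_train, X_cand, noise_var=0.1):
    """Posterior mean and standard deviation of a zero-mean GP at the candidates.

    noise_var is added to the diagonal of the training covariance, which is
    one simple way to account for experimental noise in the measurements.
    """
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(0)  # the RBF kernel has prior variance 1
    return mean, np.sqrt(np.maximum(var, 0.0))

def select_next_batch(train_seqs, activities, candidate_seqs, batch_size=2, beta=2.0):
    """Rank untested variants by an upper confidence bound (assumed acquisition rule)."""
    X_train = np.array([one_hot(s) for s in train_seqs])
    X_cand = np.array([one_hot(s) for s in candidate_seqs])
    y = np.asarray(activities, dtype=float)
    y_mean = y.mean()  # center the activities so the zero-mean prior is reasonable
    mean, std = gp_posterior(X_train, y - y_mean, X_cand)
    ucb = mean + y_mean + beta * std
    order = np.argsort(-ucb)[:batch_size]
    return [candidate_seqs[i] for i in order]
```

In this sketch, a larger `beta` favors variants with uncertain predictions (exploration), while a smaller `beta` favors variants with high predicted activity (exploitation). The sequences and activity values passed in would come from the screening data; none are provided here.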