Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK.
Department of Computer Science, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK.
Faraday Discuss. 2024 Sep 11;252(0):89-114. doi: 10.1039/d4fd00065j.
Protein design and directed evolution have separately contributed enormously to protein engineering. Without being mutually exclusive, the former relies on computation from first principles, while the latter is a combinatorial approach based on chance. Advances in ultrahigh throughput (uHT) screening, next generation sequencing and machine learning may create alternative routes to engineered proteins, where functional information linked to specific sequences is interpreted and extrapolated. In particular, the miniaturisation of functional tests in water-in-oil emulsion droplets with picoliter volumes and their rapid generation and analysis (>1 kHz) allows screening of >10^7-membered libraries in a day. Subsequently, decoding the selected clones by short or long-read sequencing methods leads to large sequence-function datasets that may allow extrapolation from experimental directed evolution to further improved mutants beyond the observed hits. In this work, we explore experimental strategies for how to draw up 'fitness landscapes' in sequence space with uHT droplet microfluidics, review the current state of AI/ML in enzyme engineering and discuss how uHT datasets may be combined with AI/ML to make meaningful predictions and accelerate biocatalyst engineering.
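The throughput figure quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names, the assumption of uninterrupted operation for 24 hours, and the example library size of 10^7 variants are all hypothetical choices made for the calculation.

```python
# Back-of-the-envelope droplet throughput estimate (illustrative sketch).
# Assumptions (not from the source): continuous operation with no downtime,
# and one variant per analysed droplet.

SECONDS_PER_HOUR = 3600


def droplets_screened(rate_hz: float, hours: float = 24.0) -> float:
    """Number of droplets analysed at a constant rate over the given time."""
    return rate_hz * hours * SECONDS_PER_HOUR


def oversampling(droplets: float, library_size: float) -> float:
    """Average number of droplets analysed per library member."""
    return droplets / library_size


if __name__ == "__main__":
    n = droplets_screened(rate_hz=1_000)           # 1 kHz for one day
    print(f"droplets per day: {n:.2e}")
    # Hypothetical library of 10^7 variants
    print(f"oversampling: {oversampling(n, 1e7):.2f}x")
```

At 1 kHz, one day of continuous analysis corresponds to roughly 8.6 x 10^7 droplets, so a library on the order of 10^7 members could in principle be sampled several times over; real runs would likely fall short of this because of empty droplets and instrument downtime.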