Advances in machine learning for directed evolution.

Author information

Division of Biology and Biological Engineering, California Institute of Technology, MC 210-41, 1200 E. California Boulevard, Pasadena, CA 91125, USA.

Division of Chemistry and Chemical Engineering, California Institute of Technology, MC 210-41, 1200 E. California Boulevard, Pasadena, CA 91125, USA; Present address: Google DeepMind, 6 Pancras Square, Kings Cross, London, N1C 4AG, UK.

Publication information

Curr Opin Struct Biol. 2021 Aug;69:11-18. doi: 10.1016/j.sbi.2021.01.008. Epub 2021 Feb 26.

Abstract

Machine learning (ML) can expedite directed evolution by allowing researchers to move expensive experimental screens in silico. Gathering sequence-function data for training ML models, however, can still be costly. In contrast, raw protein sequence data is widely available. Recent advances in ML approaches use protein sequences to augment limited sequence-function data for directed evolution. We highlight contributions in a growing effort to use sequences to reduce or eliminate the amount of sequence-function data needed for effective in silico screening. We also highlight approaches that use ML models trained on sequences to generate new functional sequence diversity, focusing on strategies that use these generative models to efficiently explore vast regions of protein space.
