Ferguson Andrew L, Ranganathan Rama
Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States.
Center for Physics of Evolving Systems, University of Chicago, Chicago, Illinois 60637, United States.
ACS Macro Lett. 2021 Mar 16;10(3):327-340. doi: 10.1021/acsmacrolett.0c00885. Epub 2021 Feb 8.
The design of synthetic proteins with a desired function is a long-standing goal in biomolecular science, with broad applications in biochemical engineering, agriculture, medicine, and public health. Rational de novo design and experimental directed evolution have achieved remarkable successes, but both are challenged by the need to find functional "needles" in the vast "haystack" of protein sequence space. Data-driven models of fitness landscapes provide a predictive map between protein sequence and function and can prospectively identify functional candidates for experimental testing, greatly improving the efficiency of this search. This Viewpoint reviews applications of machine learning, and deep learning in particular, within data-driven protein engineering platforms. We highlight recent successes, review promising computational methodologies, and provide an outlook on future challenges and opportunities. The article is written for a broad audience of polymer and protein scientists as well as computer and data scientists interested in an up-to-date review of recent innovations and opportunities in this rapidly evolving field.