Strokach Alexey, Kim Philip M
Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, M5S 2E4, Ontario, Canada.
Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, M5S 2E4, Ontario, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, M5S 3E1, Ontario, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, M5S 1A8, Ontario, Canada.
Curr Opin Struct Biol. 2022 Feb;72:226-236. doi: 10.1016/j.sbi.2021.11.008. Epub 2021 Dec 25.
Deep learning approaches have produced substantial breakthroughs in fields such as image classification and natural language processing and are making rapid inroads in the area of protein design. Many generative models of proteins have been developed that encompass all known protein sequences, model specific protein families, or extrapolate the dynamics of individual proteins. These generative models can learn protein representations that are often more informative of protein structure and function than hand-engineered features. Furthermore, they can be used to quickly propose millions of novel proteins that resemble their native counterparts in terms of expression level, stability, or other attributes. The protein design process can further be guided by discriminative oracles that select the candidates with the highest probability of having the desired properties. In this review, we discuss five classes of generative models that have been most successful at modeling proteins and provide a framework for model-guided protein design.
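The generate-then-select idea in the abstract (a generative model proposes many candidates, and a discriminative oracle keeps the most promising ones) can be illustrated with a minimal Python sketch. It is not the authors' method and makes several assumptions: a position-wise categorical profile stands in for a learned generative model, and `toy_oracle`, which simply scores the fraction of hydrophobic residues, is a hypothetical stand-in for a trained predictor of stability, expression, or another property. All function names are illustrative.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def uniform_profile(length):
    """Hypothetical toy generative model: one categorical distribution per position."""
    return [{aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS} for _ in range(length)]


def sample_from_profile(profile, rng):
    """Draw one sequence by sampling each position independently from its distribution."""
    residues = []
    for position_probs in profile:
        choices = list(position_probs.keys())
        weights = list(position_probs.values())
        residues.append(rng.choices(choices, weights=weights, k=1)[0])
    return "".join(residues)


def toy_oracle(seq):
    """Hypothetical discriminative oracle: fraction of hydrophobic residues in seq.

    A real oracle would be a model trained to predict the desired property.
    """
    hydrophobic = set("AILMFVW")
    return sum(aa in hydrophobic for aa in seq) / len(seq)


def generate_and_select(profile, oracle, n_candidates, top_k, seed=0):
    """Sample candidates from the generative model, then keep the top_k by oracle score."""
    rng = random.Random(seed)
    # dict.fromkeys removes duplicate sequences while keeping sampling order deterministic.
    candidates = list(dict.fromkeys(sample_from_profile(profile, rng) for _ in range(n_candidates)))
    ranked = sorted(candidates, key=oracle, reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    selected = generate_and_select(uniform_profile(12), toy_oracle, n_candidates=200, top_k=5)
    for seq in selected:
        print(seq, round(toy_oracle(seq), 2))
```

In practice, the profile would be replaced by one of the generative model classes the review discusses, and the oracle by a trained predictor; the sampling and ranking loop stays the same.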