Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
MIT Media Laboratory, Cambridge, MA, USA.
Nat Methods. 2019 Dec;16(12):1315-1322. doi: 10.1038/s41592-019-0598-1. Epub 2019 Oct 21.
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.
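The idea of fitting "the simplest models" on top of a fixed-length sequence representation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a hypothetical stand-in that uses plain amino-acid composition in place of the learned UniRep vector, `fit_ridge` and `predict` are illustrative helper names, and the sequences and labels are synthetic.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Hypothetical placeholder representation: amino-acid composition.

    Stands in for a learned fixed-length representation; it is NOT UniRep.
    """
    counts = np.array([sequence.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(sequence), 1)


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    """Apply the fitted linear model to a matrix of representations."""
    return X @ w + b


# Synthetic data (assumption): random sequences with a made-up linear label.
rng = np.random.default_rng(0)
sequences = ["".join(rng.choice(list(AMINO_ACIDS), size=50)) for _ in range(200)]
X = np.stack([embed(s) for s in sequences])
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + 0.01 * rng.normal(size=len(sequences))

# Fit on the first 150 sequences, predict the held-out 50.
w, b = fit_ridge(X[:150], y[:150])
pred = predict(X[150:], w, b)
```

In practice, `embed` would be replaced by whatever function returns the learned representation for a sequence; the linear model on top stays the same.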