DeepMind, London, UK.
The Francis Crick Institute, London, UK.
Nature. 2020 Jan;577(7792):706-710. doi: 10.1038/s41586-019-1923-7. Epub 2020 Jan 15.
Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence. This problem is of fundamental importance as the structure of a protein largely determines its function; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined.
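The idea of turning pairwise distance predictions into a potential and minimizing it by gradient descent can be illustrated with a toy sketch. This is not the paper's implementation: the abstract describes a potential of mean force built from network predictions, whereas the example below assumes a simple harmonic penalty on the mismatch between current and target distances, optimizes raw 3D coordinates, and takes its "predicted" distances from a made-up reference chain. All function names (`distance`, `energy_and_gradient`, `descend`) and parameters (`steps`, `lr`, `seed`) are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def energy_and_gradient(coords, target):
    """Assumed stand-in potential: E = 0.5 * sum_{i<j} (d_ij - target_ij)^2."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = max(distance(coords[i], coords[j]), 1e-9)  # avoid division by zero
            err = d - target[i][j]
            energy += 0.5 * err * err
            for k in range(3):
                # dE/dx_i = err * (x_i - x_j) / d, and the opposite sign for x_j
                g = err * (coords[i][k] - coords[j][k]) / d
                grad[i][k] += g
                grad[j][k] -= g
    return energy, grad


def descend(target, steps=2000, lr=0.01, seed=0):
    """Plain gradient descent on coordinates from a random starting point."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
    energies = []  # energy measured before each update
    for _ in range(steps):
        energy, grad = energy_and_gradient(coords, target)
        energies.append(energy)
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grad[i][k]
    return coords, energies


# Toy "predicted" distances, computed from a made-up 8-point chain.
rng = random.Random(1)
reference = [[rng.gauss(0.0, 3.0) for _ in range(3)] for _ in range(8)]
target = [[distance(a, b) for b in reference] for a in reference]

coords, energies = descend(target)
print(f"energy: {energies[0]:.2f} -> {energies[-1]:.4f}")
```

Because only distances enter the potential, any rotation, translation, or mirror image of a solution has the same energy, so the sketch recovers a shape rather than a fixed set of coordinates.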