Xu Jinbo, McPartlon Matthew, Li Jin
Toyota Technological Institute at Chicago.
Department of Computer Science, University of Chicago.
Nat Mach Intell. 2021 Jul;3:601-609. doi: 10.1038/s42256-021-00348-5. Epub 2021 May 20.
Predicting the tertiary structure of a protein from its primary sequence has been greatly improved by integrating deep learning and co-evolutionary analysis, as shown in CASP13 and CASP14. We describe our latest study of this idea, analyzing the effect of network size and co-evolution data on performance for both natural and designed proteins. We show that a large ResNet (convolutional residual neural network) can predict structures with correct folds for 26 out of 32 CASP13 free-modeling (FM) targets, and can predict the top L/5 long-range contacts (where L is the sequence length) with precision over 80%. When co-evolution is not used, ResNet can still predict structures with correct folds for 18 CASP13 FM targets, greatly exceeding previous methods that also do not use co-evolution. Even with only the primary sequence, ResNet can predict structures with correct folds for all tested human-designed proteins. In addition, ResNet may fare better on the designed proteins when trained without co-evolution than with it. These results suggest that ResNet does not simply denoise co-evolution signals, but instead may learn important protein sequence-structure relationships. This has important implications for protein design and engineering, especially when co-evolutionary data are unavailable.
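The L/5 long-range contact precision quoted in the abstract can be illustrated with a minimal sketch. The abstract does not define the metric, so this assumes the convention commonly used in CASP-style evaluation: a pair of residues is long-range when their sequence separation is at least 24, and the L/5 highest-scoring long-range pairs are checked against the native contacts. The function name, the `min_sep` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
def top_l5_long_range_precision(pred_prob, true_contact, min_sep=24):
    """Precision of the top L/5 predicted long-range contacts.

    pred_prob and true_contact are L x L nested sequences; only the
    upper triangle (i < j) is read. min_sep=24 is an assumed convention.
    """
    L = len(pred_prob)
    # Candidate pairs (i, j) with j - i >= min_sep, i.e. long-range pairs.
    pairs = [(i, j) for i in range(L) for j in range(i + min_sep, L)]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    k = max(1, L // 5)
    # Keep the k pairs with the highest predicted contact probability.
    top = sorted(pairs, key=lambda p: pred_prob[p[0]][p[1]], reverse=True)[:k]
    hits = sum(1 for i, j in top if true_contact[i][j])
    return hits / len(top)


# Toy example (made-up data): L = 50, so the top L/5 = 10 pairs are scored.
L = 50
pred = [[0.1] * L for _ in range(L)]
true = [[False] * L for _ in range(L)]
for n, j in enumerate(range(24, 34)):
    pred[0][j] = 0.9          # ten confident long-range predictions
    true[0][j] = n < 8        # eight of them are real contacts
pred[0][5], true[0][5] = 0.99, True  # short-range pair, ignored by the metric

precision = top_l5_long_range_precision(pred, true)
print(precision)  # 0.8
```

In this toy case eight of the ten selected pairs are true contacts, giving a precision of 0.8. The short-range pair is excluded even though it has the highest score.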