Distance-based protein folding powered by deep learning.

Affiliation

Toyota Technological Institute at Chicago, Chicago, IL 60637

Publication information

Proc Natl Acad Sci U S A. 2019 Aug 20;116(34):16856-16865. doi: 10.1073/pnas.1821309116. Epub 2019 Aug 9.

Abstract

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming fragment-based conformation sampling. We show that deep learning can accurately predict the interresidue distance distribution of a protein, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, which have a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
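The "top L/5 long-range" precision quoted above can be made concrete with a small sketch. It assumes the conventional CASP definitions, which the abstract itself does not spell out: two residues are in contact when their Cβ atoms are closer than 8 Å, and a pair is long-range when the residues are at least 24 positions apart in sequence. The function name, the constants, and the toy data are hypothetical and only illustrate the metric; this is not the authors' evaluation code.

```python
from typing import List

CONTACT_CUTOFF = 8.0  # angstroms; assumed CASP-style contact definition
LONG_RANGE_SEP = 24   # assumed minimum sequence separation for "long-range"


def top_long_range_precision(
    pred_prob: List[List[float]],
    true_dist: List[List[float]],
    fraction: int = 5,
) -> float:
    """Precision of the top L/fraction long-range predicted contacts."""
    length = len(pred_prob)
    # Collect every long-range pair (i, j) with its predicted probability.
    pairs = [
        (pred_prob[i][j], i, j)
        for i in range(length)
        for j in range(i + LONG_RANGE_SEP, length)
    ]
    # Rank pairs by predicted contact probability, highest first.
    pairs.sort(key=lambda p: p[0], reverse=True)
    selected = pairs[: max(1, length // fraction)]
    if not selected:
        return 0.0
    # A selected pair counts as correct if the true distance is a contact.
    hits = sum(1 for _, i, j in selected if true_dist[i][j] < CONTACT_CUTOFF)
    return hits / len(selected)


def toy_example() -> float:
    """Hypothetical 30-residue protein: 6 pairs are evaluated, 3 are correct."""
    length = 30
    prob = [[0.1] * length for _ in range(length)]
    dist = [[20.0] * length for _ in range(length)]
    ranked = [(0, 24, 0.9), (0, 25, 0.8), (0, 26, 0.7),
              (1, 25, 0.6), (1, 26, 0.5), (2, 26, 0.4)]
    for i, j, p in ranked:
        prob[i][j] = prob[j][i] = p
    for i, j in [(0, 24), (0, 25), (1, 25)]:
        dist[i][j] = dist[j][i] = 5.0
    # A confident short-range contact; it is ignored by the long-range metric.
    prob[3][5] = prob[5][3] = 0.99
    dist[3][5] = dist[5][3] = 4.0
    return top_long_range_precision(prob, dist)


if __name__ == "__main__":
    print(toy_example())
```

In the toy case L = 30, so L/5 = 6 pairs are scored and three of them are true contacts, giving a precision of 0.5. A contact probability for each pair could be derived from a predicted distance distribution by summing the probability mass below the cutoff, although the abstract does not say how the server's contact scores were obtained.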
