Distance-based protein folding powered by deep learning.

Affiliation

Toyota Technological Institute at Chicago, Chicago, IL 60637

Publication information

Proc Natl Acad Sci U S A. 2019 Aug 20;116(34):16856-16865. doi: 10.1073/pnas.1821309116. Epub 2019 Aug 9.

DOI: 10.1073/pnas.1821309116

PMID: 31399549

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6708335/

Abstract

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, which have a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
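The abstract states that 3D models are built from the geometric constraints of the predicted distance matrix alone, without extensive conformation sampling, but it does not spell out the procedure. As a swapped-in illustration of the idea (not the authors' method), classical multidimensional scaling (MDS) recovers 3D coordinates from a complete, noise-free Euclidean distance matrix, up to rotation, translation and mirror reflection. The function names below are hypothetical, and the eigendecomposition uses plain Jacobi rotations so the sketch needs only the Python standard library.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical helpers; classical MDS is a swapped-in technique, not the paper's pipeline.


def jacobi_eigen(a: Matrix, sweeps: int = 100, tol: float = 1e-12) -> Tuple[List[float], Matrix]:
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v) where column k of v is the eigenvector for eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(dist: Matrix, dim: int = 3) -> Matrix:
    """Embed points in `dim` dimensions from a full Euclidean distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram matrix: B = -1/2 * J D^2 J.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dim]
    # Coordinates = top eigenvectors scaled by sqrt(eigenvalue); tiny negatives are clamped.
    return [[vecs[i][k] * math.sqrt(max(vals[k], 0.0)) for k in top] for i in range(n)]


def pairwise_distances(coords: Matrix) -> Matrix:
    """Full Euclidean distance matrix for a list of points."""
    return [[math.dist(x, y) for y in coords] for x in coords]
```

Usage: `classical_mds(pairwise_distances(points))` returns coordinates whose pairwise distances match the input. Predicted distances are noisy and usually trustworthy only at short range, and MDS cannot tell a fold from its mirror image, so a sketch like this would more plausibly serve as a starting model for restraint-based refinement than as a final structure.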

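The abstract reports 70% precision on the top L/5 long-range predicted contacts. A minimal sketch of that metric, assuming the conventions commonly used in CASP contact assessment: a contact is a residue pair whose Cβ-Cβ distance is below 8 Å, "long-range" means a sequence separation of at least 24, and L is the protein length. It also assumes, hypothetically, that each predicted distance distribution is a list of probabilities over bins with known upper edges, so a pair's contact probability is the mass of the bins below 8 Å. The bin layout and function names are illustrative, not the authors' code.

```python
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]

# Assumed CASP-style conventions: contact = Cβ-Cβ distance < 8 Å, long-range = |i - j| >= 24.


def contact_probability(bin_probs: Sequence[float], bin_upper_edges: Sequence[float],
                        cutoff: float = 8.0) -> float:
    """Probability mass of (hypothetical) distance bins lying entirely below `cutoff` Å."""
    return sum(p for p, upper in zip(bin_probs, bin_upper_edges) if upper <= cutoff)


def top_l5_long_range_precision(contact_probs: Dict[Pair, float],
                                true_dist: Dict[Pair, float],
                                length: int,
                                min_sep: int = 24,
                                cutoff: float = 8.0) -> float:
    """Fraction of the top-L/5 long-range predicted pairs that are native contacts."""
    long_range = [(prob, pair) for pair, prob in contact_probs.items()
                  if abs(pair[0] - pair[1]) >= min_sep]
    long_range.sort(key=lambda item: item[0], reverse=True)
    top = long_range[:max(1, length // 5)]
    if not top:
        return 0.0
    # A pair missing from the native distances is counted as a non-contact.
    hits = sum(1 for _, pair in top if true_dist.get(pair, float("inf")) < cutoff)
    return hits / len(top)
```

Dividing by the number of pairs actually returned, rather than by L/5, is a choice made here for short proteins with few long-range candidates; an official assessment may define the denominator differently.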
