Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Author information

Adhikari Badri, Hou Jie, Cheng Jianlin

Affiliations

Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, Missouri.

Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri.

Publication information

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):84-96. doi: 10.1002/prot.25405. Epub 2017 Oct 31.

DOI:10.1002/prot.25405

PMID:29047157

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5820155/

Abstract

In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments and derive coevolution-based features, which are integrated by a neural network to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66.
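
The "top L/5 long-range" precision quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the usual CASP convention that a pair is long-range when the two residues are at least 24 positions apart in sequence, and that precision is the fraction of the L/5 highest-scoring long-range predictions that are true contacts. The function name, argument names, and the `min_separation` default are hypothetical and chosen for illustration only.

```python
def top_l5_long_range_precision(predicted, true_contacts, length, min_separation=24):
    """Precision of the top L/5 long-range contact predictions.

    predicted:      iterable of (i, j, score) tuples, residue indices i and j
    true_contacts:  set of (i, j) pairs with i < j that are in contact
    length:         sequence length L of the target domain
    min_separation: assumed long-range threshold (CASP convention, not from the abstract)
    """
    # Keep only long-range pairs, normalised so that i < j.
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predicted
        if abs(i - j) >= min_separation
    ]
    # Rank by predicted confidence, highest first, and keep the top L/5.
    long_range.sort(key=lambda pair: pair[2], reverse=True)
    top_n = max(1, length // 5)
    selected = long_range[:top_n]
    if not selected:
        return 0.0
    # Count how many of the selected pairs are true contacts.
    hits = sum((i, j) in true_contacts for i, j, _ in selected)
    return hits / len(selected)


# Toy example with made-up numbers: L = 10, so the top 2 long-range pairs are scored.
example_predictions = [(1, 30, 0.9), (2, 40, 0.8), (3, 50, 0.7), (5, 8, 0.99)]
example_truth = {(1, 30), (3, 50)}
example_precision = top_l5_long_range_precision(example_predictions, example_truth, length=10)
```

In the toy example the pair (5, 8) is discarded as short-range despite its high score, and of the two top-ranked long-range pairs only (1, 30) is a true contact. Dividing by the number of selected pairs rather than by L/5 is a design choice of this sketch; the two agree whenever at least L/5 long-range predictions are available.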

Similar articles

Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):84-96. doi: 10.1002/prot.25405. Epub 2017 Oct 31.

Analysis of deep learning methods for blind protein contact prediction in CASP12.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):67-77. doi: 10.1002/prot.25377. Epub 2017 Sep 6.

Analysis of distance-based protein structure prediction by deep learning in CASP13.

Proteins. 2019 Dec;87(12):1069-1081. doi: 10.1002/prot.25810. Epub 2019 Sep 13.

Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):136-151. doi: 10.1002/prot.25414. Epub 2017 Nov 14.

DNCON2: improved protein contact prediction using two-level deep convolutional neural networks.

Bioinformatics. 2018 May 1;34(9):1466-1472. doi: 10.1093/bioinformatics/btx781.

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

PLoS Comput Biol. 2017 Jan 5;13(1):e1005324. doi: 10.1371/journal.pcbi.1005324. eCollection 2017 Jan.

Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):51-66. doi: 10.1002/prot.25407. Epub 2017 Nov 7.

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13.

Proteins. 2019 Dec;87(12):1165-1178. doi: 10.1002/prot.25697. Epub 2019 Apr 25.

Accurate contact predictions using covariation techniques and machine learning.

Proteins. 2016 Sep;84 Suppl 1(Suppl 1):145-51. doi: 10.1002/prot.24863. Epub 2015 Aug 14.

Protein Residue Contacts and Prediction Methods.

Methods Mol Biol. 2016;1415:463-76. doi: 10.1007/978-1-4939-3572-7_24.

Cited by

DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function.

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad712.

Protein Structure Refinement Using Multi-Objective Particle Swarm Optimization with Decomposition Strategy.

Int J Mol Sci. 2021 Apr 23;22(9):4408. doi: 10.3390/ijms22094408.

High-accuracy protein structures by combining machine-learning with physics-based refinement.

Proteins. 2020 May;88(5):637-642. doi: 10.1002/prot.25847. Epub 2019 Nov 15.

Assessing the accuracy of contact predictions in CASP13.

Proteins. 2019 Dec;87(12):1058-1068. doi: 10.1002/prot.25819. Epub 2019 Oct 24.

Driven to near-experimental accuracy by refinement via molecular dynamics simulations.

Proteins. 2019 Dec;87(12):1263-1275. doi: 10.1002/prot.25759. Epub 2019 Jun 24.

Combining Evolutionary Covariance and NMR Data for Protein Structure Determination.

Methods Enzymol. 2019;614:363-392. doi: 10.1016/bs.mie.2018.11.004. Epub 2018 Dec 23.

Combined approaches from physics, statistics, and computer science for protein structure prediction: (unity is strength)?

F1000Res. 2018 Jul 24;7. doi: 10.12688/f1000research.14870.1. eCollection 2018.

ComplexContact: a web server for inter-protein contact prediction using deep learning.

Nucleic Acids Res. 2018 Jul 2;46(W1):W432-W437. doi: 10.1093/nar/gky420.
