Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Authors

Adhikari Badri, Hou Jie, Cheng Jianlin

Affiliations

Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, Missouri.

Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri.

Publication information

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):84-96. doi: 10.1002/prot.25405. Epub 2017 Oct 31.

Abstract

In this study, we evaluate the residue-residue contacts predicted by our three methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments from which coevolution-based features are derived; a neural network integrates these features to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for the top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of the multiple sequence alignments, the coevolution-based features, and the machine learning integration of coevolution-based and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, when the top L/5 predicted long-range contacts are evaluated, the coevolution-based features alone improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features raises it further to 56.3%. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66.
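The "top L/5 long-range precision" reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the usual CASP convention that a long-range pair has a sequence separation of at least 24 residues, that L is the domain length, and that predictions arrive as (i, j, score) triples; the function name and parameters are hypothetical.

```python
def top_l5_long_range_precision(predicted, true_contacts, seq_len, min_sep=24):
    """Precision of the top L/5 long-range predicted contacts.

    predicted      iterable of (i, j, score) triples, residue indices i != j
    true_contacts  set of (i, j) pairs with i < j observed in the native structure
    seq_len        domain length L
    min_sep        minimum |i - j| for a pair to count as long-range
                   (24 is the assumed CASP convention, not stated in the abstract)
    """
    # Keep only long-range pairs, normalised so that i < j.
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predicted
        if abs(i - j) >= min_sep
    ]
    # Rank by predicted score, most confident first.
    long_range.sort(key=lambda pair: pair[2], reverse=True)

    # Take the top L/5 predictions (at least one).
    k = max(1, seq_len // 5)
    top = long_range[:k]
    if not top:
        return 0.0

    # Precision = fraction of selected pairs that are true contacts.
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    return hits / len(top)


if __name__ == "__main__":
    # Toy example with a shortened separation threshold so that a
    # 10-residue chain has "long-range" pairs at all.
    preds = [(0, 9, 0.9), (1, 8, 0.8), (0, 6, 0.7), (2, 3, 0.99)]
    native = {(0, 9), (0, 6)}
    print(top_l5_long_range_precision(preds, native, seq_len=10, min_sep=5))
```

In the toy example L = 10, so the top L/5 = 2 long-range pairs are scored; one of the two is a native contact, giving a precision of 0.5. The short-range pair (2, 3) is ignored despite its high score.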


