A two-stage approach for improved prediction of residue contact maps.

Authors

Vullo Alessandro, Walsh Ian, Pollastri Gianluca

Affiliation

School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland.

Publication information

BMC Bioinformatics. 2006 Mar 30;7:180. doi: 10.1186/1471-2105-7-180.

DOI:10.1186/1471-2105-7-180

PMID:16573808

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1484494/

Abstract

BACKGROUND

Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure. Although there have been improvements in recent years, the problem of accurately predicting residue contact maps from primary sequences is still largely unsolved. Among the reasons for this are the unbalanced nature of the problem (with far fewer examples of contacts than non-contacts), the formidable challenge of capturing long-range interactions in the maps, and the intrinsic difficulty of mapping one-dimensional input sequences into two-dimensional output maps. To alleviate these problems and achieve improved contact map predictions, in this paper we split the task into two stages: first, the prediction of a map's principal eigenvector (PE) from the primary sequence; second, the reconstruction of the contact map from the PE and the primary sequence. Predicting the PE from the primary sequence amounts to mapping a vector into a vector. This task is less complex than mapping vectors directly into two-dimensional matrices, since the size of the problem is drastically reduced and so is the length scale of the interactions that need to be learned.
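To make the first-stage target concrete, the sketch below builds a binary contact map from residue coordinates and extracts its principal eigenvector by power iteration. It is an illustration only, not the authors' pipeline: the 8 Å distance cutoff, the function names `contact_map` and `principal_eigenvector`, and the toy coordinates are all assumptions that do not come from the abstract.

```python
import math

def contact_map(coords, threshold=8.0):
    """Binary contact map: 1 if two distinct residues are closer than `threshold`.

    The 8.0 cutoff is an assumed value chosen for illustration.
    """
    n = len(coords)
    return [[1 if i != j and math.dist(coords[i], coords[j]) < threshold else 0
             for j in range(n)] for i in range(n)]

def principal_eigenvector(matrix, iterations=1000, tol=1e-10):
    """Power iteration on (matrix + I).

    Adding the identity leaves the eigenvectors unchanged but shifts every
    eigenvalue by +1, so the largest eigenvalue dominates in magnitude.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # w = (matrix + I) v
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

# Toy example (hypothetical coordinates): three residues on a line, 5 units apart.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
cmap = contact_map(coords)
pe = principal_eigenvector(cmap)
```

In this toy case the middle residue touches both neighbours, so it receives the largest PE component. The PE is a vector with one entry per residue, which is why predicting it from the sequence is a vector-to-vector problem rather than a vector-to-matrix one.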

RESULTS

We develop architectures composed of ensembles of two-layered bidirectional recurrent neural networks to classify the components of the PE into 2, 3 and 4 classes from protein primary sequence, predicted secondary structure, and hydrophobicity interaction scales. Our predictor, tested on a non-redundant set of 2171 proteins, achieves classification accuracy of up to 72.6%, 16% above a baseline statistical predictor. We design a system for the prediction of contact maps from the predicted PE. Our results show that predicting maps through the PE yields sizeable gains, especially for long-range contacts, which are particularly critical for accurate protein 3D reconstruction. The final predictor's accuracy on a non-redundant set of 327 targets is 35.4% and 19.8% for minimum contact separations of 12 and 24, respectively, when the top length/5 contacts are selected. On the 11 CASP6 Novel Fold targets we achieve similar accuracies (36.5% and 19.7%). This compares favourably with the best automated predictors at CASP6.
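The "top length/5" figures above can be read as follows: rank all residue pairs whose sequence separation meets the minimum, keep the length/5 highest-scoring pairs, and report the fraction that are true contacts. The sketch below is one plausible reading of that measure, not the authors' evaluation code. The function name `top_contact_accuracy`, the use of `>=` for the separation test, integer division for length/5, and the toy data are assumptions.

```python
def top_contact_accuracy(scores, true_map, min_separation=12, fraction=5):
    """Fraction of the top (length // fraction) scored pairs that are true contacts.

    Only pairs (i, j) with j - i >= min_separation are ranked.
    """
    n = len(true_map)
    pairs = [(scores[i][j], i, j)
             for i in range(n) for j in range(i + min_separation, n)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[:max(1, n // fraction)]
    if not top:
        return 0.0  # sequence too short for any pair at this separation
    hits = sum(true_map[i][j] for _, i, j in top)
    return hits / len(top)

# Toy example (hypothetical data): 10 residues, minimum separation 3, top 10 // 5 = 2 pairs.
n = 10
scores = [[0.0] * n for _ in range(n)]
true_map = [[0] * n for _ in range(n)]
scores[0][9], scores[1][5], scores[2][6] = 0.9, 0.8, 0.7
true_map[0][9] = 1
true_map[2][6] = 1
accuracy = top_contact_accuracy(scores, true_map, min_separation=3)
```

Here the two highest-scoring pairs are (0, 9), a true contact, and (1, 5), not a contact, so one of the two selected pairs is correct. Raising `min_separation` from 12 to 24 restricts the ranking to longer-range pairs, which is consistent with the lower accuracies reported at separation 24.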

CONCLUSION

Our final system for contact map prediction achieves state-of-the-art performance and may provide valuable constraints for improved ab initio prediction of protein structures. A suite of predictors of structural features, including the PE and PE-based contact maps, is available at http://distill.ucd.ie.

Similar articles

Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks.

BMC Struct Biol. 2009 Jan 30;9:5. doi: 10.1186/1472-6807-9-5.

Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks.

BMC Bioinformatics. 2014 Jan 10;15:6. doi: 10.1186/1471-2105-15-6.

Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins.

BMC Bioinformatics. 2006 Sep 5;7:402. doi: 10.1186/1471-2105-7-402.

CNNcon: improved protein contact maps prediction using cascaded neural networks.

PLoS One. 2013 Apr 23;8(4):e61533. doi: 10.1371/journal.pone.0061533. Print 2013.

Ab initio and homology based prediction of protein domains by recursive neural networks.

BMC Bioinformatics. 2009 Jun 26;10:195. doi: 10.1186/1471-2105-10-195.

Long-range information and physicality constraints improve predicted protein contact maps.

J Bioinform Comput Biol. 2008 Oct;6(5):1001-20. doi: 10.1142/s0219720008003783.

Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks.

Bioinformatics. 2018 Dec 1;34(23):4039-4045. doi: 10.1093/bioinformatics/bty481.

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

PLoS Comput Biol. 2017 Jan 5;13(1):e1005324. doi: 10.1371/journal.pcbi.1005324. eCollection 2017 Jan.

Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners.

Bioinformatics. 2002;18 Suppl 1:S62-70. doi: 10.1093/bioinformatics/18.suppl_1.s62.

Cited by

In Silico Protein Folding Prediction of COVID-19 Mutations and Variants.

Biomolecules. 2022 Nov 10;12(11):1665. doi: 10.3390/biom12111665.

Deep learning methods in protein structure prediction.

Comput Struct Biotechnol J. 2020 Jan 22;18:1301-1310. doi: 10.1016/j.csbj.2019.12.011. eCollection 2020.

Ordering Protein Contact Matrices.

Comput Struct Biotechnol J. 2018 Mar 16;16:140-156. doi: 10.1016/j.csbj.2018.03.001. eCollection 2018.

Protein Residue Contacts and Prediction Methods.

Methods Mol Biol. 2016;1415:463-76. doi: 10.1007/978-1-4939-3572-7_24.

Combining physicochemical and evolutionary information for protein contact prediction.

PLoS One. 2014 Oct 22;9(10):e108438. doi: 10.1371/journal.pone.0108438. eCollection 2014.

Reconstructing protein structures by neural network pairwise interaction fields and iterative decoy set construction.

Biomolecules. 2014 Feb 10;4(1):160-80. doi: 10.3390/biom4010160.

Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks.

BMC Bioinformatics. 2014 Jan 10;15:6. doi: 10.1186/1471-2105-15-6.

SCLpredT: Ab initio and homology-based prediction of subcellular localization by N-to-1 neural networks.

Springerplus. 2013 Oct 3;2:502. doi: 10.1186/2193-1801-2-502. eCollection 2013.

Predicting protein contact map using evolutionary and physical constraints by integer programming.

Bioinformatics. 2013 Jul 1;29(13):i266-73. doi: 10.1093/bioinformatics/btt211.

Evaluation of residue-residue contact prediction in CASP10.

Proteins. 2014 Feb;82 Suppl 2(0 2):138-53. doi: 10.1002/prot.24340. Epub 2013 Aug 31.

References

A general framework for adaptive processing of data structures.

IEEE Trans Neural Netw. 1998;9(5):768-86. doi: 10.1109/72.712151.

Recoverable one-dimensional encoding of three-dimensional protein structures.

Bioinformatics. 2005 May 15;21(10):2167-70. doi: 10.1093/bioinformatics/bti330. Epub 2005 Feb 18.

Porter: a new, accurate server for protein secondary structure prediction.

Bioinformatics. 2005 Apr 15;21(8):1719-20. doi: 10.1093/bioinformatics/bti203. Epub 2004 Dec 7.

Principal eigenvector of contact matrices and hydrophobicity profiles in proteins.

Proteins. 2005 Jan 1;58(1):22-30. doi: 10.1002/prot.20240.

Striped sheets and protein contact prediction.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i224-31. doi: 10.1093/bioinformatics/bth913.

Reconstruction of protein structures from a vectorial representation.

Phys Rev Lett. 2004 May 28;92(21):218101. doi: 10.1103/PhysRevLett.92.218101.

Critical assessment of methods of protein structure prediction (CASP)-round V.

Proteins. 2003;53 Suppl 6:334-9. doi: 10.1002/prot.10556.

TOUCHSTONEX: protein structure prediction with sparse NMR data.

Proteins. 2003 Nov 1;53(2):290-306. doi: 10.1002/prot.10499.

De novo prediction of three-dimensional structures for major protein families.

J Mol Biol. 2002 Sep 6;322(1):65-78. doi: 10.1016/s0022-2836(02)00698-8.

Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners.

Bioinformatics. 2002;18 Suppl 1:S62-70. doi: 10.1093/bioinformatics/18.suppl_1.s62.
