Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information.

Authors

Pollastri Gianluca, Martin Alberto J M, Mooney Catherine, Vullo Alessandro

Affiliation

Complex and Adaptive Systems Laboratory, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland.

Publication information

BMC Bioinformatics. 2007 Jun 14;8:201. doi: 10.1186/1471-2105-8-201.

DOI:10.1186/1471-2105-8-201

PMID:17570843

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1913928/

Abstract

BACKGROUND

Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.

RESULTS

Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available.
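As an illustration of the "simple structural frequency profiles" and the template baselines mentioned above, here is a minimal sketch. It assumes each template's secondary structure has already been aligned to the query as a three-class string (H, E, C, with '-' where the template does not cover the query), and that all templates are weighted equally. The function names, the gap symbol, the equal weighting, and the choice of coil for uncovered positions are assumptions made for this example; the abstract does not specify how the authors build or weight their profiles.

```python
from collections import Counter

SS_CLASSES = ("H", "E", "C")  # helix, strand, coil (three-class alphabet, assumed)


def structural_frequency_profile(template_ss, classes=SS_CLASSES):
    """Per-position class frequencies from template strings aligned to the query.

    template_ss: list of equal-length strings, one per template; '-' marks
    query positions that a template does not cover (hypothetical convention).
    Returns one dict per query position; uncovered positions get all zeros.
    """
    if not template_ss:
        return []
    length = len(template_ss[0])
    if any(len(s) != length for s in template_ss):
        raise ValueError("all template strings must have the query length")
    profile = []
    for i in range(length):
        # Count only recognised classes, so gaps do not contribute.
        counts = Counter(s[i] for s in template_ss if s[i] in classes)
        total = sum(counts.values())
        profile.append({c: (counts[c] / total if total else 0.0) for c in classes})
    return profile


def template_baseline(profile, classes=SS_CLASSES, uncovered="C"):
    """Baseline read directly from templates: most frequent class per position."""
    labels = []
    for freqs in profile:
        if sum(freqs.values()) == 0:
            labels.append(uncovered)  # no template covers this position
        else:
            labels.append(max(classes, key=lambda c: freqs[c]))
    return "".join(labels)


def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted class matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Toy example with three hypothetical templates aligned to a 6-residue query.
templates = ["HHHE-C", "HHEE-C", "HCEE--"]
profile = structural_frequency_profile(templates)
baseline = template_baseline(profile)
accuracy = per_residue_accuracy(baseline, "HHHECC")
```

In a system of the kind described, a profile like this would be supplied to the predictor as extra input alongside sequence information, while the majority-vote string corresponds to a baseline taken directly from the templates. The same counting scheme would apply to two-class solvent accessibility by swapping the class alphabet.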

CONCLUSION

The predictive systems are publicly available at http://distill.ucd.ie.
