Estimation of Position Specific Energy as a Feature of Protein Residues from Sequence Alone for Structural Classification.

Author information

Iqbal Sumaiya, Hoque Md Tamjidul

Affiliation

Department of Computer Science, University of New Orleans, New Orleans, LA, United States of America.

Publication information

PLoS One. 2016 Sep 2;11(9):e0161452. doi: 10.1371/journal.pone.0161452. eCollection 2016.

DOI:10.1371/journal.pone.0161452

PMID:27588752

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5010294/

Abstract

A set of features computed from the primary amino acid sequence of a protein is crucial for inducing a machine learning model that can accurately predict three-dimensional protein structure. Existing solutions to protein structure prediction problems need features that capture the complexity of molecular-level interactions. To this end, we propose a novel approach to estimate the position specific estimated energy (PSEE) of a residue using contact energy and predicted relative solvent accessibility (RSA). Furthermore, we demonstrate that PSEE can be reasonably estimated from sequence information alone. PSEE is useful for identifying the structured as well as the unstructured, or intrinsically disordered, regions of a protein by computing favorable and unfavorable energy, respectively, separated by an appropriate threshold. The most intriguing finding, verified empirically, is that the PSEE feature can effectively classify disordered versus ordered residues and can segregate residues of different secondary structure types by computing the constituent energies. PSEE values for each amino acid correlate strongly with the hydrophobicity value of the corresponding amino acid. Further, PSEE can be used to detect critical binding regions that undergo disorder-to-order transitions to perform crucial biological functions. Applying the PSEE feature to disorder prediction, we rigorously tested and found that a support vector machine model informed by a set of features including PSEE consistently outperforms a model with an identical set of features with PSEE removed. In addition, the new disorder predictor, DisPredict2, shows competitive performance in predicting protein disorder when compared with six existing disordered protein predictors.
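The abstract does not state the PSEE formula, so the following is only a minimal, hypothetical Python sketch of the general idea it describes: combine a pairwise contact-energy table with per-residue predicted RSA into one energy value per position, then split residues into favorable and unfavorable classes with a threshold. The contact-energy values, the sequence window, the (1 - RSA) buriedness weight, the zero threshold, and the function names `contact_energy`, `estimate_residue_energy` and `classify` are all illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: NOT the PSEE formula from the paper.
# Assumptions (all illustrative): a toy contact-energy table, predicted RSA
# supplied as input, a +/- `window` sequence neighbourhood, (1 - RSA) as a
# buriedness weight, and a threshold of 0.0 between favorable and unfavorable.
from typing import Dict, List, Tuple

# Toy pairwise contact energies in arbitrary units (negative = favorable).
# These are made-up numbers, not values from any published matrix.
TOY_CONTACT_ENERGY: Dict[Tuple[str, str], float] = {
    ("L", "L"): -1.0,
    ("L", "K"): 0.2,
    ("K", "K"): 0.5,
}


def contact_energy(a: str, b: str) -> float:
    """Symmetric lookup in the toy table; unlisted pairs default to 0.0."""
    return TOY_CONTACT_ENERGY.get((a, b), TOY_CONTACT_ENERGY.get((b, a), 0.0))


def estimate_residue_energy(seq: str, rsa: List[float], window: int = 2) -> List[float]:
    """Weight each residue's mean contact energy with its sequence
    neighbours by how buried it is predicted to be (1 - RSA)."""
    if len(seq) != len(rsa):
        raise ValueError("sequence and RSA lists must have the same length")
    energies: List[float] = []
    for i, aa in enumerate(seq):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        neighbours = [seq[j] for j in range(lo, hi) if j != i]
        if neighbours:
            mean_e = sum(contact_energy(aa, b) for b in neighbours) / len(neighbours)
        else:
            mean_e = 0.0  # single-residue sequence: no neighbours to score
        energies.append((1.0 - rsa[i]) * mean_e)
    return energies


def classify(energies: List[float], threshold: float = 0.0) -> List[str]:
    """Label energies below the threshold as favorable, the rest unfavorable."""
    return ["favorable" if e < threshold else "unfavorable" for e in energies]


if __name__ == "__main__":
    seq = "LLKKK"
    rsa = [0.1, 0.1, 0.6, 0.5, 0.6]  # made-up predicted RSA values in [0, 1]
    energies = estimate_residue_energy(seq, rsa, window=1)
    print([round(x, 2) for x in energies])
    print(classify(energies))
```

With these toy inputs, the two buried leucines receive negative (favorable) values and the more exposed lysines receive positive (unfavorable) ones. This only mirrors, under the stated assumptions, the favorable/unfavorable split the abstract associates with ordered and disordered regions; it says nothing about how the published PSEE values are actually computed.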

Similar articles

Predicting residue-wise contact orders in proteins by support vector regression.

BMC Bioinformatics. 2006 Oct 3;7:425. doi: 10.1186/1471-2105-7-425.

Protein disorder prediction by condensed PSSM considering propensity for order or disorder.

BMC Bioinformatics. 2006 Jun 23;7:319. doi: 10.1186/1471-2105-7-319.

Improved protein relative solvent accessibility prediction using deep multi-view feature learning framework.

Anal Biochem. 2021 Oct 15;631:114358. doi: 10.1016/j.ab.2021.114358. Epub 2021 Aug 31.

APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility.

BMC Bioinformatics. 2010 Apr 8;11:174. doi: 10.1186/1471-2105-11-174.

Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins.

J Comput Aided Mol Des. 2017 May;31(5):453-466. doi: 10.1007/s10822-017-0020-y. Epub 2017 Apr 1.

Intrinsic disorder in the Protein Data Bank.

J Biomol Struct Dyn. 2007 Feb;24(4):325-42. doi: 10.1080/07391102.2007.10507123.

A two-stage approach for improved prediction of residue contact maps.

BMC Bioinformatics. 2006 Mar 30;7:180. doi: 10.1186/1471-2105-7-180.

Prediction of the Disordered Regions of Intrinsically Disordered Proteins Based on the Molecular Functions.

Protein Pept Lett. 2020;27(4):279-286. doi: 10.2174/0929866526666190226160629.

RBSURFpred: Modeling protein accessible surface area in real and binary space using regularized and optimized regression.

J Theor Biol. 2018 Mar 14;441:44-57. doi: 10.1016/j.jtbi.2017.12.029. Epub 2018 Jan 2.

Cited by

Assessment of Disordered Linker Predictions in the CAID2 Experiment.

Biomolecules. 2024 Feb 28;14(3):287. doi: 10.3390/biom14030287.

DRBpred: A sequence-based machine learning method to effectively predict DNA- and RNA-binding residues.

Comput Biol Med. 2024 Mar;170:108081. doi: 10.1016/j.compbiomed.2024.108081. Epub 2024 Jan 29.

TAFPred: Torsion Angle Fluctuations Prediction from Protein Sequences.

Biology (Basel). 2023 Jul 19;12(7):1020. doi: 10.3390/biology12071020.

CAID prediction portal: a comprehensive service for predicting intrinsic disorder and binding regions in proteins.

Nucleic Acids Res. 2023 Jul 5;51(W1):W62-W69. doi: 10.1093/nar/gkad430.

HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines.

Int J Mol Sci. 2023 Mar 22;24(6):5960. doi: 10.3390/ijms24065960.

Predicting Protein Conformational Disorder and Disordered Binding Sites.

Methods Mol Biol. 2022;2449:95-147. doi: 10.1007/978-1-0716-2095-3_4.

Prediction of protein disorder based on IUPred.

Protein Sci. 2018 Jan;27(1):331-340. doi: 10.1002/pro.3334. Epub 2017 Nov 16.
