Protein disorder prediction by condensed PSSM considering propensity for order or disorder.

Authors

Su Chung-Tsai, Chen Chien-Yu, Ou Yu-Yen

Affiliation

Department of Bio-industrial Mechatronics Engineering, National Taiwan University, Taipei, 106, Taiwan, ROC.

Publication information

BMC Bioinformatics. 2006 Jun 23;7:319. doi: 10.1186/1471-2105-7-319.

DOI:10.1186/1471-2105-7-319

PMID:16796745

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1526762/

Abstract

BACKGROUND

A growing number of disordered regions have been discovered in protein sequences, and many of them are functionally significant. Previous studies show that the disordered regions of a protein can be predicted from its primary structure, the amino acid sequence. One widely accepted observation is that ordered regions usually show a compositional bias toward hydrophobic amino acids, whereas disordered regions are biased toward charged amino acids. Recent studies further show that employing evolutionary information such as position specific scoring matrices (PSSMs) improves the prediction accuracy of protein disorder. As more machine learning techniques are introduced to protein disorder detection, extracting more useful features with biological insight is attracting increasing attention.
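The compositional bias mentioned above can be illustrated with a minimal sketch that reports, for each residue, the fraction of hydrophobic and charged amino acids in a surrounding window. The residue groupings, the window size, the function name `window_composition`, and the toy sequence are all assumptions made for illustration; none of them is taken from the paper.

```python
# Illustrative residue groupings (an assumption; the paper's exact sets are not given here).
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKR")


def window_composition(sequence, window=5):
    """Return, for each residue, the (hydrophobic, charged) fractions in a centered window."""
    half = window // 2
    fractions = []
    for i in range(len(sequence)):
        # Windows are truncated at the sequence ends rather than padded.
        segment = sequence[max(0, i - half): i + half + 1]
        hydrophobic = sum(aa in HYDROPHOBIC for aa in segment) / len(segment)
        charged = sum(aa in CHARGED for aa in segment) / len(segment)
        fractions.append((hydrophobic, charged))
    return fractions


toy_sequence = "AVLIFKEDRK"  # made-up sequence, hydrophobic start and charged end
for residue, (h, c) in zip(toy_sequence, window_composition(toy_sequence)):
    print(residue, round(h, 2), round(c, 2))
```

In this toy case the hydrophobic fraction is high at the start of the sequence and the charged fraction is high at the end, which is the kind of signal a composition-based predictor would pick up.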

RESULTS

This paper first studies the effect on prediction accuracy of a condensed position specific scoring matrix with respect to physicochemical properties (PSSMP), where the PSSMP is derived by merging the amino acid columns of a PSSM that belong to a given property into a single column. Next, we decompose each conventional physicochemical property of amino acids into two disjoint groups, one with a propensity for order and one with a propensity for disorder, and show experimentally that some of the new properties perform better than their parent properties in predicting protein disorder. To obtain an effective and compact feature set for this problem, we propose a hybrid feature selection method that inherits the efficiency of univariate analysis and the effectiveness of stepwise feature selection, which explores combinations of multiple features. The experimental results show that the selected feature set improves the performance of a classifier built with Radial Basis Function Networks (RBFN) compared with feature sets constructed from PSSMs or from PSSMPs that simply adopt the conventional physicochemical properties.
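The condensation step, merging the PSSM columns that belong to one property into a single column, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how columns are merged, so summation is assumed here, and the property groups, the function name `condense_pssm`, and the toy matrix are hypothetical. The column order follows the standard PSI-BLAST ASCII PSSM layout.

```python
# Column order used by PSI-BLAST ASCII PSSM output.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Hypothetical property groups, for illustration only (not the paper's selected properties).
PROPERTY_GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "charged": "DEKR",
}


def condense_pssm(pssm, groups=PROPERTY_GROUPS):
    """Merge the PSSM columns of each property group into one column per group.

    pssm: list of rows, one per residue, each holding 20 scores in AMINO_ACIDS order.
    Merging by summation is an assumption of this sketch.
    """
    column_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    names = list(groups)
    condensed = []
    for row in pssm:
        condensed.append(
            [sum(row[column_index[aa]] for aa in groups[name]) for name in names]
        )
    return names, condensed


# Toy PSSM with two residues (made-up scores).
toy_pssm = [
    [1] * 20,
    list(range(20)),
]
names, pssmp = condense_pssm(toy_pssm)
print(names)
print(pssmp)
```

Each residue is then described by one score per property instead of 20 scores per amino acid, which is what makes the feature set compact. Splitting a property into order-prone and disorder-prone subgroups would only change the entries of `PROPERTY_GROUPS`.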

CONCLUSION

Distinguishing disordered regions from ordered regions in protein sequences facilitates the exploration of protein structures and functions. Results based on independent testing data reveal that the proposed prediction model, DisPSSMP, performs best among several existing packages for the same task, without either under-predicting or over-predicting the disordered regions. Furthermore, the selected properties are shown to be useful in finding discriminating patterns for order/disorder classification.
