

Protein disorder prediction by condensed PSSM considering propensity for order or disorder.

Authors

Su Chung-Tsai, Chen Chien-Yu, Ou Yu-Yen

Affiliation

Department of Bio-industrial Mechatronics Engineering, National Taiwan University, Taipei, 106, Taiwan, ROC.

Publication

BMC Bioinformatics. 2006 Jun 23;7:319. doi: 10.1186/1471-2105-7-319.

Abstract

BACKGROUND

A growing number of disordered regions have been discovered in protein sequences, and many of them are functionally significant. Previous studies show that the disordered regions of a protein can be predicted from its primary structure, the amino acid sequence. One widely accepted observation is that ordered regions are usually compositionally biased toward hydrophobic amino acids, whereas disordered regions are biased toward charged amino acids. Recent studies further show that employing evolutionary information such as position-specific scoring matrices (PSSMs) improves the accuracy of protein disorder prediction. As more machine learning techniques are applied to protein disorder detection, extracting more useful, biologically informed features has attracted increasing attention.

RESULTS

This paper first studies the effect on prediction accuracy of a position-specific scoring matrix condensed with respect to physicochemical properties (PSSMP), where the PSSMP is derived by merging the amino acid columns of a PSSM that belong to a given property into a single column. Next, we decompose each conventional physicochemical property of amino acids into two disjoint groups, one with a propensity for order and one with a propensity for disorder, and show experimentally that some of the new properties perform better than their parent properties in predicting protein disorder. To obtain an effective and compact feature set for this problem, we propose a hybrid feature selection method that combines the efficiency of univariate analysis with the effectiveness of stepwise feature selection, which explores combinations of multiple features. The experimental results show that the selected feature set improves the performance of a classifier built with Radial Basis Function Networks (RBFN) compared with feature sets constructed from PSSMs or from PSSMPs that use only the conventional physicochemical properties.
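To make the column-merging step concrete, the following is a minimal sketch of how a PSSM might be condensed into a PSSMP. It rests on several assumptions that the abstract does not confirm: the function name `condense_pssm` and the two property groups are hypothetical and chosen only for illustration, the columns are assumed to follow the PSI-BLAST ordering `ARNDCQEGHILKMFPSTWYV`, and merging is assumed to mean summing the member columns.

```python
# Column order used by PSI-BLAST ASCII PSSM output (20 standard amino acids).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Hypothetical property groups, for illustration only; the groupings
# actually used in the paper are not given in the abstract.
PROPERTY_GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "charged": "DEKRH",
}


def condense_pssm(pssm, groups=PROPERTY_GROUPS, combine=sum):
    """Merge the PSSM columns of each property group into a single column.

    pssm    -- list of rows, one per residue, each holding 20 scores
    groups  -- mapping of property name to its member amino acids
    combine -- how member columns are merged (summation is an assumption)
    Returns one row per residue with one score per property group.
    """
    column_of = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    condensed = []
    for row in pssm:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each PSSM row must hold 20 scores")
        condensed.append(
            [combine(row[column_of[aa]] for aa in members)
             for members in groups.values()]
        )
    return condensed


# Toy two-residue PSSM: the first row scores every amino acid 1,
# the second row scores each amino acid by its column index.
toy_pssm = [[1] * 20, list(range(20))]
pssmp = condense_pssm(toy_pssm)
print(pssmp)
```

Each output row has one entry per property group, so a 20-column PSSM shrinks to as many columns as there are properties. Splitting a property into an order-prone and a disorder-prone subgroup, as the paper proposes, would correspond here to replacing one entry of `PROPERTY_GROUPS` with two disjoint entries.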

CONCLUSION

Distinguishing disordered regions from ordered regions in protein sequences facilitates the exploration of protein structures and functions. Results on independent test data show that the proposed prediction model, DisPSSMP, performs best among several existing packages for the same task, without either under-predicting or over-predicting disordered regions. Furthermore, the selected properties are shown to be useful in finding discriminating patterns for order/disorder classification.


