A novel method of predicting protein disordered regions based on sequence features.

Affiliation

Institute of Systems Biology, Shanghai University, Shanghai 200444, China.

Publication information

Biomed Res Int. 2013;2013:414327. doi: 10.1155/2013/414327. Epub 2013 Apr 22.

DOI: 10.1155/2013/414327

PMID: 23710446

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3654632/

Abstract

With the discovery of a large number of disordered proteins and their important functions, there is a strong need for effective methods to computationally predict protein disordered regions. In this study, we developed a new method to predict disordered regions in proteins based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Feature Selection (IFS). The mRMR criterion was used to rank all candidate features by importance, and the top 128 features in the ranked list were then selected to build the optimal model: 92 Position Specific Scoring Matrix (PSSM) conservation score features and 36 secondary structure features. With this model, a Matthews correlation coefficient (MCC) of 0.3895 was achieved on the training set by 10-fold cross-validation. We then applied a scanning and modification strategy to the prediction results for each query sequence to further improve performance. Compared with three other popular predictors, DISOPRED, DISOclust, and OnD-CRF, the accuracy (ACC) and MCC were increased by 4% and almost 0.2%, respectively. The selected features may shed light on the formation mechanism of disordered structures and provide guidelines for experimental validation.
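
The ranking-then-selection pipeline described in the abstract (mRMR ranking of candidate features, then IFS over the ranked list, scored by MCC) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: mutual information is estimated from counts on discrete (already discretized) features, mRMR uses the difference form (relevance minus mean redundancy with the features already ranked), and the hypothetical `toy_evaluate` callback stands in for the Random Forest with 10-fold cross-validation used in the study. All function names and the toy data are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (nats) between two discrete sequences, from empirical counts."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    # Sum over observed pairs of p(u,v) * log(p(u,v) / (p(u) * p(v))).
    return sum(c / n * math.log(c * n / (px[u] * py[v])) for (u, v), c in pxy.items())


def mrmr_rank(features: list[list[int]], labels: Sequence[int]) -> list[int]:
    """Greedy mRMR ranking (assumed difference form: relevance minus mean redundancy).

    features[j] holds the values of feature j across all samples.
    """
    relevance = [mutual_information(f, labels) for f in features]
    ranked: list[int] = []
    remaining = list(range(len(features)))

    def score(j: int) -> float:
        if not ranked:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[k]) for k in ranked)
        return relevance[j] - redundancy / len(ranked)

    while remaining:
        best = max(remaining, key=score)  # ties go to the lowest remaining index
        ranked.append(best)
        remaining.remove(best)
    return ranked


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; assumes label 1 = disordered, 0 = ordered."""
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def incremental_feature_selection(
    ranked: list[int], evaluate: Callable[[list[int]], float]
) -> tuple[list[int], float]:
    """Score the top-k prefixes of the ranked list (k = 1, 2, ...) and keep the best one."""
    best_subset: list[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        current = evaluate(subset)
        if current > best_score:  # strict: the smallest best-scoring k wins
            best_subset, best_score = subset, current
    return best_subset, best_score


# Hypothetical toy data: 8 residues, y = a AND b; feature 1 duplicates feature 0; c is noise.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
y = [u & v for u, v in zip(a, b)]
features = [a, list(a), b, c]


def toy_evaluate(subset: list[int]) -> float:
    # Hypothetical stand-in for RF + 10-fold CV: predict 1 only if every selected feature is 1.
    preds = [int(all(features[j][i] for j in subset)) for i in range(len(y))]
    return mcc(y, preds)


ranked = mrmr_rank(features, y)
subset, best = incremental_feature_selection(ranked, toy_evaluate)
print(ranked, subset, best)  # [0, 2, 3, 1] [0, 2] 1.0
```

In the toy run, the duplicated feature is exactly as relevant as the original but is pushed to the end of the ranking, behind pure noise, because it is fully redundant; IFS then keeps the shortest prefix with the best MCC. In a real setting the evaluator would train and cross-validate a classifier on every prefix, so IFS costs one model evaluation per candidate prefix length.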
