A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition.

Author information

Dehzangi Abdollah, Paliwal Kuldip, Lyons James, Sharma Alok, Sattar Abdul

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2014 May-Jun;11(3):510-9. doi: 10.1109/TCBB.2013.2296317.

Abstract

Protein fold recognition (PFR) is considered an important step toward solving the protein structure prediction problem. Despite the efforts made so far, finding an accurate and fast computational approach to PFR remains a challenging problem in bioinformatics and computational biology. In this study, we propose a segmentation-based feature extraction technique. It extracts local evolutionary information embedded in the position specific scoring matrix (PSSM) and structural information embedded in the secondary structure of proteins predicted by SPINE-X. We also use occurrence features to extract global discriminatory information from PSSM and SPINE-X. By applying a support vector machine (SVM) to the extracted features, we improve protein fold prediction accuracy by 7.4 percent over the best results reported in the literature. We also report 73.8 percent prediction accuracy on a data set of proteins with less than 25 percent sequence similarity. On a data set of proteins belonging to 110 folds with less than 40 percent sequence similarity, we report 80.7 percent prediction accuracy. Finally, we investigate the relation between the number of folds and the number of features used. We show that the number of features should be increased to obtain better protein fold prediction results when the number of folds is relatively large.
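The general idea of segmentation-based features can be illustrated with a minimal sketch. A per-residue profile is cut into contiguous segments and summarized segment by segment. A whole-protein summary is then appended, so every protein yields a vector of the same length.

This is not the authors' method. The abstract does not state their segmentation rule, segment count, or how occurrence features are computed. The sketch makes these hypothetical assumptions:

- segments have near-equal length;
- each segment is summarized by its column averages;
- the global average is only a stand-in for the occurrence features.

The names `segment_bounds`, `column_means`, and `segmented_features` are illustrative. The same function works on an L × 20 PSSM, or on a per-residue secondary-structure probability profile (assumed here to have one column per state) such as one predicted by SPINE-X.

```python
from typing import List, Tuple

# Rows are residues; columns are profile entries (e.g. 20 for a PSSM).
Matrix = List[List[float]]


def segment_bounds(length: int, n_segments: int) -> List[Tuple[int, int]]:
    """Split range(length) into n_segments contiguous slices whose sizes differ by at most one.

    Hypothetical equal-length rule; the paper's actual segmentation may differ.
    """
    if n_segments < 1 or length < n_segments:
        raise ValueError("need at least one residue per segment")
    base, extra = divmod(length, n_segments)
    bounds, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments absorb one leftover residue each.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def column_means(rows: Matrix) -> List[float]:
    """Average each column over the given rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def segmented_features(profile: Matrix, n_segments: int) -> List[float]:
    """Concatenate per-segment column means (local) with whole-protein means (global).

    Output length is (n_segments + 1) * number_of_columns, independent of protein length.
    The global block is only a stand-in for the paper's occurrence features.
    """
    features: List[float] = []
    for start, end in segment_bounds(len(profile), n_segments):
        features.extend(column_means(profile[start:end]))
    features.extend(column_means(profile))
    return features


if __name__ == "__main__":
    # Toy 5-residue profile with 2 columns (a real PSSM has 20).
    toy = [[1, 0], [3, 2], [5, 4], [7, 6], [9, 8]]
    print(segmented_features(toy, 2))  # [3.0, 2.0, 8.0, 7.0, 5.0, 4.0]
```

Because the vector length depends only on the segment count and the number of profile columns, proteins of any length map to inputs an SVM can consume. Raising `n_segments` lengthens the vector. That is one simple way to grow the feature count when the number of folds is large.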
