Accurate prediction of protein structural classes using functional domains and predicted secondary structure sequences.

Affiliation

Department of Computer Science & Engineering, University of South Florida, Tampa, FL 33620, USA.

Publication information

J Biomol Struct Dyn. 2012;29(6):623-33. doi: 10.1080/07391102.2011.672626.

Abstract

Protein structural class prediction is a challenging problem in bioinformatics. Earlier methods based directly on amino acid (AA) sequence similarity have been shown to be insufficient for low-similarity protein datasets. To improve prediction accuracy for such low-similarity proteins, several methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark datasets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, as these features capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been tested on partially disordered proteins, where it achieves reasonable prediction accuracy; this is a more difficult problem than structural class prediction on commonly used benchmark datasets and, to the best of our knowledge, has not been attempted before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark datasets.
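To make the idea of combining the two feature types concrete, the following is a minimal illustrative sketch in Python. It is not the authors' feature set, which the abstract does not specify. It assumes a three-state predicted secondary structure string (H, E, C) and a list of signature identifiers returned by a domain search. The function names (`ss_content`, `longest_segment`, `fd_indicator`, `build_features`) and the example identifiers are hypothetical.

```python
from collections import Counter

# Assumed three-state alphabet: H = helix, E = strand, C = coil.
SS_STATES = "HEC"


def ss_content(ss_seq):
    """Fraction of residues predicted in each secondary structure state."""
    counts = Counter(ss_seq)
    return [counts.get(state, 0) / len(ss_seq) for state in SS_STATES]


def longest_segment(ss_seq, state):
    """Longest uninterrupted run of `state`, normalized by sequence length."""
    best = run = 0
    for ch in ss_seq:
        run = run + 1 if ch == state else 0
        best = max(best, run)
    return best / len(ss_seq)


def fd_indicator(hit_ids, vocabulary):
    """Binary vector: 1 where the protein matches a signature in `vocabulary`."""
    hits = set(hit_ids)
    return [1 if sig in hits else 0 for sig in vocabulary]


def build_features(ss_seq, hit_ids, vocabulary):
    """Concatenate secondary structure features with functional domain features."""
    if not ss_seq:
        raise ValueError("secondary structure string must be non-empty")
    ss_part = ss_content(ss_seq) + [longest_segment(ss_seq, s) for s in "HE"]
    return ss_part + fd_indicator(hit_ids, vocabulary)


# Hypothetical example: placeholder signature identifiers, not real annotations.
vocab = ["IPR000001", "IPR000002", "IPR000003"]
features = build_features("CHHHHCCEEC", ["IPR000002"], vocab)
```

A vector built this way could then be passed to any standard classifier, with feature selection applied beforehand to reduce the number of functional domain columns, which can grow large because each signature contributes one dimension.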
