Feature selection and combination criteria for improving accuracy in protein structure prediction.

Author information

Lin Ken-Li, Lin Chun-Yuan, Huang Chuen-Der, Chang Hsiu-Ming, Yang Chiao-Yun, Lin Chin-Teng, Tang Chuan Yi, Hsu D Frank

Affiliations

Department of Electrical and Control Engineering, National Chiao-Tung University, Hsin-chu, Taiwan and Computer Center of Chung Hua University, Hsin-chu, Taiwan.

Publication information

IEEE Trans Nanobioscience. 2007 Jun;6(2):186-96. doi: 10.1109/tnb.2007.897482.

Abstract

The classification of protein structures is essential for determining protein function in bioinformatics. At present, a reasonably high prediction accuracy has been achieved in classifying proteins into the four classes of the SCOP database according to their primary amino acid sequences. However, further classification into fine-grained folding categories remains a considerable challenge, especially when the number of possible folding patterns, such as those defined in the SCOP database, is large. In our previous work, we proposed a two-level classification strategy called hierarchical learning architecture (HLA), which uses neural networks and two indirect coding features to differentiate proteins according to their classes and folding patterns and which achieved an accuracy rate of 65.5%. In this paper, we use a combinatorial fusion technique to facilitate feature selection and combination for improving predictive accuracy in protein structure classification. When various combinatorial fusion criteria are applied to the protein fold prediction approach using neural networks with HLA and the radial basis function network (RBFN), the resulting classification has an overall prediction accuracy rate of 87% for four classes and 69.6% for 27 folding categories. These rates are significantly higher than the accuracy rate of 56.5% previously obtained by Ding and Dubchak. Our results demonstrate that data fusion is a viable method for feature selection and combination in the prediction and classification of protein structure.
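The abstract does not spell out the fusion criteria, so the following is only a minimal sketch of the general idea behind combining scoring systems, not the authors' method. It assumes each feature-specific classifier outputs one score per candidate class, and it shows two common ways of fusing such outputs: averaging normalized scores and averaging ranks. All function names and the example scores are hypothetical.

```python
def normalize(scores):
    """Min-max scale scores to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ranks(scores):
    """Rank classes by descending score; rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def score_combination(systems):
    """Average the normalized scores of several scoring systems per class."""
    normed = [normalize(s) for s in systems]
    return [sum(col) / len(systems) for col in zip(*normed)]


def rank_combination(systems):
    """Average the ranks assigned by several scoring systems per class."""
    ranked = [ranks(s) for s in systems]
    return [sum(col) / len(systems) for col in zip(*ranked)]


def predict_by_score(systems):
    """Pick the class index with the highest combined score."""
    combined = score_combination(systems)
    return max(range(len(combined)), key=lambda i: combined[i])


def predict_by_rank(systems):
    """Pick the class index with the lowest (best) combined rank."""
    combined = rank_combination(systems)
    return min(range(len(combined)), key=lambda i: combined[i])


# Hypothetical scores from two feature-specific classifiers over three classes.
system_a = [0.9, 0.1, 0.5]
system_b = [0.2, 0.3, 0.8]
print(predict_by_score([system_a, system_b]))
print(predict_by_rank([system_a, system_b]))
```

In this toy case the two classifiers disagree on the top class, and both fusion rules settle on the class that each system ranks reasonably high. Which features to fuse, and whether score or rank combination works better, would in practice be decided by criteria such as those the paper investigates.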

