Feature selection and combination criteria for improving accuracy in protein structure prediction.

Author information

Lin Ken-Li, Lin Chun-Yuan, Huang Chuen-Der, Chang Hsiu-Ming, Yang Chiao-Yun, Lin Chin-Teng, Tang Chuan Yi, Hsu D Frank

Affiliations

Department of Electrical and Control Engineering, National Chiao-Tung University, Hsin-chu, Taiwan and Computer Center of Chung Hua University, Hsin-chu, Taiwan.

Publication information

IEEE Trans Nanobioscience. 2007 Jun;6(2):186-96. doi: 10.1109/tnb.2007.897482.

DOI:10.1109/tnb.2007.897482

PMID:17695755

Abstract

The classification of protein structures is essential for their function determination in bioinformatics. At present, a reasonably high rate of prediction accuracy has been achieved in classifying proteins into four classes in the SCOP database according to their primary amino acid sequences. However, for further classification into fine-grained folding categories, especially when the number of possible folding patterns as those defined in the SCOP database is large, it is still quite a challenge. In our previous work, we have proposed a two-level classification strategy called hierarchical learning architecture (HLA) using neural networks and two indirect coding features to differentiate proteins according to their classes and folding patterns, which achieved an accuracy rate of 65.5%. In this paper, we use a combinatorial fusion technique to facilitate feature selection and combination for improving predictive accuracy in protein structure classification. When applying various criteria in combinatorial fusion to the protein fold prediction approach using neural networks with HLA and the radial basis function network (RBFN), the resulting classification has an overall prediction accuracy rate of 87% for four classes and 69.6% for 27 folding categories. These rates are significantly higher than the accuracy rate of 56.5% previously obtained by Ding and Dubchak. Our results demonstrate that data fusion is a viable method for feature selection and combination in the prediction and classification of protein structure.
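The abstract does not spell out the fusion criteria, so the following is only a minimal sketch of the general idea behind combinatorial fusion: several scoring systems (for example, classifiers trained on different features) each score the candidate classes, and their outputs are merged either by score or by rank. The class labels, the toy scores, and all function names here are assumptions made for illustration; this is not the authors' HLA/RBFN implementation.

```python
from typing import Dict, List

Scores = Dict[str, float]

def normalize(scores: Scores) -> Scores:
    """Min-max scale scores to [0, 1]; constant inputs map to 0.5."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.5 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def ranks(scores: Scores) -> Dict[str, int]:
    """Rank 1 is the highest score; ties are broken by label order."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {k: i + 1 for i, k in enumerate(ordered)}

def score_combination(systems: List[Scores]) -> Scores:
    """Average the normalized scores of all systems for each class."""
    normed = [normalize(s) for s in systems]
    return {k: sum(n[k] for n in normed) / len(normed) for k in normed[0]}

def rank_combination(systems: List[Scores]) -> Scores:
    """Average the ranks of all systems for each class (lower is better)."""
    ranked = [ranks(s) for s in systems]
    return {k: sum(r[k] for r in ranked) / len(ranked) for k in ranked[0]}

def predict_by_score(systems: List[Scores]) -> str:
    combined = score_combination(systems)
    return max(sorted(combined), key=lambda k: combined[k])

def predict_by_rank(systems: List[Scores]) -> str:
    combined = rank_combination(systems)
    return min(sorted(combined), key=lambda k: combined[k])

# Toy scores from two hypothetical feature-specific classifiers
# over four illustrative structural classes (values are made up).
system_a = {"all-alpha": 0.9, "all-beta": 0.6, "alpha/beta": 0.1, "alpha+beta": 0.0}
system_b = {"all-alpha": 0.4, "all-beta": 0.5, "alpha/beta": 0.45, "alpha+beta": 0.1}

print(predict_by_score([system_a, system_b]))
print(predict_by_rank([system_a, system_b]))
```

In this toy input the two criteria disagree, which is the kind of case where choosing between score and rank combination matters. How the paper decides which features to combine and which criterion to apply is not stated in the abstract.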

Similar articles

Feature selection and combination criteria for improving accuracy in protein structure prediction.

IEEE Trans Nanobioscience. 2007 Jun;6(2):186-96. doi: 10.1109/tnb.2007.897482.

Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

IEEE Trans Nanobioscience. 2003 Dec;2(4):221-32. doi: 10.1109/tnb.2003.820284.

Prediction of protein structure classes with flexible neural tree.

Biomed Mater Eng. 2014;24(6):3797-806. doi: 10.3233/BME-141209.

Decision tree based information integration for automated protein classification.

J Bioinform Comput Biol. 2005 Jun;3(3):717-42. doi: 10.1142/s0219720005001259.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers.

IEEE Trans Nanobioscience. 2009 Mar;8(1):100-10. doi: 10.1109/TNB.2009.2016488. Epub 2009 Mar 10.

Mining sequential patterns for protein fold recognition.

J Biomed Inform. 2008 Feb;41(1):165-79. doi: 10.1016/j.jbi.2007.05.004. Epub 2007 May 17.

Classification and knowledge discovery in protein databases.

J Biomed Inform. 2004 Aug;37(4):224-39. doi: 10.1016/j.jbi.2004.07.008.

Ensemble classifier for protein fold pattern recognition.

Bioinformatics. 2006 Jul 15;22(14):1717-22. doi: 10.1093/bioinformatics/btl170. Epub 2006 May 3.

Evolutionary optimization of a hierarchical object recognition model.

IEEE Trans Syst Man Cybern B Cybern. 2005 Jun;35(3):426-37. doi: 10.1109/tsmcb.2005.846649.

Cited by

Recognition of Protein Network for Bioinformatics Knowledge Analysis Using Support Vector Machine.

Biomed Res Int. 2022 Apr 23;2022:2273648. doi: 10.1155/2022/2273648. eCollection 2022.

Improving SDG Classification Precision Using Combinatorial Fusion.

Sensors (Basel). 2022 Jan 29;22(3):1067. doi: 10.3390/s22031067.

The diversity rank-score function for combining human visual perception systems.

Brain Inform. 2016 Mar;3(1):63-72. doi: 10.1007/s40708-016-0037-3. Epub 2016 Feb 15.

On the combination of two visual cognition systems using combinatorial fusion.

Brain Inform. 2015 Mar;2(1):21-32. doi: 10.1007/s40708-015-0008-0. Epub 2015 Feb 3.

An empirical study of different approaches for protein classification.

ScientificWorldJournal. 2014;2014:236717. doi: 10.1155/2014/236717. Epub 2014 Jun 15.

Intelligent screening systems for cervical cancer.

ScientificWorldJournal. 2014;2014:810368. doi: 10.1155/2014/810368. Epub 2014 May 11.

Novel design strategy for checkpoint kinase 2 inhibitors using pharmacophore modeling, combinatorial fusion, and virtual screening.

Biomed Res Int. 2014;2014:359494. doi: 10.1155/2014/359494. Epub 2014 Apr 23.

Combining multiple ChIP-seq peak detection systems using combinatorial fusion.

BMC Genomics. 2012;13 Suppl 8(Suppl 8):S12. doi: 10.1186/1471-2164-13-S8-S12. Epub 2012 Dec 17.

LigSeeSVM: ligand-based virtual screening using support vector machines and data fusion.

Int J Comput Biol Drug Des. 2011;4(3):274-89. doi: 10.1504/IJCBDD.2011.041415. Epub 2011 Jul 21.

Classification and clustering analysis of pyruvate dehydrogenase enzyme based on their physicochemical properties.

Bioinformation. 2010 Apr 30;4(10):456-62. doi: 10.6026/97320630004456.
