Ensemble classifier for protein fold pattern recognition.

Author information

Shen Hong-Bin, Chou Kuo-Chen

Affiliation

Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030, China.

Publication information

Bioinformatics. 2006 Jul 15;22(14):1717-22. doi: 10.1093/bioinformatics/btl170. Epub 2006 May 3.

DOI:10.1093/bioinformatics/btl170

PMID:16672258

Abstract

MOTIVATION

Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, the ensemble classifier was introduced. It was formed by a set of basic classifiers, with each trained in different parameter systems, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, which were extracted from a training dataset. The operation engine for the constituent individual classifiers was OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through a weighted voting to give a final determination for classifying a query protein. The recognition was to find the true fold among the 27 possible patterns.
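The weighted-voting step described above can be illustrated with a minimal sketch. It is not the authors' OET-KNN implementation: the base-classifier outputs, the weights, and the tie-breaking rule are all hypothetical, chosen only to show how per-classifier fold labels might be fused into one decision.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-classifier fold labels into one label by weighted voting.

    predictions: fold labels, one per base classifier (assumed to be ints 1..27).
    weights: non-negative weights, one per base classifier (hypothetical values).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per base classifier")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Highest total weight wins; ties go to the smaller label (an assumption
    # made here only to keep the result deterministic).
    return min(scores, key=lambda label: (-scores[label], label))


# Hypothetical outputs of five base classifiers, each trained on a different
# feature set (e.g. hydrophobicity, polarity), with illustrative weights.
base_predictions = [3, 7, 3, 12, 7]
base_weights = [0.30, 0.25, 0.10, 0.15, 0.20]
final_fold = weighted_vote(base_predictions, base_weights)
```

Here fold 7 collects more total weight than fold 3 even though both receive two votes, which is the point of weighting: a classifier trusted more can outvote one trusted less.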

RESULTS

The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural networks) and SVM (support vector machines) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics.

AVAILABILITY

The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.

Similar articles

Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization.

Biochem Biophys Res Commun. 2006 Aug 18;347(1):150-7. doi: 10.1016/j.bbrc.2006.06.059. Epub 2006 Jun 21.

A novel hierarchical ensemble classifier for protein fold recognition.

Protein Eng Des Sel. 2008 Nov;21(11):659-64. doi: 10.1093/protein/gzn045. Epub 2008 Sep 4.

Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.

Bioinformatics. 2007 Dec 15;23(24):3320-7. doi: 10.1093/bioinformatics/btm527. Epub 2007 Nov 7.

A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM.

Comput Biol Chem. 2011 Feb;35(1):1-9. doi: 10.1016/j.compbiolchem.2010.12.001. Epub 2010 Dec 17.

Using ensemble classifier to identify membrane protein types.

Amino Acids. 2007;32(4):483-8. doi: 10.1007/s00726-006-0439-2. Epub 2006 Oct 12.

Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types.

Biochem Biophys Res Commun. 2005 Aug 19;334(1):288-92. doi: 10.1016/j.bbrc.2005.06.087.

Support vector machines for prediction of dihedral angle regions.

Bioinformatics. 2006 Dec 15;22(24):3009-15. doi: 10.1093/bioinformatics/btl489. Epub 2006 Sep 27.

PFRES: protein fold classification by using evolutionary information and predicted secondary structure.

Bioinformatics. 2007 Nov 1;23(21):2843-50. doi: 10.1093/bioinformatics/btm475. Epub 2007 Oct 17.

Signal-3L: A 3-layer approach for predicting signal peptides.

Biochem Biophys Res Commun. 2007 Nov 16;363(2):297-303. doi: 10.1016/j.bbrc.2007.08.140. Epub 2007 Aug 31.

Cited by

Genome-Wide Identification of Luffa Sucrose Synthase Genes Reveals -Mediated Sugar Metabolism Boosting Drought Tolerance.

Int J Mol Sci. 2025 Jun 13;26(12):5675. doi: 10.3390/ijms26125675.

Genome-Wide Analysis and Expression Profiles of AhCOLs Family in Peanut (Arachis hypogaea L.).

Int J Mol Sci. 2025 Apr 5;26(7):3404. doi: 10.3390/ijms26073404.

Genome-wide characterization of the MADS-box gene family in Paeonia ostii and expression analysis of genes related to floral organ development.

BMC Genomics. 2025 Jan 20;26(1):49. doi: 10.1186/s12864-024-11197-y.

Dual-Signal Feature Spaces Map Protein Subcellular Locations Based on Immunohistochemistry Image and Protein Sequence.

Sensors (Basel). 2023 Nov 7;23(22):9014. doi: 10.3390/s23229014.

Functional characterization of hypothetical proteins from Monkeypox virus.

J Genet Eng Biotechnol. 2023 Apr 26;21(1):46. doi: 10.1186/s43141-023-00505-w.

Machine learning for data integration in human gut microbiome.

Microb Cell Fact. 2022 Nov 23;21(1):241. doi: 10.1186/s12934-022-01973-4.

A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways.

Methods Mol Biol. 2023;2553:441-452. doi: 10.1007/978-1-0716-2617-7_19.

Phylogenomics-Based Reconstruction and Molecular Evolutionary Histories of Brassica Photoreceptor Gene Families.

Int J Mol Sci. 2022 Aug 4;23(15):8695. doi: 10.3390/ijms23158695.

Artificial intelligence in the analysis of glycosylation data.

Biotechnol Adv. 2022 Nov;60:108008. doi: 10.1016/j.biotechadv.2022.108008. Epub 2022 Jun 20.

A Fusion-Based Technique With Hybrid Swarm Algorithm and Deep Learning for Biosignal Classification.

Front Hum Neurosci. 2022 Jun 3;16:895761. doi: 10.3389/fnhum.2022.895761. eCollection 2022.
