Ensemble classifier for protein fold pattern recognition.

Author information

Shen Hong-Bin, Chou Kuo-Chen

Affiliation

Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030, China.

Publication information

Bioinformatics. 2006 Jul 15;22(14):1717-22. doi: 10.1093/bioinformatics/btl170. Epub 2006 May 3.

Abstract

MOTIVATION

Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, an ensemble classifier was introduced. It consists of a set of basic classifiers, each trained on a different parameter system, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, all extracted from a training dataset. The operation engine for the constituent individual classifiers was the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through weighted voting to give a final determination for classifying a query protein. The recognition task was to find the true fold among the 27 possible patterns.
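
The weighted-voting step described above can be illustrated with a minimal sketch. It is not the PFP-Pred implementation: a plain k-nearest-neighbour vote is swapped in for OET-KNN (the evidence-theoretic combination is not reproduced here), and the function names, feature-space names, toy feature vectors, fold labels, and weights are all hypothetical values chosen for illustration.

```python
from collections import Counter, defaultdict
from math import dist


def knn_predict(train, query, k=3):
    """Plain k-nearest-neighbour majority vote (a stand-in, NOT OET-KNN)."""
    # Keep the k training samples closest to the query in Euclidean distance.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_predict(feature_sets, weights, queries, k=3):
    """Combine one KNN prediction per feature space by weighted voting."""
    scores = defaultdict(float)
    for name, train in feature_sets.items():
        label = knn_predict(train, queries[name], k)
        scores[label] += weights[name]  # each base classifier adds its weight
    # The fold with the largest accumulated weight is the final call.
    return max(scores, key=scores.get)


# Toy data (assumed): two feature spaces and two fold labels.
feature_sets = {
    "hydrophobicity": [
        ((0.1, 0.2), "fold_A"), ((0.2, 0.1), "fold_A"),
        ((0.9, 0.8), "fold_B"), ((0.8, 0.9), "fold_B"),
    ],
    "polarity": [
        ((0.30,), "fold_A"), ((0.35,), "fold_A"),
        ((0.70,), "fold_B"), ((0.75,), "fold_B"),
    ],
}
weights = {"hydrophobicity": 0.6, "polarity": 0.4}  # assumed voting weights
queries = {"hydrophobicity": (0.15, 0.15), "polarity": (0.72,)}

predicted = ensemble_predict(feature_sets, weights, queries, k=3)
print(predicted)
```

In this toy case the two base classifiers disagree, and the one with the larger weight decides the outcome. In the setting the abstract describes, there would be one base classifier per parameter system and 27 candidate fold labels.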

RESULTS

The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural networks) and SVM (support vector machines) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics.

AVAILABILITY

The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.
