Protein superfamily classification using fuzzy rule-based classifier.

Author information

Mansoori Eghbal G, Zolghadri Mansoor J, Katebi Seraj D

Affiliation

Department of Computer Science and Engineering, School of Engineering, Shiraz University, Shiraz, Iran.

Publication information

IEEE Trans Nanobioscience. 2009 Mar;8(1):92-9. doi: 10.1109/TNB.2009.2016484. Epub 2009 Mar 21.

DOI:10.1109/TNB.2009.2016484

PMID:19307166

Abstract

In this paper, we propose a fuzzy rule-based classifier for assigning amino acid sequences to different protein superfamilies. While the most popular methods for protein classification rely on sequence alignment, our approach is alignment-free and therefore more human-readable. Like other alignment-independent methods, it uses the distribution of contiguous patterns of n amino acids (n-grams) in the sequences as features. Our approach first extracts a large number of features from a set of training sequences and then, using a proposed feature ranking method, selects only the best of them. Using these features, a novel steady-state genetic algorithm for extracting fuzzy classification rules from data then generates a compact set of interpretable fuzzy rules. The generated rules are simple and human-understandable, so biologists can use them for classification or apply their expertise to interpret or even modify them. To evaluate the performance of our fuzzy rule-based classifier, we compared it with the conventional nonfuzzy C4.5 algorithm as well as with several other fuzzy classifiers. This comparative study classifies protein sequences from five superfamily classes downloaded from a public-domain database. The results show that the generated fuzzy rules are more interpretable, with an acceptable improvement in classification accuracy.
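
The pipeline described above has three stages: n-gram features, feature selection, and fuzzy rule classification. It can be illustrated with the minimal Python sketch below. The abstract does not give these details, so the sketch makes several assumptions:

- Features are relative n-gram frequencies.
- Each feature is covered by three triangular fuzzy sets ("low", "medium", "high") on [0, 1].
- Classification uses single-winner reasoning (the rule with the largest compatibility times weight decides), a scheme common in fuzzy rule-based classifiers.

A few parts are stand-ins. The function `rank_features` is a simple placeholder and is NOT the authors' feature ranking method. The rules in `EXAMPLE_RULES` and the labels `superfamily_A` and `superfamily_B` are hypothetical. The steady-state genetic algorithm that learns the rules in the paper is not sketched.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def all_ngrams(n):
    """Every possible n-gram over the 20 amino acids (20**n candidate features)."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]

def ngram_features(seq, n, grams):
    """Relative frequency of each n-gram in `grams` among the contiguous windows of `seq`."""
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for very short sequences
    return [counts[g] / total for g in grams]

def rank_features(vectors, labels, k):
    """Hypothetical stand-in ranking, NOT the paper's method: return the indices of
    the k features whose per-class mean frequencies are furthest apart."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(vectors[0])):
        means = []
        for c in classes:
            vals = [v[j] for v, y in zip(vectors, labels) if y == c]
            means.append(sum(vals) / len(vals))
        scores.append(max(means) - min(means))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Assumed fuzzy partition: three triangular sets on [0, 1] as (left foot, peak, right foot).
TERMS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}

def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy set (a, b, c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def compatibility(terms, x):
    """Product of memberships over the rule's antecedents; 'dont_care' matches anything."""
    mu = 1.0
    for term, xi in zip(terms, x):
        if term != "dont_care":
            mu *= triangular(xi, *TERMS[term])
    return mu

def classify(rules, x):
    """Single-winner reasoning: the rule with the largest compatibility * weight decides.
    Returns None when no rule fires."""
    best_label, best_score = None, 0.0
    for terms, label, weight in rules:
        score = compatibility(terms, x) * weight
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-written rules over the features ["GG", "KL"]; in the paper the
# rules are learned by a steady-state genetic algorithm, which is not sketched here.
EXAMPLE_RULES = [
    (("high", "dont_care"), "superfamily_A", 0.9),
    (("low", "high"), "superfamily_B", 0.8),
]

if __name__ == "__main__":
    grams = ["GG", "KL"]
    for seq in ["GGGGA", "KLKL"]:
        x = ngram_features(seq, 2, grams)
        print(seq, x, classify(EXAMPLE_RULES, x))
```

With these toy rules, "GGGGA" (GG frequency 0.75) is assigned to superfamily_A and "KLKL" (KL frequency 2/3) to superfamily_B. Real n-gram frequencies are usually much smaller than 1, so a practical system would likely rescale each selected feature before applying the fuzzy sets.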

Similar articles

Protein superfamily classification using fuzzy rule-based classifier.

IEEE Trans Nanobioscience. 2009 Mar;8(1):92-9. doi: 10.1109/TNB.2009.2016484. Epub 2009 Mar 21.

Prediction of protein structural class for the twilight zone sequences.

Biochem Biophys Res Commun. 2007 Jun 1;357(2):453-60. doi: 10.1016/j.bbrc.2007.03.164. Epub 2007 Apr 5.

Using supervised fuzzy clustering to predict protein structural classes.

Biochem Biophys Res Commun. 2005 Aug 26;334(2):577-81. doi: 10.1016/j.bbrc.2005.06.128.

SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection.

Bioinformatics. 2008 Mar 15;24(6):783-90. doi: 10.1093/bioinformatics/btn028. Epub 2008 Feb 1.

Mining sequential patterns for protein fold recognition.

J Biomed Inform. 2008 Feb;41(1):165-79. doi: 10.1016/j.jbi.2007.05.004. Epub 2007 May 17.

A novel and efficient technique for identification and classification of GPCRs.

IEEE Trans Inf Technol Biomed. 2008 Jul;12(4):541-8. doi: 10.1109/TITB.2007.911308.

Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization.

Biochem Biophys Res Commun. 2006 Aug 18;347(1):150-7. doi: 10.1016/j.bbrc.2006.06.059. Epub 2006 Jun 21.

Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.

Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.

Improving fuzzy rule classifier by extracting suitable features from capacities with respect to the choquet integral.

IEEE Trans Syst Man Cybern B Cybern. 2008 Oct;38(5):1195-206. doi: 10.1109/TSMCB.2008.925750.

Small, fuzzy and interpretable gene expression based classifiers.

Bioinformatics. 2005 May 1;21(9):1964-70. doi: 10.1093/bioinformatics/bti287. Epub 2005 Jan 20.

Cited by

Sequence-Based Prediction of Plant Allergenic Proteins: Machine Learning Classification Approach.

ACS Omega. 2023 Jan 20;8(4):3698-3704. doi: 10.1021/acsomega.2c02842. eCollection 2023 Jan 31.

A Treatise to Computational Approaches Towards Prediction of Membrane Protein and Its Subtypes.

J Membr Biol. 2017 Feb;250(1):55-76. doi: 10.1007/s00232-016-9937-7. Epub 2016 Nov 19.

Efficient feature selection and classification of protein sequence data in bioinformatics.

ScientificWorldJournal. 2014;2014:173869. doi: 10.1155/2014/173869. Epub 2014 Jun 19.

Sequence and structure based models of HIV-1 protease and reverse transcriptase drug resistance.

BMC Genomics. 2013;14 Suppl 4(Suppl 4):S3. doi: 10.1186/1471-2164-14-S4-S3. Epub 2013 Oct 1.