Enzyme classification using multiclass support vector machine and feature subset selection.

Author information

Pradhan Debasmita, Padhy Sudarsan, Sahoo Biswajit

Affiliation

Department of Computer Science and Engineering, Silicon Institute of Technology, Silicon Hills, Patia, Bhubaneswar, 751024, India.

Publication information

Comput Biol Chem. 2017 Oct;70:211-219. doi: 10.1016/j.compbiolchem.2017.08.009. Epub 2017 Aug 31.

DOI:10.1016/j.compbiolchem.2017.08.009

PMID:28934693

Abstract

Proteins are the macromolecules responsible for almost all biological processes in a cell. With a large number of protein sequences now available from different sequencing projects, the challenge for scientists is to characterize their functions. As wet-lab methods are time-consuming and expensive, many computational methods have been proposed, such as FASTA, PSI-BLAST, DNA microarray clustering, and nearest-neighbor classification on protein-protein interaction networks. The support vector machine (SVM) is one such method and has been used successfully for several problems, such as protein fold recognition and protein structure prediction. In 2003, Cai et al. used SVM to classify proteins into different functional classes and to predict their function, representing the protein sequences by their physico-chemical properties. In this paper, a model comprising feature subset selection followed by a multiclass SVM is proposed to determine the functional class of a newly generated protein sequence. To train and test the model, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine the features that contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used. Of the 32 properties considered initially, only 20 features are sufficient to classify the proteins into their functional classes, with an accuracy ranging from 91% to 94%. In comparison, OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the features that most significantly affect protein function.
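SFFS is one of the wrapper methods named above for reducing the 32 physico-chemical properties to a smaller subset. The following is a minimal, generic pure-Python sketch of the standard SFFS procedure. It is not the authors' implementation. The scoring function is an assumption: in the setting described here it would presumably be the classification accuracy of a multiclass SVM trained on the candidate features. So the example runs without an ML library, a hypothetical toy criterion (`toy_criterion`, per-feature relevance minus pairwise redundancy) stands in for it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A criterion scores a feature subset; higher is better.
Criterion = Callable[[Sequence[int]], float]


def sffs(n_features: int, criterion: Criterion, k_target: int) -> Tuple[float, List[int]]:
    """Sequential Forward Floating Selection.

    Returns the best (score, subset) of size k_target found during the search.
    """
    if not 1 <= k_target <= n_features:
        raise ValueError("k_target must lie in [1, n_features]")
    selected: List[int] = []
    best: Dict[int, Tuple[float, List[int]]] = {}  # subset size -> best (score, subset) seen

    def record(subset: List[int]) -> bool:
        """Store subset if it strictly beats the best subset of its size; report whether it did."""
        score = criterion(subset)
        if len(subset) not in best or score > best[len(subset)][0]:
            best[len(subset)] = (score, list(subset))
            return True
        return False

    while len(selected) < k_target:
        # Forward step: add the single feature that gives the highest criterion value.
        remaining = [f for f in range(n_features) if f not in selected]
        best_f = max(remaining, key=lambda f: criterion(selected + [f]))
        selected = selected + [best_f]
        record(selected)
        # Floating (conditional backward) step: drop the least useful feature only
        # while that yields a subset strictly better than any seen of the smaller size.
        while len(selected) > 2:
            worst_f = max(selected, key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst_f]
            if not record(reduced):
                break
            selected = reduced
    return best[k_target]


# Hypothetical toy criterion (assumption, stands in for multiclass SVM accuracy):
# per-feature relevance minus a penalty for each redundant pair of features.
RELEVANCE = [5.0, 4.0, 4.0, 1.0]
REDUNDANCY = {frozenset({0, 1}): 3.0, frozenset({0, 2}): 3.0}


def toy_criterion(subset: Sequence[int]) -> float:
    total = sum(RELEVANCE[i] for i in subset)
    for i in subset:
        for j in subset:
            if i < j:
                total -= REDUNDANCY.get(frozenset({i, j}), 0.0)
    return total


print(sffs(4, toy_criterion, 3))  # (9.0, [1, 2, 3])
```

On this toy criterion, plain forward selection would stop at a 3-feature subset scoring 7 because it commits to feature 0 first. The floating step later discards feature 0 once features 1 and 2 make it redundant, and the search reaches {1, 2, 3} with a score of 9. This ability to revisit earlier choices is the main reason for using a floating search instead of a purely greedy one.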

Similar articles

An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data.

Biomed Res Int. 2018 Aug 30;2018:7538204. doi: 10.1155/2018/7538204. eCollection 2018.

MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data.

Bioinformatics. 2007 May 1;23(9):1106-14. doi: 10.1093/bioinformatics/btm036.

Multiclass cancer classification by using fuzzy support vector machine and binary decision tree with gene selection.

J Biomed Biotechnol. 2005 Jun 30;2005(2):160-71. doi: 10.1155/JBB.2005.160.

Computer-assisted lip diagnosis on Traditional Chinese Medicine using multi-class support vector machines.

BMC Complement Altern Med. 2012 Aug 16;12:127. doi: 10.1186/1472-6882-12-127.

SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

ScientificWorldJournal. 2014;2014:795624. doi: 10.1155/2014/795624. Epub 2014 Sep 10.

Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics.

Molecules. 2017 Dec 26;23(1):52. doi: 10.3390/molecules23010052.

Ensemble Feature Learning of Genomic Data Using Support Vector Machine.

PLoS One. 2016 Jun 15;11(6):e0157330. doi: 10.1371/journal.pone.0157330. eCollection 2016.

Operator functional state classification using least-square support vector machine based recursive feature elimination technique.

Comput Methods Programs Biomed. 2014;113(1):101-15. doi: 10.1016/j.cmpb.2013.09.007. Epub 2013 Sep 19.

The construction of support vector machine classifier using the firefly algorithm.

Comput Intell Neurosci. 2015;2015:212719. doi: 10.1155/2015/212719. Epub 2015 Feb 23.

Cited by

Personalized Body Constitution Inquiry Based on Machine Learning.

J Healthc Eng. 2020 Nov 12;2020:8834465. doi: 10.1155/2020/8834465. eCollection 2020.

Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate.

Int J Mol Sci. 2018 Jan 8;19(1):183. doi: 10.3390/ijms19010183.
