Huang Hui-Ling, Chang Fang-Lin
Department of Information Management, Jin Wen Institute of Technology, and Department of Anesthesiology, Tri-Service General Hospital, Taipei, Taiwan.
Biosystems. 2007 Sep-Oct;90(2):516-28. doi: 10.1016/j.biosystems.2006.12.003. Epub 2006 Dec 16.
An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, SVM parameter settings, and cross-validation methods. However, SVMs offer no built-in mechanism for automatically detecting relevant features, and the appropriate setting of their control parameters is often treated as a separate, independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) that simultaneously optimizes automatic feature selection and parameter tuning using an intelligent genetic algorithm, with k-fold cross-validation serving as the estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application, microarray classification on 11 multi-class datasets, is adopted. To account for model uncertainty, a frequency-based technique that votes over multiple sets of potentially informative features is used to identify the most effective subset of genes. Averaged over the 11 datasets, ESVM achieves a 10-fold cross-validation accuracy of 96.88% while selecting only 10.0 genes on average. The merits of ESVM are threefold: (1) embedding automatic feature selection and parameter setting into ESVM improves prediction ability compared with traditional SVMs; (2) ESVM serves not only as an accurate classifier but also as an adaptive feature extractor; and (3) ESVM is designed as an efficient tool in which various SVMs can conveniently be used as the core for bioinformatics problems.
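The core idea, one chromosome that jointly encodes a gene mask and the SVM parameters and is scored by k-fold cross-validation, followed by frequency-based voting over the gene subsets from several runs, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper's intelligent genetic algorithm is not reproduced, and a plain GA (binary tournament selection, uniform crossover, bit-flip mutation, elitism) stands in for it. The RBF parameters `C` and `gamma`, their search ranges, the bit widths, and every function name (`decode`, `fitness`, `run_ga`, `vote_genes`, the `train_and_score` hook) are hypothetical choices made for this sketch.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical search ranges (assumption, not from the paper): base-2
# exponents for the SVM penalty C and the RBF kernel width gamma.
LOG2_C_RANGE = (-5.0, 15.0)
LOG2_GAMMA_RANGE = (-15.0, 3.0)


def bits_to_value(bits: Sequence[int], lo: float, hi: float) -> float:
    """Map a binary gene (at least 1 bit) to an evenly spaced value in [lo, hi]."""
    as_int = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)


def decode(chrom: Sequence[int], n_features: int, n_param_bits: int):
    """Split one chromosome into (selected feature indices, C, gamma)."""
    mask = chrom[:n_features]
    c_bits = chrom[n_features:n_features + n_param_bits]
    g_bits = chrom[n_features + n_param_bits:n_features + 2 * n_param_bits]
    selected = [i for i, b in enumerate(mask) if b == 1]
    c = 2 ** bits_to_value(c_bits, *LOG2_C_RANGE)
    gamma = 2 ** bits_to_value(g_bits, *LOG2_GAMMA_RANGE)
    return selected, c, gamma


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Deterministic k-fold partition of sample indices 0..n-1."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]


# Hypothetical hook: train_and_score(train_idx, test_idx, features, C, gamma)
# returns test accuracy; in practice it would fit an SVM on the chosen genes.
Scorer = Callable[[List[int], List[int], List[int], float, float], float]


def fitness(chrom: Sequence[int], n_features: int, n_param_bits: int,
            n_samples: int, k: int, train_and_score: Scorer) -> float:
    """Mean k-fold cross-validation accuracy of the SVM encoded by chrom."""
    selected, c, gamma = decode(chrom, n_features, n_param_bits)
    if not selected:
        return 0.0  # an empty gene subset cannot classify anything
    scores = [train_and_score(tr, te, selected, c, gamma)
              for tr, te in k_fold_splits(n_samples, k)]
    return sum(scores) / k


def run_ga(n_bits: int, fit: Callable[[List[int]], float], pop_size: int = 20,
           generations: int = 30, p_mut: float = 0.02, seed: int = 0) -> List[int]:
    """Plain generational GA with elitism; a stand-in for the paper's IGA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament() -> List[int]:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fit(a) >= fit(b) else b

    best = max(pop, key=fit)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then independent bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fit)  # keep the best chromosome seen so far
    return best


def vote_genes(feature_sets: Sequence[Sequence[int]], top_n: int) -> List[int]:
    """Frequency-based voting: keep the genes selected most often across runs."""
    counts = Counter(g for s in feature_sets for g in set(s))
    return [g for g, _ in counts.most_common(top_n)]
```

In use, one would call `run_ga` several times with different seeds, passing `lambda ch: fitness(ch, n_genes, bits, n_samples, 10, scorer)`, decode each winner, and pass the resulting gene lists to `vote_genes`. A `scorer` could, for example, wrap scikit-learn's `SVC(C=c, gamma=gamma)`; that wiring is left out so the sketch runs with the standard library alone.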