Zhou Xin, Tuck David P
Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510, USA.
Bioinformatics. 2007 May 1;23(9):1106-14. doi: 10.1093/bioinformatics/btm036.
Given the thousands of genes and the small number of samples, gene selection has emerged as an important research problem in microarray data analysis. Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is one of a group of recently described algorithms which represent the state-of-the-art for gene selection. Just like SVM itself, SVM-RFE was originally designed to solve binary gene selection problems. Several groups have extended SVM-RFE to solve multiclass problems using one-versus-all techniques. However, the genes selected from one binary gene selection problem may reduce the classification performance in other binary problems.
In the present study, we propose a family of four extensions to SVM-RFE (called MSVM-RFE) to solve the multiclass gene selection problem, based on different frameworks of multiclass SVMs. By simultaneously considering all classes during the gene selection stages, our proposed extensions identify genes leading to more accurate classification.
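The idea of ranking genes with all classes considered at once can be illustrated with a minimal sketch. It is not the authors' MSVM-RFE: the abstract does not name the four multiclass SVM frameworks or the ranking criterion, so the sketch assumes one linear SVM per class (trained by plain subgradient descent) and scores each gene by the sum of its squared weights over all classes. The function names `train_linear_svm` and `multiclass_rfe`, the hyperparameters, and the toy data are all hypothetical.

```python
import random


def train_linear_svm(X, y, epochs=50, lam=0.01, lr=0.1, seed=0):
    """Fit a binary linear SVM (labels +1/-1) by subgradient descent
    on the L2-regularised hinge loss; return the weight vector."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink the weights (gradient step on the L2 penalty).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Sample violates the margin: step along the hinge subgradient.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w


def multiclass_rfe(X, labels, n_keep):
    """Recursively drop the gene with the smallest summed squared weight.

    Assumed criterion: score_j = sum over classes k of w_kj ** 2, so a gene
    is removed only if it matters little for every class simultaneously.
    Returns (kept gene indices, eliminated gene indices in removal order).
    """
    classes = sorted(set(labels))
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        scores = [0.0] * len(active)
        for c in classes:
            y = [1 if label == c else -1 for label in labels]
            w = train_linear_svm(X_active, y)
            for k, wk in enumerate(w):
                scores[k] += wk * wk
        worst = min(range(len(active)), key=scores.__getitem__)
        eliminated.append(active.pop(worst))
    return active, eliminated


# Toy data: 6 samples, 4 "genes"; genes 1 and 3 are constant zero (uninformative).
X = [
    [2.0, 0.0, 0.5, 0.0], [1.8, 0.0, 0.4, 0.0],      # class A
    [0.3, 0.0, 2.1, 0.0], [0.5, 0.0, 1.9, 0.0],      # class B
    [-2.0, 0.0, -1.8, 0.0], [-1.7, 0.0, -2.2, 0.0],  # class C
]
labels = ["A", "A", "B", "B", "C", "C"]
kept, eliminated = multiclass_rfe(X, labels, n_keep=2)
print("kept genes:", kept, "eliminated (in order):", eliminated)
```

Removing one gene per iteration keeps the sketch short; on real microarray data with thousands of genes, implementations commonly remove a fraction of the remaining genes at each step instead.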