Hybrid huberized support vector machines for microarray classification and gene selection.

Authors

Wang Li, Zhu Ji, Zou Hui

Affiliation

Ross School of Business, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Bioinformatics. 2008 Feb 1;24(3):412-9. doi: 10.1093/bioinformatics/btm579. Epub 2008 Jan 5.

Abstract

MOTIVATION

The standard L(2)-norm support vector machine (SVM) is a widely used tool for microarray classification. Previous studies have demonstrated its superior performance in terms of classification accuracy. However, a major limitation of the SVM is that it cannot automatically select relevant genes for the classification. The L(1)-norm SVM is a variant of the standard L(2)-norm SVM that constrains the L(1)-norm of the fitted coefficients. Due to the singularity of the L(1)-norm, the L(1)-norm SVM automatically selects relevant genes. However, the L(1)-norm SVM has two drawbacks: (1) the number of selected genes is bounded above by the size of the training data; (2) when several genes are highly correlated, the L(1)-norm SVM tends to pick only a few of them and remove the rest.

RESULTS

We propose a hybrid huberized support vector machine (HHSVM). The HHSVM combines the huberized hinge loss function and the elastic-net penalty. By doing so, the HHSVM performs automatic gene selection in a way similar to the L(1)-norm SVM. In addition, the HHSVM encourages highly correlated genes to be selected (or removed) together. We also develop an efficient algorithm to compute the entire solution path of the HHSVM. Numerical results indicate that the HHSVM tends to provide better variable selection results than the L(1)-norm SVM, especially when variables are highly correlated.
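The abstract names the two ingredients of the HHSVM but does not write them out. The sketch below shows one common form of each, as an illustration only: the piecewise definition of the huberized hinge loss, the threshold `delta`, the weights `lam1` and `lam2`, and all function names are assumptions introduced here and are not taken from the abstract. It evaluates the objective for given coefficients; it is not the authors' solution-path algorithm.

```python
def huberized_hinge(t, delta=2.0):
    """Assumed huberized hinge loss at margin t = y * f(x).

    Zero beyond the margin, quadratic just inside it, and linear
    for badly misclassified points, so the loss is differentiable.
    """
    if t > 1.0:
        return 0.0
    if t > 1.0 - delta:
        return (1.0 - t) ** 2 / (2.0 * delta)
    return 1.0 - t - delta / 2.0


def elastic_net_penalty(beta, lam1, lam2):
    """Assumed elastic-net penalty: lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam1 * l1 + 0.5 * lam2 * l2_sq


def hhsvm_objective(X, y, beta0, beta, lam1, lam2, delta=2.0):
    """Total huberized hinge loss over the samples plus the elastic-net penalty.

    X is a list of feature rows, y a list of labels in {-1, +1}.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        f = beta0 + sum(b * x for b, x in zip(beta, xi))
        total += huberized_hinge(yi * f, delta)
    return total + elastic_net_penalty(beta, lam1, lam2)
```

In this form the L(1) term is what sets coefficients exactly to zero (gene selection), while the squared L(2) term is what pulls the coefficients of highly correlated genes toward each other, which matches the grouping behaviour described above.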

AVAILABILITY

R code is available at http://www.stat.lsa.umich.edu/~jizhu/code/hhsvm/.
