

Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification.

Authors

Huang Lingkang, Zhang Hao Helen, Zeng Zhao-Bang, Bushel Pierre R

Affiliations

GlaxoSmithKline, Research and Development, Division of Quantitative Sciences, Research Triangle Park, NC 27709, USA; Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA; Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.

Publication

Cancer Inform. 2013 Aug 4;12:143-53. doi: 10.4137/CIN.S10212. eCollection 2013.

Abstract

BACKGROUND

Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges because the number of variables vastly exceeds the sample size and multi-type tumors are complex in nature. Support vector machines (SVMs) have shown superior performance in cancer classification owing to their ability to handle high-dimensional, low-sample-size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure uses all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties during learning to enforce sparsity of the solution.
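To make the Crammer and Singer framework concrete, here is a minimal NumPy sketch of a linear multi-class decision rule and the Crammer-Singer multi-class hinge loss. It assumes a samples-by-genes matrix X, integer labels y in 0..K-1, a genes-by-classes coefficient matrix W, and per-class offsets b (the offsets are an assumption of this sketch; the original formulation may omit them). The function names are hypothetical, and this is not the authors' MATLAB code.

```python
import numpy as np

# Hypothetical helpers illustrating the Crammer-Singer multi-class SVM;
# not taken from the authors' MATLAB implementation.

def cs_scores(X, W, b):
    """Linear class scores f_k(x) = x @ w_k + b_k for every sample and class."""
    return X @ W + b  # shape (n_samples, n_classes)


def cs_predict(X, W, b):
    """Decision rule: assign each sample to the class with the largest score."""
    return np.argmax(cs_scores(X, W, b), axis=1)


def cs_hinge_loss(X, y, W, b):
    """Mean Crammer-Singer hinge loss.

    Per sample: max_k [f_k(x) + 1{k != y}] - f_y(x). It is zero only when the
    true class outscores every other class by a margin of at least 1.
    """
    F = cs_scores(X, W, b)
    n = X.shape[0]
    margins = F + 1.0                 # unit margin added to every class...
    margins[np.arange(n), y] -= 1.0   # ...except the true class
    return float(np.mean(margins.max(axis=1) - F[np.arange(n), y]))
```

For example, with two samples X = [[1, 0], [0, 1]] and labels y = [0, 1], the coefficients W = 2I give zero loss, while W = 0.5I separates the classes correctly but with too small a margin, giving a mean loss of 0.5.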

RESULTS

The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not perform variable selection. We improved the method by introducing soft-thresholding-type penalties that incorporate variable selection into multi-class classification of high-dimensional data. The new methods were applied to simulated data and to two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes to build accurate multi-class classification rules. Furthermore, the important genes selected by the different methods overlap significantly, suggesting general agreement among the variable selection schemes.
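The abstract does not spell out the penalties, so the sketch below assumes the simplest soft-thresholding-type penalty, an L1 (lasso) penalty on every entry of the coefficient matrix W, and fits it with a generic proximal subgradient loop with diminishing step sizes. The paper's own penalties and optimization procedure may well differ; the authors' MATLAB code is the reference. The names soft_threshold, fit_l1_cs_svm and selected_genes, and the parameters lam, step and n_iter, are hypothetical. A gene counts as "selected" when its row of W has at least one nonzero entry.

```python
import numpy as np

# Hypothetical sketch: L1-penalized Crammer-Singer SVM via proximal subgradient.
# The L1 penalty is an assumed example of a soft-thresholding-type penalty.

def soft_threshold(Z, t):
    """Shrink each entry toward 0 by t; entries with |z| <= t become exactly 0."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def cs_subgradient(X, y, W, b):
    """A subgradient of the mean Crammer-Singer hinge loss w.r.t. (W, b)."""
    n, K = X.shape[0], W.shape[1]
    F = X @ W + b
    margins = F + 1.0
    margins[np.arange(n), y] -= 1.0
    k_star = margins.argmax(axis=1)   # most violating class for each sample
    # d loss_i / d f_i = e_{k*} - e_{y}; the row is zero when k* == y
    D = np.zeros((n, K))
    D[np.arange(n), k_star] += 1.0
    D[np.arange(n), y] -= 1.0
    return X.T @ D / n, D.mean(axis=0)


def fit_l1_cs_svm(X, y, n_classes, lam, step=0.1, n_iter=500):
    """Minimize mean hinge loss + lam * sum|W_jk| (intercepts unpenalized)."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    for t in range(1, n_iter + 1):
        eta = step / np.sqrt(t)                       # diminishing step size
        GW, Gb = cs_subgradient(X, y, W, b)
        W = soft_threshold(W - eta * GW, eta * lam)   # proximal step for L1
        b = b - eta * Gb
    return W, b


def selected_genes(W):
    """Indices of genes (rows of W) with a nonzero coefficient in any class."""
    return np.flatnonzero(np.any(W != 0, axis=1))
```

Because the proximal step sets small coefficients exactly to zero, entire rows of W can vanish, which is what turns the classifier into a gene selector; increasing lam shrinks the selected set. A penalty acting on whole rows (for example a group or sup-norm penalty) would remove a gene from all classes at once and is a natural alternative to the entrywise L1 assumed here.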

CONCLUSIONS

High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and for defining targets of therapeutic intervention.

AVAILABILITY

The MATLAB source code is available from http://math.arizona.edu/~hzhang/software.html.


