Hybrid huberized support vector machines for microarray classification and gene selection.

Author information

Wang Li, Zhu Ji, Zou Hui

Affiliation

Ross School of Business, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Bioinformatics. 2008 Feb 1;24(3):412-9. doi: 10.1093/bioinformatics/btm579. Epub 2008 Jan 5.

DOI:10.1093/bioinformatics/btm579

PMID:18175770

Abstract

MOTIVATION

The standard L(2)-norm support vector machine (SVM) is a widely used tool for microarray classification. Previous studies have demonstrated its superior performance in terms of classification accuracy. However, a major limitation of the SVM is that it cannot automatically select the genes relevant to the classification. The L(1)-norm SVM is a variant of the standard L(2)-norm SVM that constrains the L(1)-norm of the fitted coefficients. Because the L(1)-norm is singular at zero, the L(1)-norm SVM selects relevant genes automatically. On the other hand, the L(1)-norm SVM has two drawbacks: (1) the number of selected genes is bounded above by the size of the training data; (2) when several genes are highly correlated, the L(1)-norm SVM tends to pick only a few of them and remove the rest.

RESULTS

We propose a hybrid huberized support vector machine (HHSVM). The HHSVM combines the huberized hinge loss function and the elastic-net penalty. By doing so, the HHSVM performs automatic gene selection in a way similar to the L(1)-norm SVM. In addition, the HHSVM encourages highly correlated genes to be selected (or removed) together. We also develop an efficient algorithm to compute the entire solution path of the HHSVM. Numerical results indicate that the HHSVM tends to provide better variable selection results than the L(1)-norm SVM, especially when variables are highly correlated.
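The abstract names the two ingredients of the HHSVM but gives no formulas. The following is a minimal sketch of how a huberized hinge loss and an elastic-net penalty can be combined into one objective. The piecewise form of the loss, the threshold parameter `delta`, the penalty weights `lam1` and `lam2`, and all function names are assumptions made for illustration; they are not taken from the abstract or from the authors' R code, and the sketch only evaluates the objective rather than computing a solution path.

```python
import numpy as np


def huberized_hinge(t, delta=2.0):
    """Smoothed hinge loss on the margin t = y * f(x) (assumed form).

    Zero for t > 1, quadratic on (1 - delta, 1], linear for t <= 1 - delta.
    The pieces meet continuously at t = 1 and t = 1 - delta.
    """
    t = np.asarray(t, dtype=float)
    quadratic = (1.0 - t) ** 2 / (2.0 * delta)
    linear = 1.0 - t - delta / 2.0
    return np.where(t > 1.0, 0.0, np.where(t > 1.0 - delta, quadratic, linear))


def elastic_net_penalty(beta, lam1, lam2):
    """L1 term (sparsity) plus squared L2 term (grouping of correlated genes)."""
    beta = np.asarray(beta, dtype=float)
    return lam1 * np.sum(np.abs(beta)) + 0.5 * lam2 * np.sum(beta ** 2)


def hhsvm_objective(beta0, beta, X, y, lam1, lam2, delta=2.0):
    """Total huberized hinge loss over the samples plus the elastic-net penalty.

    X is an (n_samples, n_genes) array and y holds labels in {-1, +1}.
    """
    margins = y * (beta0 + X @ beta)
    return np.sum(huberized_hinge(margins, delta)) + elastic_net_penalty(beta, lam1, lam2)
```

In this sketch the L1 term is what can set coefficients exactly to zero, while the squared L2 term is what encourages correlated genes to enter or leave the model together, which matches the behavior described above.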

AVAILABILITY

R code is available at http://www.stat.lsa.umich.edu/~jizhu/code/hhsvm/.


Similar articles

Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE.

BMC Bioinformatics. 2006 Dec 25;7:543. doi: 10.1186/1471-2105-7-543.

Improved centroids estimation for the nearest shrunken centroid classifier.

Bioinformatics. 2007 Apr 15;23(8):972-9. doi: 10.1093/bioinformatics/btm046. Epub 2007 Mar 24.

An integrated algorithm for gene selection and classification applied to microarray data of ovarian cancer.

Artif Intell Med. 2008 Jan;42(1):81-93. doi: 10.1016/j.artmed.2007.09.004. Epub 2007 Nov 19.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Gene selection using support vector machines with non-convex penalty.

Bioinformatics. 2006 Jan 1;22(1):88-95. doi: 10.1093/bioinformatics/bti736. Epub 2005 Oct 25.

MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data.

Bioinformatics. 2007 May 1;23(9):1106-14. doi: 10.1093/bioinformatics/btm036.

A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.

Structured polychotomous machine diagnosis of multiple cancer types using gene expression.

Bioinformatics. 2006 Apr 15;22(8):950-8. doi: 10.1093/bioinformatics/btl029. Epub 2006 Feb 1.

Bayesian variable selection for the analysis of microarray data with censored outcomes.

Bioinformatics. 2006 Sep 15;22(18):2262-8. doi: 10.1093/bioinformatics/btl362. Epub 2006 Jul 15.

Cited by

Multiway sparse distance weighted discrimination.

J Comput Graph Stat. 2023;32(2):730-743. doi: 10.1080/10618600.2022.2099404. Epub 2022 Aug 30.

Structured sparse support vector machine with ordered features.

J Appl Stat. 2020 Nov 18;49(5):1105-1120. doi: 10.1080/02664763.2020.1849053. eCollection 2022.

Signature for Pain Recovery IN Teens (SPRINT): protocol for a multisite prospective signature study in chronic musculoskeletal pain.

BMJ Open. 2022 Jun 8;12(6):e061548. doi: 10.1136/bmjopen-2022-061548.

Brain imaging-based machine learning in autism spectrum disorder: methods and applications.

J Neurosci Methods. 2021 Sep 1;361:109271. doi: 10.1016/j.jneumeth.2021.109271. Epub 2021 Jun 24.

A Pipeline for Integrated Theory and Data-Driven Modeling of Biomedical Data.

IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):811-822. doi: 10.1109/TCBB.2020.3019237. Epub 2021 Jun 3.

Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction.

Sci Rep. 2020 Feb 27;10(1):3612. doi: 10.1038/s41598-020-60235-8.

Identification of Serum MicroRNAs as Novel Biomarkers in Esophageal Squamous Cell Carcinoma Using Feature Selection Algorithms.

Front Oncol. 2019 Jan 21;8:674. doi: 10.3389/fonc.2018.00674. eCollection 2018.

Feature selection by optimizing a lower bound of conditional mutual information.

Inf Sci (N Y). 2017 Dec;418-419:652-667. doi: 10.1016/j.ins.2017.08.036. Epub 2017 Aug 9.

Sparse Bayesian classification and feature selection for biological expression data with high correlations.

PLoS One. 2017 Dec 27;12(12):e0189541. doi: 10.1371/journal.pone.0189541. eCollection 2017.

Residual Weighted Learning for Estimating Individualized Treatment Rules.

J Am Stat Assoc. 2017;112(517):169-187. doi: 10.1080/01621459.2015.1093947. Epub 2017 May 3.
