Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification

Authors

Huang Lingkang, Zhang Hao Helen, Zeng Zhao-Bang, Bushel Pierre R

Affiliations

GlaxoSmithKline, Research and Development, Division of Quantitative Sciences, Research Triangle Park, NC 27709, USA; Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA; Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.

Publication information

Cancer Inform. 2013 Aug 4;12:143-53. doi: 10.4137/CIN.S10212. eCollection 2013.

DOI: 10.4137/CIN.S10212

PMID: 23966761

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3740816/

Abstract

BACKGROUND

Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges because the number of variables overwhelms the small sample size and because multi-type tumors are complex in nature. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high-dimensional, low-sample-size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity.
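To make the idea concrete, the following is a sketch of how a shrinkage penalty can be attached to the Crammer and Singer loss. The notation (class weight vectors w_k, tuning parameter lambda) and the choice of an L1 penalty are assumptions made for illustration; the abstract does not say which penalties the paper uses.

```latex
% Illustrative notation (assumed, not taken from the abstract):
%   x_i \in \mathbb{R}^p : expression profile of sample i
%   y_i \in \{1,\dots,K\} : class label of sample i
%   w_k \in \mathbb{R}^p : weight vector of class k, stacked into W
%   \lambda > 0          : tuning parameter controlling shrinkage
\begin{align*}
\ell_i(W) &= \max_{k}\bigl\{ w_k^\top x_i + 1 - \delta_{k,y_i} \bigr\} - w_{y_i}^\top x_i
  && \text{(Crammer and Singer multi-class hinge loss)} \\
\hat{W}_{\mathrm{ridge}} &= \arg\min_{W}\ \frac{1}{n}\sum_{i=1}^{n} \ell_i(W)
  + \lambda \sum_{k=1}^{K} \lVert w_k \rVert_2^2
  && \text{(squared-norm penalty, no selection)} \\
\hat{W}_{\mathrm{sparse}} &= \arg\min_{W}\ \frac{1}{n}\sum_{i=1}^{n} \ell_i(W)
  + \lambda \sum_{k=1}^{K}\sum_{j=1}^{p} \lvert w_{kj} \rvert
  && \text{(one possible shrinkage penalty)}
\end{align*}
```

Under a penalty of the second kind, gene j drops out of the classifier whenever w_{kj} = 0 for every class k, which is how shrinkage doubles as variable selection.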

RESULTS

The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high-dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes.
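The effect of a soft-thresholding penalty on a multi-class SVM can be illustrated with a small Python sketch. This is not the authors' MATLAB implementation: it assumes an L1 penalty on the class weight matrix and a simple proximal subgradient loop, and every function name, parameter value, and the synthetic data set below is hypothetical.

```python
import numpy as np


def soft_threshold(W, t):
    """Shrink every entry of W toward zero by t; entries smaller than t become exactly 0."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)


def cs_hinge(W, X, y):
    """Mean Crammer-Singer hinge loss and one subgradient with respect to W (K x p)."""
    n = X.shape[0]
    rows = np.arange(n)
    scores = X @ W.T                      # (n, K) class scores
    margins = scores + 1.0                # add margin 1 to every class ...
    margins[rows, y] -= 1.0               # ... except the true class
    k_star = margins.argmax(axis=1)       # most violating class per sample
    loss = margins[rows, k_star] - scores[rows, y]
    G = np.zeros_like(W)
    active = k_star != y                  # samples whose most violating class is not the true one
    np.add.at(G, k_star[active], X[active])
    np.subtract.at(G, y[active], X[active])
    return loss.mean(), G / n


def fit_sparse_cs_svm(X, y, n_classes, lam=0.05, step=0.1, n_iter=500):
    """Proximal subgradient descent on hinge loss + lam * ||W||_1 (illustrative only)."""
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(n_iter):
        _, G = cs_hinge(W, X, y)
        W = soft_threshold(W - step * G, step * lam)
    return W


# Synthetic stand-in for expression data: 90 samples, 200 "genes", 3 classes,
# where only genes 0, 1, and 2 carry class information (an assumption of this demo).
rng = np.random.default_rng(0)
n, p, K = 90, 200, 3
y = np.repeat(np.arange(K), n // K)
X = rng.standard_normal((n, p))
for k in range(K):
    X[y == k, k] += 4.0

W = fit_sparse_cs_svm(X, y, K)
selected = np.flatnonzero(np.abs(W).max(axis=0) > 0)   # genes with a nonzero weight in any class
accuracy = float((np.argmax(X @ W.T, axis=1) == y).mean())
print(f"{selected.size} of {p} genes selected, training accuracy {accuracy:.2f}")
```

A gene counts as selected when at least one class assigns it a nonzero weight. The constant step size and fixed iteration count keep the sketch short; a careful implementation would tune `lam` by cross-validation and report accuracy on held-out samples rather than on the training set.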

CONCLUSIONS

High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and for defining targets of therapeutic intervention.

AVAILABILITY

The MATLAB source code is available from http://math.arizona.edu/~hzhang/software.html.
