Ghosh Samiran, Wang Yazhen
Department of Family Medicine & Public Health Sciences, Wayne State University; Center of Molecular Medicine and Genetics, Wayne State University.
Department of Statistics, University of Wisconsin, Madison.
Stat Anal Data Min. 2015 Feb;8(1):49-63. doi: 10.1002/sam.11259. Epub 2015 Jan 26.
The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems have drawn much attention recently due to their robustness and generalization capability. The general theme is to construct classifiers based on the training data in a high-dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only the few observations which lie close to the boundary of the classifier function. However, when the number of observations is not very large (small n) but the number of dimensions/features is large (large p), it is not necessarily the case that all available features are of equal importance in the classification context. Selecting a useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such a set of features can be selected. In short, we reverse the traditional sequential observation selection strategy of the SVM to one of sequential feature selection. To achieve this, we modify the solution proposed by Zhu and Hastie (2005) in the context of the import vector machine (IVM) to select a sub-dimensional model with which to build the final classifier with sufficient accuracy.
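The sequential feature selection idea described above can be sketched as greedy forward selection for a kernel classifier: at each step, add the feature whose inclusion most improves held-out accuracy, and stop when the improvement falls below a threshold. The sketch below is illustrative only, not the paper's IVM-based algorithm: it uses a simple RBF kernel ridge classifier (rather than the IVM's regularized logistic loss) as the scoring model, and all names, the toy data, and the tolerance `tol` are assumptions introduced for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Pairwise squared Euclidean distances -> Gaussian (RBF) kernel matrix.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def validation_accuracy(Xtr, ytr, Xva, yva, lam=1e-2, gamma=1.0):
    # Fit a kernel ridge classifier on the training split and score it on
    # the validation split (training accuracy alone would reward noise
    # features, since an RBF kernel can interpolate any distinct points).
    K = rbf_kernel(Xtr, Xtr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(ytr)), ytr)
    preds = np.sign(rbf_kernel(Xva, Xtr, gamma) @ alpha)
    return np.mean(preds == yva)

def greedy_feature_selection(Xtr, ytr, Xva, yva, tol=0.02):
    # Sequentially add the single feature that most improves validation
    # accuracy; stop when the best improvement falls below `tol`.
    selected, best_acc = [], 0.0
    remaining = list(range(Xtr.shape[1]))
    while remaining:
        scores = [(validation_accuracy(Xtr[:, selected + [j]], ytr,
                                       Xva[:, selected + [j]], yva), j)
                  for j in remaining]
        acc, j = max(scores)
        if acc - best_acc < tol:
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc

# Toy "small n, large-ish p" data: only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + X[:, 1])
feats, acc = greedy_feature_selection(X[:120], y[:120], X[120:], y[120:])
print(feats, acc)
```

The design point mirrors the abstract: instead of compressing the data by keeping a few boundary observations (support vectors), the loop compresses it by keeping a few informative feature dimensions, so the final kernel classifier is built in a sub-dimensional space.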