Variable Selection for Support Vector Machines in Moderately High Dimensions.

Author information

Zhang Xiang, Wu Yichao, Wang Lan, Li Runze

Affiliations

North Carolina State University, Raleigh, NC, USA.

The University of Minnesota, Minneapolis, MN, USA.

Publication information

J R Stat Soc Series B Stat Methodol. 2016 Jan;78(1):53-76. doi: 10.1111/rssb.12100. Epub 2015 Jan 5.

Abstract

The support vector machine (SVM) is a powerful binary classification tool with high accuracy and great flexibility. It has achieved great success, but its performance can be seriously impaired if many redundant covariates are included. Some efforts have been devoted to studying variable selection for SVMs, but asymptotic properties, such as variable selection consistency, are largely unknown when the number of predictors diverges to infinity. In this work, we establish a unified theory for a general class of nonconvex penalized SVMs. We first prove that in ultra-high dimensions, there exists one local minimizer of the objective function of nonconvex penalized SVMs possessing the desired oracle property. We further address the problem of nonunique local minimizers by showing that the local linear approximation algorithm is guaranteed to converge to the oracle estimator even in the ultra-high dimensional setting if an appropriate initial estimator is available. This condition on the initial estimator is verified to hold automatically as long as the dimension is moderately high. Numerical examples provide supportive evidence.
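The local linear approximation (LLA) scheme mentioned in the abstract iteratively replaces the nonconvex penalty by a weighted L1 penalty, with weights given by the penalty's derivative at the current estimate. The sketch below illustrates this for a SCAD-penalized linear SVM. It is not the authors' implementation: the linear-programming formulation via `scipy.optimize.linprog`, the zero (first-step lasso) initial estimator, and all tuning values (`lam`, `a = 3.7`, the toy data) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan and Li, 2001), evaluated at |t|."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def weighted_l1_svm(X, y, w):
    """Weighted-L1-penalized linear SVM, solved as a linear program:
        min (1/n) sum_i xi_i + sum_j w_j |beta_j|
        s.t. y_i (b0 + x_i beta) >= 1 - xi_i,  xi_i >= 0.
    All variables are split into nonnegative parts: [b0+, b0-, beta+, beta-, xi]."""
    n, p = X.shape
    c = np.concatenate([[0.0, 0.0], w, w, np.full(n, 1.0 / n)])
    Yx = y[:, None] * X
    # -y_i b0 - y_i x_i beta - xi_i <= -1 encodes the hinge/margin constraint
    A_ub = np.hstack([-y[:, None], y[:, None], -Yx, Yx, -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 + 2 * p + n), method="highs")
    z = res.x
    return z[0] - z[1], z[2:2 + p] - z[2 + p:2 + 2 * p]

def lla_scad_svm(X, y, lam, n_iter=3):
    """LLA: repeatedly solve a weighted-L1 SVM, weights from the SCAD derivative.
    Starting from beta = 0, the first iteration is an ordinary L1 SVM, which
    plays the role of the initial estimator in this sketch."""
    beta = np.zeros(X.shape[1])
    b0 = 0.0
    for _ in range(n_iter):
        b0, beta = weighted_l1_svm(X, y, scad_deriv(beta, lam))
    return b0, beta

# Toy data: only the first two of ten covariates carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = np.sign(X[:, 0] + X[:, 1] + 0.3 * rng.standard_normal(100))
y[y == 0] = 1.0
b0, beta = lla_scad_svm(X, y, lam=0.05)
```

Because the SCAD derivative vanishes for large coefficients, later LLA iterations leave strong signals essentially unpenalized while continuing to shrink small, noise-driven coefficients toward zero, which is the mechanism behind the oracle-convergence result discussed above.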

Similar articles

1
Variable Selection for Support Vector Machines in Moderately High Dimensions.
J R Stat Soc Series B Stat Methodol. 2016 Jan;78(1):53-76. doi: 10.1111/rssb.12100. Epub 2015 Jan 5.
2
Calibrating Non-Convex Penalized Regression in Ultra-High Dimension.
Ann Stat. 2013 Oct 1;41(5):2505-2536. doi: 10.1214/13-AOS1159.
3
Strong Oracle Optimality of Folded Concave Penalized Estimation.
Ann Stat. 2014 Jun;42(3):819-849. doi: 10.1214/13-aos1198.
4
A unified classification model based on robust optimization.
Neural Comput. 2013 Mar;25(3):759-804. doi: 10.1162/NECO_a_00412. Epub 2012 Dec 28.
5
Adaptive Robust Variable Selection.
Ann Stat. 2014 Feb 1;42(1):324-351. doi: 10.1214/13-AOS1191.
6
A few theoretical results for Laplace and arctan penalized ordinary least squares linear regression estimators.
Commun Stat Theory Methods. 2024;53(13):4819-4840. doi: 10.1080/03610926.2023.2195033. Epub 2023 Apr 4.
7
DC Algorithm for Extended Robust Support Vector Machine.
Neural Comput. 2017 May;29(5):1406-1438. doi: 10.1162/NECO_a_00958. Epub 2017 Mar 23.
8
Broken adaptive ridge regression and its asymptotic properties.
J Multivar Anal. 2018 Nov;168:334-351. doi: 10.1016/j.jmva.2018.08.007. Epub 2018 Aug 23.
9
Gene selection using support vector machines with non-convex penalty.
Bioinformatics. 2006 Jan 1;22(1):88-95. doi: 10.1093/bioinformatics/bti736. Epub 2005 Oct 25.
10
Structured sparse support vector machine with ordered features.
J Appl Stat. 2020 Nov 18;49(5):1105-1120. doi: 10.1080/02664763.2020.1849053. eCollection 2022.

Cited by

1
Classification of Apricot Varieties by Infrared Spectroscopy and Machine Learning.
ACS Agric Sci Technol. 2025 Jul 8;5(7):1373-1381. doi: 10.1021/acsagscitech.5c00068. eCollection 2025 Jul 21.
2
Are Latent Factor Regression and Sparse Regression Adequate?
J Am Stat Assoc. 2024;119(546):1076-1088. doi: 10.1080/01621459.2023.2169700. Epub 2023 Feb 14.
3
Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions.
PLoS One. 2022 Sep 15;17(9):e0274440. doi: 10.1371/journal.pone.0274440. eCollection 2022.
4
Structured sparse support vector machine with ordered features.
J Appl Stat. 2020 Nov 18;49(5):1105-1120. doi: 10.1080/02664763.2020.1849053. eCollection 2022.
5
Sparse Multicategory Generalized Distance Weighted Discrimination in Ultra-High Dimensions.
Entropy (Basel). 2020 Nov 5;22(11):1257. doi: 10.3390/e22111257.
6
Knowledge-Guided Bayesian Support Vector Machine for High-Dimensional Data with Application to Analysis of Genomics Data.
Proc IEEE Int Conf Big Data. 2018 Dec;2018:1484-1493. doi: 10.1109/BigData.2018.8622484. Epub 2019 Jan 24.

References

1
Strong Oracle Optimality of Folded Concave Penalized Estimation.
Ann Stat. 2014 Jun;42(3):819-849. doi: 10.1214/13-aos1198.
2
SparseNet: Coordinate Descent With Nonconvex Penalties.
J Am Stat Assoc. 2011;106(495):1125-1138. doi: 10.1198/jasa.2011.tm09738.
3
Calibrating Non-Convex Penalized Regression in Ultra-High Dimension.
Ann Stat. 2013 Oct 1;41(5):2505-2536. doi: 10.1214/13-AOS1159.
4
Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension.
J Am Stat Assoc. 2012 Mar 1;107(497):214-222. doi: 10.1080/01621459.2012.656014. Epub 2012 Jun 11.
5
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.
BMC Bioinformatics. 2011 May 9;12:138. doi: 10.1186/1471-2105-12-138.
6
One-step Sparse Estimates in Nonconcave Penalized Likelihood Models.
Ann Stat. 2008 Aug 1;36(4):1509-1533. doi: 10.1214/009053607000000802.
7
Discussion of "Sure Independence Screening for Ultra-High Dimensional Feature Space".
J R Stat Soc Series B Stat Methodol. 2008 Nov;70(5):903. doi: 10.1111/j.1467-9868.2008.00674.x.
8
High Dimensional Classification Using Features Annealed Independence Rules.
Ann Stat. 2008;36(6):2605-2637. doi: 10.1214/07-AOS504.
9
Gene selection using support vector machines with non-convex penalty.
Bioinformatics. 2006 Jan 1;22(1):88-95. doi: 10.1093/bioinformatics/bti736. Epub 2005 Oct 25.
