Sparse Nonparametric Regression With Regularized Tensor Product Kernel.

Author information

Yu Hang, Wang Yuanjia, Zeng Donglin

Affiliations

Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, North Carolina, United States.

Department of Biostatistics, Mailman School of Public Health, Columbia University, United States.

Publication information

Stat (Int Stat Inst). 2020;9(1). doi: 10.1002/sta4.300. Epub 2020 Jul 6.

Abstract

With growing interest in using black-box machine learning for complex data with many feature variables, it is critical to obtain a prediction model that depends on only a small set of features in order to maximize generalizability. Feature selection therefore remains an important and challenging problem in modern applications. Most existing methods for feature selection are based on either parametric or semiparametric models, so their performance can suffer severely from model misspecification when high-order nonlinear interactions among the features are present. A very limited number of approaches for nonparametric feature selection have been proposed, but they are computationally intensive and may not even converge. In this paper, we propose a novel and computationally efficient approach for nonparametric feature selection in regression, based on a tensor-product kernel function over the feature space. The importance of each feature is governed by a parameter in the kernel function, which can be computed efficiently and iteratively by a modified alternating direction method of multipliers (ADMM) algorithm. We prove the oracle selection property of the proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via simulation studies and an application to the prediction of Alzheimer's disease.
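
To illustrate how a per-feature parameter in a tensor-product kernel can govern feature importance, here is a minimal sketch. The abstract does not state the kernel's exact form, so the Gaussian product form K(x, z) = prod_j exp(-lam_j (x_j - z_j)^2), the function name tensor_product_kernel, and the weights lam are all assumptions made for illustration, not the authors' implementation. The sketch only shows that a feature whose weight is zero has no influence on the kernel; it does not implement the ADMM estimation step.

```python
import numpy as np

def tensor_product_kernel(X, Z, lam):
    """Gram matrix of the assumed kernel
    K(x, z) = prod_j exp(-lam[j] * (x[j] - z[j])**2), with lam[j] >= 0."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    lam = np.asarray(lam, dtype=float)
    # Squared differences per feature, shape (n, m, p)
    diff2 = (X[:, None, :] - Z[None, :, :]) ** 2
    # A product of per-feature Gaussian kernels equals exp of the weighted sum
    return np.exp(-(diff2 * lam).sum(axis=2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
lam = np.array([1.0, 0.0, 0.5])  # hypothetical weights; feature 1 is switched off

K = tensor_product_kernel(X, X, lam)

# Changing a feature whose weight is zero leaves the kernel unchanged
X_perturbed = X.copy()
X_perturbed[:, 1] += 10.0
K_perturbed = tensor_product_kernel(X_perturbed, X_perturbed, lam)
unchanged = np.allclose(K, K_perturbed)
```

Under this assumed form, a sparsity-inducing penalty on lam would drive the weights of irrelevant features to exactly zero, which removes them from the fitted model.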
