CANONICAL THRESHOLDING FOR NON-SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION.

Authors

Silin Igor, Fan Jianqing

Affiliation

Princeton University.

Publication

Ann Stat. 2022 Feb;50(1):460-486. doi: 10.1214/21-aos2116. Epub 2022 Feb 16.

DOI:10.1214/21-aos2116

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9491498/

Abstract

We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regression (PCR). A theoretical analysis for both fixed design and random design settings is provided. The bounds obtained on the mean squared error and the prediction error of a specific estimator from the family allow us to state clearly sufficient conditions on the decay of eigenvalues that ensure convergence. In addition, we promote the use of relative errors, which are strongly linked with the out-of-sample $R^2$. The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously, and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. Numerical simulations confirm the good performance of the proposed estimators compared to previously developed methods.
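The idea of picking the largest coefficients in the canonical form can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the paper's exact estimator: the function name `canonical_threshold_fit`, the threshold parameter `tau`, and the choice of hard thresholding are all hypothetical. The sketch rotates the design into its principal-component basis, rescales the scores to unit sample variance, and keeps only the components whose least-squares coefficient is large in magnitude. By contrast, PCR would keep the components with the largest eigenvalues.

```python
import numpy as np


def canonical_threshold_fit(X, y, tau):
    """Hard-threshold least-squares coefficients in the principal-component basis.

    Hypothetical helper for illustration only; `tau` is an assumed tuning
    parameter, not a value prescribed by the paper.
    """
    n = X.shape[0]
    # Thin SVD: X = U diag(s) Vt; sample covariance eigenvalues are s**2 / n.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Drop numerically null directions so the rescaling below is well defined.
    nonnull = s > 1e-10 * s.max()
    U, s, Vt = U[:, nonnull], s[nonnull], Vt[nonnull]
    # The standardized scores are Z = sqrt(n) * U, so Z.T @ Z / n = I and the
    # least-squares coefficients on Z are simply Z.T @ y / n.
    theta = U.T @ y / np.sqrt(n)
    # Keep only the components with a large coefficient in this basis.
    selected = np.abs(theta) >= tau
    theta_thr = np.where(selected, theta, 0.0)
    # Map back to the original coordinates: beta = V diag(sqrt(n) / s) theta.
    beta = Vt.T @ (np.sqrt(n) * theta_thr / s)
    return beta, selected


# Small synthetic demo (assumed data): non-sparse coefficients, decaying spectrum.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p)) / np.arange(1, p + 1)
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat, selected = canonical_threshold_fit(X, y, tau=0.2)
print("components kept:", int(selected.sum()), "of", selected.size)
```

With `tau = 0` every component is kept and the sketch reduces to the minimum-norm least-squares solution. Raising `tau` discards components whose coefficient in the rotated basis is small, regardless of where their eigenvalue ranks.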


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d95e/9491498/afeb1fae8149/nihms-1782574-f0001.jpg

Similar articles

1

CANONICAL THRESHOLDING FOR NON-SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION.

Ann Stat. 2022 Feb;50(1):460-486. doi: 10.1214/21-aos2116. Epub 2022 Feb 16.

2

Large Covariance Estimation by Thresholding Principal Orthogonal Complements.

J R Stat Soc Series B Stat Methodol. 2013 Sep 1;75(4). doi: 10.1111/rssb.12016.

3

Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions.

J Comput Graph Stat. 2023;32(2):601-612. doi: 10.1080/10618600.2022.2110883. Epub 2022 Oct 7.

4

Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference.

IEEE Trans Inf Theory. 2022 Sep;68(9):5975-6002. doi: 10.1109/tit.2022.3175455. Epub 2022 May 16.

5

Shrinkage estimators for covariance matrices.

Biometrics. 2001 Dec;57(4):1173-84. doi: 10.1111/j.0006-341x.2001.01173.x.

6

MINIMAX BOUNDS FOR SPARSE PCA WITH NOISY HIGH-DIMENSIONAL DATA.

Ann Stat. 2013 Jun;41(3):1055-1084. doi: 10.1214/12-AOS1014.

7

Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.

J Multivar Anal. 2016 Sep;150:55-74. doi: 10.1016/j.jmva.2016.05.002. Epub 2016 May 19.

8

Minimax Rates of $\ell_p$-Losses for High-Dimensional Linear Errors-in-Variables Models over $\ell_q$-Balls.

Entropy (Basel). 2021 Jun 5;23(6):722. doi: 10.3390/e23060722.

9

Convex Banding of the Covariance Matrix.

J Am Stat Assoc. 2016;111(514):834-845. doi: 10.1080/01621459.2015.1058265. Epub 2016 Aug 18.

10

On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces.

Neural Netw. 2020 Mar;123:343-361. doi: 10.1016/j.neunet.2019.12.014. Epub 2019 Dec 23.
