Silin Igor, Fan Jianqing
Princeton University.
Ann Stat. 2022 Feb;50(1):460-486. doi: 10.1214/21-aos2116. Epub 2022 Feb 16.
We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of the eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regression (PCR). A theoretical analysis is provided for both the fixed design and random design settings. The bounds obtained on the mean squared error and the prediction error of a specific estimator from the family allow us to state clearly sufficient conditions on the decay of eigenvalues that ensure convergence. In addition, we promote the use of relative errors, which are strongly linked with the out-of-sample R². The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. Numerical simulations confirm the good performance of the proposed estimators compared to previously developed methods.
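The idea of picking the largest regression coefficients in the canonical form can be illustrated with a minimal NumPy sketch. This is not the authors' exact estimator: the function name `canonical_thresholding`, the hard-threshold rule, the threshold parameter `tau`, and the scaling of the canonical coefficients are all assumptions made for illustration. The sketch rotates the data into the eigenbasis of the sample covariance, standardizes each principal component, regresses the response on those components, and keeps only coefficients whose magnitude reaches `tau`.

```python
import numpy as np

def canonical_thresholding(X, y, tau):
    """Illustrative hard thresholding of regression coefficients in the
    principal-component (canonical) basis. Hypothetical helper, not the
    paper's exact procedure."""
    n, _ = X.shape
    # Sample covariance (no centering, for simplicity) and its eigendecomposition.
    S = X.T @ X / n
    eigvals, eigvecs = np.linalg.eigh(S)
    # Drop numerically zero eigenvalues so the rescaling below is well defined.
    keep = eigvals > 1e-10 * eigvals.max()
    lam, U = eigvals[keep], eigvecs[:, keep]
    # Standardized principal components: Z.T @ Z / n is the identity.
    Z = X @ U / np.sqrt(lam)
    # Canonical coefficients: least-squares coefficients of y on Z.
    theta = Z.T @ y / n
    # Assumed rule: keep coefficients with magnitude at least tau.
    theta_thr = np.where(np.abs(theta) >= tau, theta, 0.0)
    # Map back to the original coordinates.
    beta = U @ (theta_thr / np.sqrt(lam))
    return beta, theta

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.arange(1.0, 6.0) + 0.1 * rng.standard_normal(50)
beta_hat, theta_hat = canonical_thresholding(X, y, tau=0.5)
```

With `tau=0` and a full-rank design, nothing is thresholded and the sketch reduces to ordinary least squares. Selecting components by the size of their eigenvalue rather than by the size of `theta` would instead give a PCR-style estimator, which is one way to read the link to PCR mentioned in the abstract.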