Li Runze, Liu Jingyuan, Lou Lejia
Pennsylvania State University, Xiamen University and Ernst & Young.
Stat Sin. 2017 Jul;27(3):983-996. doi: 10.5705/ss.202015.0473.
A partial-correlation-based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as an alternative to regularization methods for variable selection. This paper addresses two important issues related to that method: (a) whether it is sensitive to the normality assumption, and (b) whether it remains valid when the dimension of the predictors increases at an exponential rate in the sample size. To address issue (a), we systematically study the method for elliptical linear regression models. Our findings indicate that the original proposal may perform poorly when the marginal kurtosis of the predictors is not close to that of the normal distribution, and our simulation results confirm this. To ensure that partial-correlation-based variable selection performs well, we propose a thresholded partial correlation (TPC) approach for selecting significant variables in linear regression models. We establish the selection consistency of the TPC in the presence of ultrahigh-dimensional predictors. Since the TPC procedure includes the original proposal as a special case, our theoretical results address issue (b) directly. As a by-product, we obtain the sure screening property of the first step of the TPC. Numerical examples also illustrate that the TPC is competitive with commonly used regularization methods for variable selection.
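The idea of thresholding partial correlations with a kurtosis-aware cutoff can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the threshold, so the cutoff used here, sqrt(1 + kappa) * z_{1 - alpha/2} / sqrt(n - 1 - m) applied to the Fisher z-transformed partial correlation, is an assumption, as is the stepwise pruning over conditioning sets of growing size m, which loosely follows the scheme of Bühlmann, Kalisch and Maathuis (2010). All function names (`partial_corr`, `excess_kurtosis`, `is_significant`, `tpc_select`) are hypothetical. Setting kappa to 0 gives the usual normal-theory test, in line with the original proposal being a special case.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def corr(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, cond):
    """Partial correlation of x and y given the columns in cond (recursive formula)."""
    if not cond:
        return corr(x, y)
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def excess_kurtosis(columns):
    """Average of m4 / (3 * m2^2) - 1 over columns; close to 0 for normal data."""
    total = 0.0
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        m2 = sum((v - mean) ** 2 for v in col) / n
        m4 = sum((v - mean) ** 4 for v in col) / n
        total += m4 / (3 * m2 ** 2) - 1
    return total / len(columns)


def is_significant(r, n, m, kappa, alpha=0.05):
    """Kurtosis-adjusted Fisher z-test; the form of the cutoff is an assumption."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    cutoff = math.sqrt(1 + kappa) * NormalDist().inv_cdf(1 - alpha / 2)
    return math.sqrt(n - 1 - m) * abs(z) > cutoff


def tpc_select(columns, y, alpha=0.05):
    """Keep predictor j only while every partial correlation with y, given
    size-m subsets of the other surviving predictors, passes the threshold."""
    n = len(y)
    kappa = excess_kurtosis(columns)
    active = list(range(len(columns)))
    m = 0
    while m < len(active):
        prev = list(active)  # conditioning sets come from the previous step
        active = [
            j for j in prev
            if all(
                is_significant(
                    partial_corr(columns[j], y, [columns[k] for k in subset]),
                    n, m, kappa, alpha,
                )
                for subset in combinations([k for k in prev if k != j], m)
            )
        ]
        m += 1
    return active


# Toy data (hypothetical): only predictors 0 and 1 enter the true model.
random.seed(0)
n, p = 200, 5
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2 * X[0][i] - 1.5 * X[1][i] + random.gauss(0, 1) for i in range(n)]
selected = tpc_select(X, y)
print("selected predictors:", selected)
```

The recursive partial correlation is used only to keep the sketch dependency-free. Its cost grows exponentially with the size of the conditioning set, so a practical version would more likely work with regression residuals or an inverse covariance matrix.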