Department of Mathematical and Statistical Sciences, University of Alberta, Alberta, Canada.
Stat Methods Med Res. 2019 Jul;28(7):2210-2226. doi: 10.1177/0962280218757560. Epub 2018 Feb 16.
We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness; moreover, it satisfies the oracle properties. The procedure is designed to produce sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine its robustness properties using the finite-sample breakdown point and the influence function, and we show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss of efficiency when there are no outliers or when the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies, and we also analyze two datasets.
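The abstract describes the estimator only at a high level: adaptive weights on the loss and on the penalty. As a minimal sketch of that general idea, and not the authors' estimator (whose loss, weight rules and algorithm are not given here), the Python code below fits a weighted least-absolute-deviation (LAD) regression with an adaptive-lasso penalty. It minimizes sum_i w_i |y_i - x_i^T b| + lambda * sum_j v_j |b_j| by recasting it as a linear program and solving it with SciPy's `linprog`. The function names `weighted_lad_lasso` and `adaptive_penalty_weights`, the absolute-value loss, and the penalty-weight rule v_j = 1 / (|b_init_j| + eps)^gamma are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_lad_lasso(X, y, obs_weights, pen_weights, lam):
    """Minimize sum_i w_i |y_i - x_i'b| + lam * sum_j v_j |b_j| as a linear program.

    Hypothetical illustration only: obs_weights (w_i) down-weight suspect
    observations in the loss, pen_weights (v_j) are adaptive penalty weights.
    This is a generic weighted LAD-lasso, not the paper's estimator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.asarray(obs_weights, dtype=float)
    v = np.asarray(pen_weights, dtype=float)
    # Decision variables, all >= 0: b_plus (p), b_minus (p), r_plus (n), r_minus (n),
    # with b = b_plus - b_minus and residual y - X b = r_plus - r_minus.
    c = np.concatenate([lam * v, lam * v, w, w])
    # Equality constraint: X b_plus - X b_minus + r_plus - r_minus = y.
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:2 * p]


def adaptive_penalty_weights(b_init, gamma=1.0, eps=1e-6):
    """Adaptive-lasso style weights v_j = 1 / (|b_init_j| + eps)^gamma (hypothetical rule)."""
    return 1.0 / (np.abs(np.asarray(b_init, dtype=float)) + eps) ** gamma
```

A typical two-stage use fits an unpenalized weighted LAD (`lam=0.0`) first, converts its coefficients into penalty weights with `adaptive_penalty_weights`, and refits with `lam > 0`, so that small initial coefficients are penalized heavily and large ones only lightly. Setting an observation weight w_i to zero removes that point from the loss, which is the crudest form of down-weighting; a practical robust method would compute the w_i from the data (for example from robust residuals or leverage), which this sketch leaves to the caller.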