Wang Xueqin, Jiang Yunlu, Huang Mian, Zhang Heping
Department of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, 510275, China; and Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China; and Xinhua College, Sun Yat-Sen University, Guangzhou, 510520, China.
J Am Stat Assoc. 2013 Apr 1;108(502):632-643. doi: 10.1080/01621459.2013.766613.
Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness, which has not been done for existing procedures, while its performance is near-optimal and superior to that of some recently developed methods. Specifically, under defined regularity conditions, our estimators are $\sqrt{n}$-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark, and considered common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of model error and positive selection rate, even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset, which are commonly used examples for regression diagnostics of influential points. Our analysis reveals discrepancies between our robust method and the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods.
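To make the bounded-influence claim concrete, here is a minimal Python sketch. It assumes the exponential squared loss takes the form φ_γ(r) = 1 − exp(−r²/γ) for a residual r and a tuning constant γ > 0. Its derivative (the ψ-function) is (2r/γ)·exp(−r²/γ), which never exceeds √(2/γ)·e^(−1/2) in absolute value, whereas the derivative 2r of squared loss grows without bound. The sketch also fits an L1-penalized version of the loss by a majorize-minimize (MM) loop with coordinate descent. The L1 penalty, the MM/coordinate-descent solver, the zero starting value, and all function names (`exp_squared_loss`, `fit_penalized_exp_squared`, and so on) are hypothetical illustrative choices, not the authors' estimator or tuning procedure. The sketch only shows boundedness in the response direction. It says nothing about the breakdown point or about outliers in the covariates.

```python
import math

# Illustrative sketch only: the L1 penalty, the MM/coordinate-descent solver and
# all names below are hypothetical choices, not the paper's exact procedure.


def exp_squared_loss(r: float, gamma: float) -> float:
    """Exponential squared loss 1 - exp(-r^2 / gamma); it never exceeds 1."""
    return 1.0 - math.exp(-r * r / gamma)


def exp_squared_psi(r: float, gamma: float) -> float:
    """Derivative in r: (2r / gamma) * exp(-r^2 / gamma), bounded and redescending."""
    return (2.0 * r / gamma) * math.exp(-r * r / gamma)


def squared_psi(r: float) -> float:
    """Derivative of the ordinary squared loss r^2, which grows without bound."""
    return 2.0 * r


def soft_threshold(z: float, t: float) -> float:
    """Proximal map of t*|b|: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_penalized_exp_squared(X, y, gamma, lam, n_outer=50, n_inner=100, tol=1e-10):
    """Minimize (1/n) * sum(1 - exp(-r_i^2/gamma)) + lam * sum(|beta_j|), no intercept.

    MM scheme: 1 - exp(-u/gamma) is concave in u = r^2, so its tangent at the
    current residuals majorizes it. Each MM step is therefore a weighted lasso
    with weights exp(-r_i^2/gamma)/gamma, solved here by cyclic coordinate descent.
    Starts from beta = 0 (a robust initial estimate would be safer in practice).
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_outer):
        resid = [y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
        # Points with large residuals get weights near zero: this is the robustness.
        w = [math.exp(-r * r / gamma) / gamma for r in resid]
        previous = beta[:]
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                # Partial residual with coordinate j removed.
                partial = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                           for i in range(n)]
                a = 2.0 / n * sum(w[i] * X[i][j] ** 2 for i in range(n))
                c = 2.0 / n * sum(w[i] * X[i][j] * partial[i] for i in range(n))
                new = soft_threshold(c, lam) / a if a > 0.0 else 0.0
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
            if max_change < tol:
                break
        if max(abs(b - q) for b, q in zip(beta, previous)) < tol:
            break
    return beta
```

A response outlier with residual r receives weight exp(−r²/γ)/γ, which is essentially zero once |r| is several multiples of √γ, so such a point barely moves the fit. In this sketch γ therefore trades robustness against efficiency. The paper's own way of choosing γ and the penalty level is not reproduced here.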