Ghosh Abhik, Thoresen Magne
Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India.
Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway.
Stat Methods Med Res. 2021 Aug;30(8):1816-1832. doi: 10.1177/09622802211017299. Epub 2021 May 30.
Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems, and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, Sure Independence Screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and, particularly in small samples, this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. Finally, we illustrate their use in a study on regulation of lipid metabolism.
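For orientation, the baseline SIS idea mentioned above can be sketched as follows. This is a minimal illustration of classical (non-robust) marginal-correlation screening, not the DPD-SIS procedure developed in the paper. The function name `sis_screen`, the default cutoff `d = floor(n / log n)`, and the simulated data are assumptions made for this example only.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Classical SIS sketch: rank predictors by absolute marginal correlation.

    Hypothetical helper for illustration; assumes no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if d is None:
        # Assumed default: a cutoff commonly used in the SIS literature.
        d = int(n / np.log(n))
    d = min(d, p)
    # Standardize columns and response so X^T y / n gives sample correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = Xs.T @ ys / n
    # Keep the d predictors with the largest absolute marginal correlation.
    order = np.argsort(-np.abs(omega))
    return order[:d], omega

# Simulated example (assumed setup): only the first two predictors matter.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)
selected, omega = sis_screen(X, y)
```

Because the ranking relies on sample means, standard deviations, and correlations, a few contaminated observations can distort it, which is the vulnerability the robust DPD-based variant is meant to address.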