Ranganai Edmore, Mudhombo Innocent
Department of Statistics, University of South Africa, Florida Campus, Private Bag X6, Florida Park, Roodepoort 1710, South Africa.
Department of Accountancy, Vaal University of Technology, Vanderbijlpark Campus, Vanderbijlpark 1900, South Africa.
Entropy (Basel). 2020 Dec 29;23(1):33. doi: 10.3390/e23010033.
The importance of variable selection and regularization procedures in multiple regression analysis cannot be overemphasized. These procedures are adversely affected by data aberrations in the predictor space as well as by outliers in the response space. To counter the latter, robust statistical procedures have been proposed in the literature, such as quantile regression, which generalizes the well-known least absolute deviation procedure to all quantile levels. Quantile regression is robust to outliers in the response variable but very susceptible to outliers in the predictor space (high leverage points), which may alter the eigen-structure of the predictor matrix. High leverage points that alter this eigen-structure by creating or hiding collinearity are referred to as collinearity influential points. In this paper, we suggest generalizing the penalized weighted least absolute deviation procedure to all quantile levels, i.e., to penalized weighted quantile regression with the RIDGE, LASSO, and elastic net penalties, as a remedy against collinearity influential points and high leverage points in general. To maintain robustness, we use very robust weights based on the computationally intensive, high-breakdown minimum covariance determinant. Simulations and applications to well-known data sets from the literature show that the robust weighting formulation improves variable selection and regularization.
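As an illustration only, the kind of objective that a penalized weighted quantile regression minimizes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `lam` and `l1_ratio` parameterization of the elastic net penalty, and the omission of an intercept are all choices made here for illustration. The observation weights are taken as given; the abstract states that they are derived from the minimum covariance determinant, but the exact weight formula is not given there and is not reproduced here.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def penalized_weighted_qr_objective(beta, X, y, weights, tau, lam, l1_ratio):
    """Weighted check loss plus an elastic net penalty (hypothetical helper).

    l1_ratio = 1 gives a LASSO penalty, l1_ratio = 0 a RIDGE penalty,
    and values in between an elastic net mix. No intercept is included.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    weights = np.asarray(weights, dtype=float)

    residuals = y - X @ beta
    # Each observation's check loss is scaled by its weight, so a
    # downweighted high leverage point contributes less to the fit term.
    fit = np.sum(weights * check_loss(residuals, tau))

    l1 = np.sum(np.abs(beta))
    l2 = np.sum(beta ** 2)
    penalty = lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)
    return fit + penalty


# Toy data: the third row is a high leverage point given a small weight.
X = [[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
y = [1.0, 2.0, 0.0]
weights = [1.0, 1.0, 0.1]  # assumed to come from a robust weighting step
beta = [1.0, 2.0]

lasso_value = penalized_weighted_qr_objective(
    beta, X, y, weights, tau=0.5, lam=0.1, l1_ratio=1.0
)
ridge_value = penalized_weighted_qr_objective(
    beta, X, y, weights, tau=0.5, lam=0.1, l1_ratio=0.0
)
print(lasso_value, ridge_value)
```

At `tau=0.5` the fit term reduces to half the weighted sum of absolute residuals, which is the weighted least absolute deviation case that the paper generalizes. In practice the objective would be minimized over `beta` by a linear or quadratic programming solver rather than merely evaluated, and robust distances for the weighting step could be obtained from an MCD estimator such as `sklearn.covariance.MinCovDet`.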