College of Statistical and Actuarial Sciences, University of the Punjab, Lahore, Pakistan.
Department of Civil Engineering, College of Engineering, King Khalid University, Abha, KSA.
PLoS One. 2021 Nov 22;16(11):e0259991. doi: 10.1371/journal.pone.0259991. eCollection 2021.
Multicollinearity arises in multiple linear regression models when the predictor variables are correlated with one another. In such situations, the variance of the ordinary least squares estimator becomes unstable. To mitigate multicollinearity, Liu regression is widely used as a biased estimation method with a shrinkage parameter 'd'. The optimal value of the shrinkage parameter plays a vital role in the bias-variance trade-off.
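As a rough illustration of the shrinkage idea, the sketch below implements the Liu estimator in its commonly written form, (X'X + I)^{-1}(X'X + dI) applied to the ordinary least squares estimate. The function name `liu_estimator`, the toy data, and the values of `rho` and `d` are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def liu_estimator(X, y, d):
    """Liu estimator in its commonly written form:
    (X'X + I)^{-1} (X'X + d I) beta_OLS.
    d = 1 reproduces OLS; 0 <= d < 1 shrinks the estimate."""
    XtX = X.T @ X
    identity = np.eye(XtX.shape[0])
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + identity, (XtX + d * identity) @ beta_ols)

# Toy data with strongly correlated predictors (illustrative values only).
rng = np.random.default_rng(0)
n, p, rho = 100, 3, 0.95
z = rng.standard_normal((n, p + 1))
# Each predictor shares the common component z[:, p], which induces correlation.
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
y = X @ np.ones(p) + rng.standard_normal(n)

beta_ols = liu_estimator(X, y, d=1.0)  # identical to OLS
beta_liu = liu_estimator(X, y, d=0.5)  # shrunken estimate
```

With d = 1 the two matrix factors cancel and the function returns the OLS estimate; smaller values of d shrink the coefficient vector, trading some bias for lower variance.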
Several estimators of the shrinkage parameter are available in the literature, but they do not achieve a small mean squared error when multicollinearity is high or severe.
In this paper, new estimators of the shrinkage parameter are proposed. They form a class of estimators based on quantiles of the regression coefficients. The performance of the new estimators is compared with that of the existing estimators through Monte Carlo simulation, with mean squared error and mean absolute error as the evaluation criteria. The Tobacco dataset is used as an application to illustrate the benefits of the new estimators and to support the simulation results.
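The kind of Monte Carlo comparison described above can be sketched as follows. This is only an assumed setup: the predictors are generated with a shared random component controlled by `rho` (a design often used in such studies), and `d` is fixed by hand rather than estimated, because the abstract does not define the proposed quantile-based estimators. The function name `simulate` and all default values are hypothetical.

```python
import numpy as np

def simulate(rho, d, n=50, p=4, sigma=1.0, reps=500, seed=0):
    """Average squared and absolute estimation error of OLS and of a Liu
    estimator with a fixed, hand-picked d (not the paper's estimators)."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)  # assumed true coefficients, unit length
    z = rng.standard_normal((n, p + 1))
    # Shared component z[:, p] makes the predictors mutually correlated.
    X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
    XtX = X.T @ X
    identity = np.eye(p)
    totals = {"mse_ols": 0.0, "mse_liu": 0.0, "mae_ols": 0.0, "mae_liu": 0.0}
    for _ in range(reps):
        y = X @ beta + sigma * rng.standard_normal(n)
        b_ols = np.linalg.solve(XtX, X.T @ y)
        b_liu = np.linalg.solve(XtX + identity, (XtX + d * identity) @ b_ols)
        totals["mse_ols"] += np.sum((b_ols - beta) ** 2)
        totals["mse_liu"] += np.sum((b_liu - beta) ** 2)
        totals["mae_ols"] += np.sum(np.abs(b_ols - beta))
        totals["mae_liu"] += np.sum(np.abs(b_liu - beta))
    # Average each criterion over the replications.
    return {name: total / reps for name, total in totals.items()}

results = simulate(rho=0.99, d=0.5)
```

Running `simulate` over a grid of `rho` values, and replacing the fixed `d` with a data-driven estimate, would give a comparison in the spirit of the study; the numbers produced by this sketch are not the paper's results.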
The new estimators outperform the existing estimators in most of the considered scenarios, including cases of high and severe multicollinearity. The 95% mean prediction interval of each estimator is also computed for the Tobacco data, and the new estimators give the best mean prediction intervals among all the estimators.
We recommend the new estimators to practitioners when high to severe multicollinearity exists among the predictor variables.