Department of Statistics, University of Peshawar, Peshawar, Pakistan.
Department of Mathematics, College of Science Al-Zulfi, Majmaah University, Al-Majmaah, 11952, Saudi Arabia.
Sci Rep. 2022 Jun 29;12(1):10992. doi: 10.1038/s41598-022-14398-1.
Outlying observations have a large influence on the linear model selection process. In this article, we present a novel approach to robust model selection in linear regression that accommodates situations in which outliers are present in the data. The model selection criterion has two components: the robust conditional expected prediction loss and a robust goodness-of-fit measure with a penalty term. We estimate the conditional expected prediction loss using the out-of-bag stratified bootstrap approach. In the presence of outliers, the stratified bootstrap ensures that the bootstrap samples are similar to the original sample data. Furthermore, to control the undue effect of outliers, we use the robust MM-estimator and a bounded loss function in the proposed criterion. Specifically, we observe that it is better to minimize the penalized loss function and the conditional expected prediction loss simultaneously than to minimize either one separately. Simulation and real-data studies confirm the consistent and satisfactory behavior of our bootstrap model selection procedure in the presence of both response outliers and covariate outliers.
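The abstract describes the procedure only at a high level, so the following is a minimal sketch of the general idea under assumptions that are not taken from the paper. A Tukey bisquare M-estimator fitted by iteratively reweighted least squares stands in for the MM-estimator (a true MM fit starts from a high-breakdown S-estimate). The bisquare ρ serves as the bounded loss. Strata are formed by flagging observations whose standardized full-model residuals exceed a hypothetical cutoff of 2.5. The penalty is δ(n)·p with an assumed BIC-like δ(n) = log n, and the penalized in-sample term and the out-of-bag prediction loss are simply added and minimized together over all subsets. The names `robust_fit`, `criterion` and `select_model` are hypothetical.

```python
import itertools

import numpy as np

C = 4.685  # bisquare tuning constant (about 95% efficiency under normal errors)


def rho(u, c=C):
    """Bounded Tukey bisquare loss: equals c**2 / 6 whenever |u| >= c."""
    t = np.clip(np.abs(u) / c, 0.0, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - t**2) ** 3)


def mad(r):
    """Normalized median absolute deviation (consistent for sigma under normality)."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))


def robust_fit(X, y, c=C, n_iter=30):
    """Bisquare M-estimate by IRLS from a least-squares start.

    Assumed stand-in for the MM-estimator: a true MM fit starts from a
    high-breakdown S-estimate instead of least squares.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    s = mad(y - X @ beta) + 1e-12  # scale held fixed during the iterations
    for _ in range(n_iter):
        u = (y - X @ beta) / (c * s)
        sw = np.where(np.abs(u) < 1.0, 1.0 - u**2, 0.0)  # square root of bisquare weight
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta, mad(y - X @ beta) + 1e-12


def criterion(Xa, y, sigma, boots, delta):
    """Penalized robust fit plus out-of-bag bootstrap prediction loss."""
    n = len(y)
    beta, _ = robust_fit(Xa, y)
    fit = rho((y - Xa @ beta) / sigma).mean() + delta * Xa.shape[1] / n
    oob = [rho((y[o] - Xa[o] @ robust_fit(Xa[i], y[i])[0]) / sigma).mean()
           for i, o in boots]
    return fit + np.mean(oob)


def select_model(X, y, B=50, cutoff=2.5, seed=0):
    """Return the predictor indices (intercept always kept) minimizing the criterion."""
    n, p = X.shape
    Xi = np.column_stack([np.ones(n), X])
    beta, sigma = robust_fit(Xi, y)  # full model supplies one common scale
    z = np.abs(y - Xi @ beta) / sigma
    # Hypothetical strata: flagged outliers vs. the rest, from full-model residuals
    strata = [s for s in (np.flatnonzero(z <= cutoff), np.flatnonzero(z > cutoff)) if s.size]
    rng = np.random.default_rng(seed)
    boots = []  # the same bootstrap samples are reused for every candidate model
    for _ in range(B):
        # resample with replacement inside each stratum, preserving stratum sizes
        idx = np.concatenate([rng.choice(s, size=s.size, replace=True) for s in strata])
        oob = np.setdiff1d(np.arange(n), idx)  # observations left out of this sample
        if oob.size:
            boots.append((idx, oob))
    best, best_val = (), np.inf
    for k in range(p + 1):
        for sub in itertools.combinations(range(p), k):
            cols = [0] + [j + 1 for j in sub]
            # delta = log(n) is an assumed BIC-like penalty weight
            val = criterion(Xi[:, cols], y, sigma, boots, np.log(n))
            if val < best_val:
                best, best_val = sub, val
    return best


# Simulated data: only the first two predictors matter; 10% response outliers.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=100)
y[:10] += 20
sel = select_model(X, y)
print("selected predictors:", sel)
```

Reusing the same bootstrap samples for every candidate model makes the comparison between models less noisy, and scaling every candidate's residuals by the full-model scale keeps their losses on a common footing. Exhaustive subset search is only practical for a handful of predictors.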