Variable Selection and Regularization in Quantile Regression via Minimum Covariance Determinant Based Weights

Authors

Ranganai Edmore, Mudhombo Innocent

Affiliations

Department of Statistics, University of South Africa, Florida Campus, Private Bag X6, Florida Park, Roodepoort 1710, South Africa.

Department of Accountancy, Vaal University of Technology, Vanderbijlpark Campus, Vanderbijlpark 1900, South Africa.

Publication information

Entropy (Basel). 2020 Dec 29;23(1):33. doi: 10.3390/e23010033.

DOI:10.3390/e23010033

PMID:33383623

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7823782/

Abstract

The importance of variable selection and regularization procedures in multiple regression analysis cannot be overemphasized. These procedures are adversely affected by predictor space data aberrations as well as outliers in the response space. To counter the latter, robust statistical procedures such as quantile regression, which generalizes the well-known least absolute deviation procedure to all quantile levels, have been proposed in the literature. Quantile regression is robust to response variable outliers but very susceptible to outliers in the predictor space (high leverage points), which may alter the eigen-structure of the predictor matrix. High leverage points that alter the eigen-structure of the predictor matrix by creating or hiding collinearity are referred to as collinearity influential points. In this paper, we suggest generalizing the penalized weighted least absolute deviation to all quantile levels, i.e., to penalized weighted quantile regression using the RIDGE, LASSO, and elastic net penalties, as a remedy against collinearity influential points and high leverage points in general. To maintain robustness, we make use of very robust weights based on the computationally intensive high breakdown minimum covariance determinant. Simulations and applications to well-known data sets from the literature show an improvement in variable selection and regularization due to the robust weighting formulation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f51d/7823782/762bdda5d68b/entropy-23-00033-g001.jpg

Similar articles

Variable Selection and Regularization in Quantile Regression via Minimum Covariance Determinant Based Weights.

Entropy (Basel). 2020 Dec 29;23(1):33. doi: 10.3390/e23010033.

Robust scalar-on-function partial quantile regression.

J Appl Stat. 2023 Apr 19;51(7):1359-1377. doi: 10.1080/02664763.2023.2202464. eCollection 2024.

Outlier detection and robust variable selection via the penalized weighted LAD-LASSO method.

J Appl Stat. 2020 Feb 4;48(2):234-246. doi: 10.1080/02664763.2020.1722079. eCollection 2021.

ADAPTIVE ROBUST VARIABLE SELECTION.

Ann Stat. 2014 Feb 1;42(1):324-351. doi: 10.1214/13-AOS1191.

Penalized weighted smoothed quantile regression for high-dimensional longitudinal data.

Stat Med. 2024 May 10;43(10):2007-2042. doi: 10.1002/sim.10056. Epub 2024 Mar 8.

The spike-and-slab quantile LASSO for robust variable selection in cancer genomics studies.

Stat Med. 2024 Nov 20;43(26):4928-4983. doi: 10.1002/sim.10196. Epub 2024 Sep 11.

Quantile regression shrinkage and selection via the Lqsso.

J Biopharm Stat. 2024 May;34(3):297-322. doi: 10.1080/10543406.2023.2198593. Epub 2023 Apr 9.

New Gibbs sampling methods for bayesian regularized quantile regression.

Comput Biol Med. 2019 Jul;110:52-65. doi: 10.1016/j.compbiomed.2019.05.011. Epub 2019 May 16.

Regularized Quantile Regression and Robust Feature Screening for Single Index Models.

Stat Sin. 2016 Jan;26(1):69-95. doi: 10.5705/ss.2014.049.

Variable selection for ultra-high dimensional quantile regression with missing data and measurement error.

Stat Methods Med Res. 2021 Jan;30(1):129-150. doi: 10.1177/0962280220941533. Epub 2020 Aug 3.

Cited by

A reliable prognostic model for hepatocellular carcinoma using neutrophil extracellular traps and immune related genes.

Sci Rep. 2025 Jun 3;15(1):19390. doi: 10.1038/s41598-025-01335-1.

Predictive modeling of COVID-19 mortality risk in chronic kidney disease patients using multiple machine learning algorithms.

Sci Rep. 2024 Nov 6;14(1):26979. doi: 10.1038/s41598-024-78498-w.

Regarding: LASSO-derived model for the prediction of lean-non-alcoholic fatty liver disease in examinees attending a routine health check-up.

Ann Med. 2024 Dec;56(1):2350628. doi: 10.1080/07853890.2024.2350628. Epub 2024 May 10.

Identification and verification of diagnostic biomarkers based on mitochondria-related genes related to immune microenvironment for preeclampsia using machine learning algorithms.

Front Immunol. 2024 Jan 8;14:1304165. doi: 10.3389/fimmu.2023.1304165. eCollection 2023.

Ensemble Linear Subspace Analysis of High-Dimensional Data.

Entropy (Basel). 2021 Mar 9;23(3):324. doi: 10.3390/e23030324.
