Adaptive Huber Regression.

Author information

Sun Qiang, Zhou Wen-Xin, Fan Jianqing

Affiliations

Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada.

Department of Mathematics, University of California, San Diego, La Jolla, CA 92093.

Publication information

J Am Stat Assoc. 2020;115(529):254-265. doi: 10.1080/01621459.2018.1543124. Epub 2019 Apr 22.

DOI: 10.1080/01621459.2018.1543124

PMID: 33139964

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7603940/

Abstract

Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded (1 + δ)-th moment for any δ > 0. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when δ ≥ 1, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime 0 < δ < 1 and the transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive.
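The central idea above, a Huber loss whose robustification parameter τ grows with the sample size n and shrinks with the dimension d, can be illustrated with a short sketch. The Huber loss is ℓ_τ(r) = r²/2 for |r| ≤ τ and τ|r| − τ²/2 otherwise, so each residual's influence on the estimating equations is clipped at ±τ: a small τ limits the pull of outliers but biases the fit, while a large τ reduces bias but lets heavy tails back in. The code below is a minimal, hypothetical illustration in plain Python. The functions `huber_psi`, `adaptive_tau` and `huber_regression`, the rule τ = c·σ·sqrt(n / (d + log n)), the gradient-descent solver and the simulated data are all assumptions made for this example; they are not the authors' algorithm or their exact tuning rule.

```python
import math
import random


def huber_psi(r: float, tau: float) -> float:
    """Derivative of the Huber loss: identity on [-tau, tau], clipped to +/-tau outside."""
    return max(-tau, min(tau, r))


def adaptive_tau(sigma: float, n: int, d: int, c: float = 1.0) -> float:
    """Illustrative robustification parameter that grows with n and shrinks with d.

    ASSUMPTION: tau = c * sigma * sqrt(n / (d + log n)) is a finite-variance style
    rule chosen for this sketch; it is not claimed to be the paper's calibration.
    """
    return c * sigma * math.sqrt(n / (d + math.log(n)))


def huber_regression(X, y, tau, step=0.5, iters=500):
    """Minimise (1/n) * sum_i huber_tau(y_i - x_i . beta) by plain gradient descent.

    X is a list of rows (include a leading 1.0 for an intercept).
    Passing tau=math.inf makes psi the identity, i.e. ordinary least squares.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            r = yi - sum(b * v for b, v in zip(beta, xi))
            w = huber_psi(r, tau)  # each residual's influence is bounded by tau
            for j in range(d):
                grad[j] += w * xi[j]
        # The loss gradient is -(1/n) * sum psi(r_i) x_i, so descending adds it back.
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical simulated data: y = 1 + 2x + N(0, 0.5^2) noise, with 10 of 200
# responses shifted by +50 to mimic gross contamination.
rng = random.Random(0)
n, sigma = 200, 0.5
X, y = [], []
for i in range(n):
    x = rng.uniform(-1.0, 1.0)
    noise = rng.gauss(0.0, sigma)
    shift = 50.0 if i % 20 == 0 else 0.0
    X.append([1.0, x])
    y.append(1.0 + 2.0 * x + noise + shift)

tau = adaptive_tau(sigma, n, d=2)
beta_huber = huber_regression(X, y, tau)
beta_ols = huber_regression(X, y, math.inf)
print("tau =", round(tau, 3))
print("Huber:", [round(b, 3) for b in beta_huber])
print("OLS:  ", [round(b, 3) for b in beta_ols])
```

Because `tau=math.inf` turns the same solver into least squares, the comparison is direct: the shifted responses should drag the least-squares intercept well away from 1, while the Huber fit should stay close to the true coefficients (1, 2). Gradient descent is used here only because it is short and dependency-free; iteratively reweighted least squares would be a common alternative.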


Similar articles

Adaptive Huber Regression.

J Am Stat Assoc. 2020;115(529):254-265. doi: 10.1080/01621459.2018.1543124. Epub 2019 Apr 22.

Sparse Reduced Rank Huber Regression in High Dimensions.

J Am Stat Assoc. 2023;118(544):2383-2393. doi: 10.1080/01621459.2022.2050243. Epub 2022 Apr 15.

A New Perspective on Robust M-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing.

Ann Stat. 2018 Oct;46(5):1904-1931. doi: 10.1214/17-AOS1606. Epub 2018 Aug 17.

Adaptive Huber Regression on Markov-dependent Data.

Stoch Process Their Appl. 2022 Aug;150:802-818. doi: 10.1016/j.spa.2019.09.004. Epub 2019 Sep 25.

Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions.

J R Stat Soc Series B Stat Methodol. 2017 Jan;79(1):247-265. doi: 10.1111/rssb.12166. Epub 2016 Apr 14.

A Shrinkage Principle for Heavy-Tailed Data: High-Dimensional Robust Low-Rank Matrix Recovery.

Ann Stat. 2021 Jun;49(3):1239-1266. doi: 10.1214/20-aos1980. Epub 2021 Aug 9.

Robust High-dimensional Volatility Matrix Estimation for High-Frequency Factor Model.

J Am Stat Assoc. 2018;113(523):1268-1283. doi: 10.1080/01621459.2017.1340888. Epub 2018 Oct 8.

Minimax Optimal Bandits for Heavy Tail Rewards.

IEEE Trans Neural Netw Learn Syst. 2024 Apr;35(4):5280-5294. doi: 10.1109/TNNLS.2022.3203035. Epub 2024 Apr 4.

Robust Differential Abundance Analysis of Microbiome Sequencing Data.

Genes (Basel). 2023 Oct 26;14(11):2000. doi: 10.3390/genes14112000.

Robust Estimation of Transition Matrices in High Dimensional Heavy-tailed Vector Autoregressive Processes.

JMLR Workshop Conf Proc. 2015 Jul;37:1843-1851.

Cited by

DA-IRRK: Data-Adaptive Iteratively Reweighted Robust Kernel-Based Approach for Back-End Optimization in Visual SLAM.

Sensors (Basel). 2025 Apr 17;25(8):2529. doi: 10.3390/s25082529.

Predictive assessment of eating disorder risk and recovery: Uncovering the effectiveness of questionnaires and influencing characteristics.

Comput Struct Biotechnol J. 2025 Apr 2;28:118-127. doi: 10.1016/j.csbj.2025.03.048. eCollection 2025.

Robust convex biclustering with a tuning-free method.

J Appl Stat. 2024 Jun 17;52(2):271-286. doi: 10.1080/02664763.2024.2367143. eCollection 2025.

Are Latent Factor Regression and Sparse Regression Adequate?

J Am Stat Assoc. 2024;119(546):1076-1088. doi: 10.1080/01621459.2023.2169700. Epub 2023 Feb 14.

Robust analyzes for longitudinal clinical trials with missing and non-normal continuous outcomes.

Stat Theory Relat Fields. 2024;8(1):1-14. doi: 10.1080/24754269.2023.2261351. Epub 2023 Sep 26.

Investigation of Data Size Variability in Wind Speed Prediction Using AI Algorithms.

Cybern Syst. 2021;52(1):105-126. doi: 10.1080/01969722.2020.1827796. Epub 2020 Oct 6.

Sparse Reduced Rank Huber Regression in High Dimensions.

J Am Stat Assoc. 2023;118(544):2383-2393. doi: 10.1080/01621459.2022.2050243. Epub 2022 Apr 15.

Adaptive Huber Regression on Markov-dependent Data.

Stoch Process Their Appl. 2022 Aug;150:802-818. doi: 10.1016/j.spa.2019.09.004. Epub 2019 Sep 25.

A Shrinkage Principle for Heavy-Tailed Data: High-Dimensional Robust Low-Rank Matrix Recovery.

Ann Stat. 2021 Jun;49(3):1239-1266. doi: 10.1214/20-aos1980. Epub 2021 Aug 9.

Meta-Analyzing Multiple Omics Data With Robust Variable Selection.

Front Genet. 2021 Jul 5;12:656826. doi: 10.3389/fgene.2021.656826. eCollection 2021.

