A Perturbation Method for Inference on Regularized Regression Estimates.

Author information

Minnier Jessica, Tian Lu, Cai Tianxi

Affiliations

Ph.D. candidate, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.

Publication information

J Am Stat Assoc. 2011 Jan 1;106(496):1371-1382. doi: 10.1198/jasa.2011.tm10382. Epub 2012 Jan 24.

Abstract

Analysis of high-dimensional data often seeks to identify a subset of important features and assess their effects on the outcome. Traditional statistical inference procedures based on standard regression methods often fail in the presence of high-dimensional features. In recent years, regularization methods have emerged as promising tools for analyzing high-dimensional data. These methods simultaneously select important features and provide stable estimation of their effects. Adaptive LASSO and SCAD, for instance, give consistent and asymptotically normal estimates with oracle properties. However, in finite samples, it remains difficult to obtain interval estimators for the regression parameters. In this paper, we propose perturbation-resampling-based procedures to approximate the distribution of a general class of penalized parameter estimates. Our proposal, justified by asymptotic theory, provides a simple way to estimate the covariance matrix and confidence regions. Through finite-sample simulations, we verify the ability of this method to give accurate inference and compare it to other widely used standard deviation and confidence interval estimates. We also illustrate our proposals with a data set used to study the association between HIV drug resistance and a large number of genetic mutations.
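
As a rough illustration of the perturbation-resampling idea described in the abstract, the following Python sketch refits an adaptive LASSO under random observation weights and uses the spread of the refitted coefficients to approximate standard errors and intervals. Everything specific here is an assumption not stated in the abstract: the linear model with squared-error loss, the Exp(1) perturbation weights, the penalty form `lam / |initial estimate|`, the tuning value `lam`, the simulated data, and the helper names (`weighted_lasso_cd`, `adaptive_lasso`, `perturbation_resample`). The plain percentile intervals are a simplification and should not be read as the authors' confidence-region procedure.

```python
import numpy as np


def weighted_lasso_cd(X, y, obs_w, pen, n_iter=200, tol=1e-8):
    """Coordinate descent for
    0.5 * sum_i obs_w[i] * (y_i - x_i'b)^2 + sum_j pen[j] * |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (obs_w[:, None] * X ** 2).sum(axis=0)  # weighted column norms
    resid = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Weighted correlation of column j with the partial residual.
            rho = np.dot(obs_w * X[:, j], resid) + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - pen[j], 0.0) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def adaptive_lasso(X, y, obs_w, lam):
    """Adaptive LASSO with a weighted least-squares initial estimate
    (assumed penalty: lam / |initial estimate| per coefficient)."""
    XtW = X.T * obs_w
    beta_init = np.linalg.solve(XtW @ X, XtW @ y)
    pen = lam / np.maximum(np.abs(beta_init), 1e-8)
    return weighted_lasso_cd(X, y, obs_w, pen)


def perturbation_resample(X, y, lam, n_resamples=200, seed=0):
    """Refit under i.i.d. Exp(1) weights (mean 1, variance 1; an assumption)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = adaptive_lasso(X, y, np.ones(n), lam)  # unperturbed fit
    draws = np.empty((n_resamples, p))
    for b in range(n_resamples):
        g = rng.exponential(scale=1.0, size=n)  # perturbation weights
        draws[b] = adaptive_lasso(X, y, g, lam)  # same weights in both steps
    return beta_hat, draws


# Simulated example (hypothetical data, no intercept).
rng = np.random.default_rng(1)
n = 200
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
X = rng.normal(size=(n, beta_true.size))
y = X @ beta_true + rng.normal(size=n)

beta_hat, draws = perturbation_resample(X, y, lam=n ** 0.25)
se = draws.std(axis=0, ddof=1)                     # perturbation-based SEs
ci = np.percentile(draws, [2.5, 97.5], axis=0)     # naive percentile intervals
print("estimate:", np.round(beta_hat, 3))
print("std. err:", np.round(se, 3))
```

The same perturbation weights enter both the initial least-squares fit and the penalized fit, so each resample reruns the whole estimation procedure rather than only its last step. That is a design choice of this sketch; the abstract does not specify how the weights are applied.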
