Debiased lasso after sample splitting for estimation and inference in high-dimensional generalized linear models.

Authors

Vazquez Omar, Nan Bin

Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A.

Department of Statistics, University of California, Irvine, California, U.S.A.

Publication information

Can J Stat. 2025 Mar;53(1). doi: 10.1002/cjs.11827. Epub 2024 Aug 21.

DOI:10.1002/cjs.11827

PMID:40462868

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12128611/

Abstract

We consider random sample splitting for estimation and inference in high dimensional generalized linear models, where we first apply the lasso to select a submodel using one subsample and then apply the debiased lasso to fit the selected model using the remaining subsample. We show that a sample splitting procedure based on the debiased lasso yields asymptotically normal estimates under mild conditions and that multiple splitting can address the loss of efficiency. Our simulation results indicate that using the debiased lasso instead of the standard maximum likelihood method in the estimation stage can vastly reduce the bias and variance of the resulting estimates. Furthermore, our multiple splitting debiased lasso method has better numerical performance than some existing methods for high dimensional generalized linear models proposed in the recent literature. We illustrate the proposed multiple splitting method with an analysis of the smoking data of the Mid-South Tobacco Case-Control Study.
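
The two-stage procedure described in the abstract can be illustrated with a minimal sketch for logistic regression. This is not the authors' implementation. It assumes the following, none of which is stated in the abstract: the lasso is fitted by plain proximal gradient descent without an intercept, the debiasing step is a one-step correction that inverts the sample Hessian of the selected submodel (feasible only when the submodel is smaller than the estimation subsample), the covariate of interest is always added to the selected submodel, and multiple splitting simply averages the per-split estimates. All function names, penalty levels, and simulation settings are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lasso_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression (no intercept) via proximal gradient."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the Hessian of the mean log-loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta

def debiased_lasso(X, y, lam):
    """One-step bias correction of a lasso fit (assumes an invertible Hessian)."""
    n, _ = X.shape
    beta = lasso_logistic(X, y, lam)
    mu = sigmoid(X @ beta)
    score = X.T @ (y - mu) / n                        # gradient of the mean log-likelihood
    hessian = (X * (mu * (1.0 - mu))[:, None]).T @ X / n
    theta = np.linalg.inv(hessian)
    b = beta + theta @ score                          # debiased estimate
    se = np.sqrt(np.diag(theta) / n)                  # model-based standard errors
    return b, se

def split_and_debias(X, y, target, lam_select, lam_fit, rng):
    """Select a submodel on one half, then fit it by debiased lasso on the other half."""
    n = X.shape[0]
    perm = rng.permutation(n)
    sel, est = perm[: n // 2], perm[n // 2:]
    selected = np.flatnonzero(lasso_logistic(X[sel], y[sel], lam_select))
    cols = np.union1d(selected, [target])             # assumption: always keep the target
    b, se = debiased_lasso(X[est][:, cols], y[est], lam_fit)
    j = np.searchsorted(cols, target)
    return b[j], se[j]

def multi_split(X, y, target, lam_select, lam_fit, n_splits=20, seed=0):
    """Average the single-split estimates over repeated random splits."""
    rng = np.random.default_rng(seed)
    ests = [split_and_debias(X, y, target, lam_select, lam_fit, rng)[0]
            for _ in range(n_splits)]
    return float(np.mean(ests))

# Hypothetical simulated data: 400 observations, 50 covariates, 3 true signals.
rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

est, se = split_and_debias(X, y, 0, 0.08, 0.02, np.random.default_rng(0))
avg = multi_split(X, y, 0, 0.08, 0.02)
print(f"single split: {est:.3f} (SE {se:.3f}); multi-split average: {avg:.3f}")
```

A single split uses only half of the data for estimation, which is the efficiency loss the abstract refers to; averaging over many random splits is one way to recover some of it. The sketch reports no standard error for the averaged estimate, because the per-split estimates are dependent and a valid variance estimator needs more care than a simple average of the single-split variances.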

Similar articles

Online inference in high-dimensional generalized linear models with streaming data.

Electron J Stat. 2023;17(2):3443-3471. doi: 10.1214/23-ejs2182. Epub 2023 Nov 28.

Debiased lasso for generalized linear models with a diverging number of covariates.

Biometrics. 2023 Mar;79(1):344-357. doi: 10.1111/biom.13587. Epub 2021 Nov 15.

DOUBLY DEBIASED LASSO: HIGH-DIMENSIONAL INFERENCE UNDER HIDDEN CONFOUNDING.

Ann Stat. 2022 Jun;50(3):1320-1347. doi: 10.1214/21-aos2152. Epub 2022 Jun 16.

Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference.

IEEE Trans Inf Theory. 2022 Sep;68(9):5975-6002. doi: 10.1109/tit.2022.3175455. Epub 2022 May 16.

Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study.

BMC Med Res Methodol. 2022 Jul 26;22(1):206. doi: 10.1186/s12874-022-01681-y.

Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model.

Sci Rep. 2023 Dec 11;13(1):21979. doi: 10.1038/s41598-023-48903-x.

Hierarchical False Discovery Rate Control for High-dimensional Survival Analysis with Interactions.

Comput Stat Data Anal. 2024 Apr;192. doi: 10.1016/j.csda.2023.107906. Epub 2023 Dec 5.

Estimation and Inference for High-Dimensional Generalized Linear Models with Knowledge Transfer.

J Am Stat Assoc. 2024;119(546):1274-1285. doi: 10.1080/01621459.2023.2184373. Epub 2023 Apr 12.

Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach.

J Am Stat Assoc. 2022;117(540):1835-1846. doi: 10.1080/01621459.2021.1888740. Epub 2021 Apr 20.
