Optimal Poisson subsampling decorrelated score for high-dimensional generalized linear models.

Authors

Shan Junhao, Wang Lei

Affiliation

School of Statistics and Data Science, KLMDASR, LEBPS and LPMC, Nankai University, Tianjin, People's Republic of China.

Publication details

J Appl Stat. 2024 Feb 11;51(14):2719-2743. doi: 10.1080/02664763.2024.2315467. eCollection 2024.

DOI:10.1080/02664763.2024.2315467

PMID:39440231

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11492415/

Abstract

For high-dimensional generalized linear models (GLMs) with massive data, this paper investigates a unified optimal Poisson subsampling scheme for estimation and inference on a prespecified low-dimensional partition of the whole parameter. A Poisson subsampling decorrelated score function is proposed to mitigate the adverse effect of the less accurate, slowly converging nuisance parameter estimate. The resulting Poisson subsample estimator is proved to be consistent and asymptotically normal, and a more general optimal subsampling criterion, which includes the A- and L-optimality criteria, is formulated to improve estimation efficiency. We also propose a two-step algorithm for implementation and discuss some practical issues. The satisfactory performance of our method is validated through simulation studies and a real dataset.
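
The two-step Poisson subsampling idea can be illustrated with a minimal sketch. It assumes a low-dimensional logistic model and sampling scores proportional to |y_i - p_i| * ||x_i||, an L-optimality-style choice known from the optimal subsampling literature for logistic regression. It does not implement the paper's decorrelated score or its treatment of high-dimensional nuisance parameters; the function names and the sizes `r0` and `r` are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Newton-Raphson for a weighted logistic log-likelihood (illustrative helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def two_step_poisson_subsample(X, y, r0, r, rng):
    """Pilot fit on a uniform Poisson subsample, then a weighted fit on a
    Poisson subsample drawn with data-dependent inclusion probabilities."""
    n = X.shape[0]
    # Step 1: each row enters the pilot independently with probability r0 / n.
    pilot = rng.random(n) < r0 / n
    beta0 = weighted_logistic_fit(X[pilot], y[pilot], np.ones(int(pilot.sum())))
    # Step 2: assumed scores |y_i - p_i| * ||x_i||, scaled to expected size r.
    p0 = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p0) * np.linalg.norm(X, axis=1)
    incl = np.minimum(1.0, r * score / score.sum())
    # Poisson sampling: one independent coin flip per row, so the
    # realized subsample size is random with expectation at most r.
    keep = rng.random(n) < incl
    # Inverse-probability weights correct for the unequal sampling.
    beta = weighted_logistic_fit(X[keep], y[keep], 1.0 / incl[keep])
    return beta, int(keep.sum())

# Usage on simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, beta_true = 20000, np.array([0.5, -1.0, 1.0])
X = rng.standard_normal((n, 3))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_hat, m = two_step_poisson_subsample(X, y, r0=500, r=2000, rng=rng)
print("subsample size:", m)
print("estimate:", np.round(beta_hat, 3))
```

Poisson sampling decides inclusion row by row, so a single pass over the data suffices and the full set of sampling probabilities never has to be normalized against a fixed subsample size; the price is a random realized subsample size.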

Similar articles

Optimal Subsampling for Large Sample Logistic Regression.

J Am Stat Assoc. 2018;113(522):829-844. doi: 10.1080/01621459.2017.1292914. Epub 2018 Jun 6.

Asymptotics of Subsampling for Generalized Linear Regression Models under Unbounded Design.

Entropy (Basel). 2022 Dec 31;25(1):84. doi: 10.3390/e25010084.

Optimal subsampling for parametric accelerated failure time models with massive survival data.

Stat Med. 2022 Nov 30;41(27):5421-5431. doi: 10.1002/sim.9576. Epub 2022 Sep 20.

Robust and efficient subsampling algorithms for massive data logistic regression.

J Appl Stat. 2023 Apr 26;51(8):1427-1445. doi: 10.1080/02664763.2023.2205611. eCollection 2024.

Subsampling based variable selection for generalized linear models.

Comput Stat Data Anal. 2023 Aug;184. doi: 10.1016/j.csda.2023.107740. Epub 2023 Mar 11.

: a fast subsampling algorithm for Cox model with distributed and massive survival data.

Int J Biostat. 2025 Feb 4;21(1):53-65. doi: 10.1515/ijb-2024-0042. eCollection 2025 May 1.

Sampling-based estimation for massive survival data with additive hazards model.

Stat Med. 2021 Jan 30;40(2):441-450. doi: 10.1002/sim.8783. Epub 2020 Nov 3.

On high-dimensional Poisson models with measurement error: Hypothesis testing for nonlinear nonconvex optimization.

Ann Stat. 2023 Feb;51(1):233-259. doi: 10.1214/22-aos2248. Epub 2023 Mar 23.

Communication-efficient estimation and inference for high-dimensional quantile regression based on smoothed decorrelated score.

Stat Med. 2022 Nov 10;41(25):5084-5101. doi: 10.1002/sim.9555. Epub 2022 Aug 13.
