Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution.

Authors

Tang Lu, Zhou Ling, Song Peter X-K

Affiliations

Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA.

Center of Statistical Research, Southwestern University of Finance and Economics, Chengdu, Sichuan, China.

Publication information

J Multivar Anal. 2020 Mar;176. doi: 10.1016/j.jmva.2019.104567. Epub 2019 Nov 28.

Abstract

We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., $n \gg p$, in the generalized linear models framework. When such datasets are too big to be analyzed entirely by a single centralized computer, or when datasets are already stored in distributed database systems, the strategy of divide-and-combine has been the method of choice for scalability. Due to partition, the sub-dataset sample sizes may be uneven and some possibly close to $p$, which calls for regularization techniques to improve numerical stability. However, there is a lack of clear theoretical justification and practical guidelines to combine results obtained from separate regularized estimators, especially when the final objective is simultaneous inference for a group of regression parameters. In this paper, we develop a strategy to combine bias-corrected lasso-type estimates by using confidence distributions. We show that the resulting combined estimator achieves the same estimation efficiency as that of the maximum likelihood estimator using the centralized data. As demonstrated by simulated and real data examples, our divide-and-combine method yields nearly identical inference as the centralized benchmark.
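The divide-and-combine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a Gaussian linear model (the simplest GLM) and uses plain least squares within each block instead of the bias-corrected lasso-type estimates the abstract describes. The combination rule shown, an information-weighted average of the block estimates, is one common way to merge normal-approximation confidence distributions; the function names `block_fit` and `combine`, the simulated data, and the partition sizes are all hypothetical.

```python
import numpy as np

def block_fit(X, y):
    """Least-squares fit on one block.

    Returns the estimate and the matrix X'X, which is the Fisher
    information up to a common noise-variance factor (assumed equal
    across blocks, so it cancels in the combination).
    """
    info = X.T @ X
    beta = np.linalg.solve(info, X.T @ y)
    return beta, info

def combine(fits):
    """Information-weighted combination: (sum_k J_k)^{-1} sum_k J_k beta_k."""
    total_info = sum(info for _, info in fits)
    weighted = sum(info @ beta for beta, info in fits)
    return np.linalg.solve(total_info, weighted), total_info

# Simulated data with n much larger than p (illustrative values only).
rng = np.random.default_rng(0)
n, p = 3000, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.0, 2.0, 0.25])
y = X @ beta_true + rng.normal(size=n)

# Uneven partition into three blocks of sizes 200, 1000 and 1800.
blocks = np.split(np.arange(n), [200, 1200])
fits = [block_fit(X[idx], y[idx]) for idx in blocks]

beta_dac, info_dac = combine(fits)       # divide-and-combine estimate
beta_full, info_full = block_fit(X, y)   # centralized benchmark

# Largest coordinate-wise gap between the two estimates.
max_gap = float(np.max(np.abs(beta_dac - beta_full)))
print("max |beta_dac - beta_full| =", max_gap)
```

In this linear special case the combined estimate coincides with the centralized least-squares estimate up to floating-point error, because the block quantities X'X and X'y simply add up. For other GLMs, and with regularized block estimates, agreement would hold only approximately and would rely on the bias correction discussed in the abstract.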
