Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution

Authors

Tang Lu, Zhou Ling, Song Peter X-K

Affiliations

Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA.

Center of Statistical Research, Southwestern University of Finance and Economics, Chengdu, Sichuan, China.

Publication information

J Multivar Anal. 2020 Mar;176. doi: 10.1016/j.jmva.2019.104567. Epub 2019 Nov 28.

DOI: 10.1016/j.jmva.2019.104567
PMID: 32863459
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7453826/
Abstract

We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., n ≫ p, in the generalized linear models framework. When such datasets are too big to be analyzed entirely by a single centralized computer, or when datasets are already stored in distributed database systems, the strategy of divide-and-combine has been the method of choice for scalability. Because of partitioning, the sub-dataset sample sizes may be uneven, and some may be close to p, which calls for regularization techniques to improve numerical stability. However, there is a lack of clear theoretical justification and practical guidelines for combining results obtained from separate regularized estimators, especially when the final objective is simultaneous inference for a group of regression parameters. In this paper, we develop a strategy to combine bias-corrected lasso-type estimates by using confidence distributions. We show that the resulting combined estimator achieves the same estimation efficiency as that of the maximum likelihood estimator using the centralized data. As demonstrated by simulated and real data examples, our divide-and-combine method yields inference nearly identical to that of the centralized benchmark.
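
The divide-and-combine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are not in the source: a Gaussian linear model (the identity-link special case of a GLM), a ridge penalty standing in for the lasso-type penalty, a one-step Newton update as the bias correction, and an information-weighted average as the combination rule. The function names `debiased_block_estimate` and `combine` and the simulated data are hypothetical.

```python
import numpy as np


def debiased_block_estimate(X, y, ridge=1.0):
    """Return (bias-corrected estimate, information matrix) for one block.

    Assumption: Gaussian linear model, so X'X is the Fisher information
    up to the noise variance. Requires more rows than columns in X.
    """
    p = X.shape[1]
    info = X.T @ X
    # Regularized first-stage fit (ridge is a stand-in for a lasso-type fit).
    beta_reg = np.linalg.solve(info + ridge * np.eye(p), X.T @ y)
    # One-step Newton correction: add info^{-1} times the score at beta_reg.
    score = X.T @ (y - X @ beta_reg)
    beta_debiased = beta_reg + np.linalg.solve(info, score)
    return beta_debiased, info


def combine(estimates, infos):
    """Information-weighted average of the block estimates."""
    total_info = sum(infos)
    weighted = sum(J @ b for b, J in zip(estimates, infos))
    return np.linalg.solve(total_info, weighted), total_info


# Simulated data (hypothetical): n much larger than p.
rng = np.random.default_rng(0)
n, p = 3000, 5
beta_true = np.array([1.0, -0.5, 0.0, 2.0, 0.25])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# Uneven partition into blocks of 200, 1000, and 1800 rows.
split_points = [200, 1200]
blocks = zip(np.split(X, split_points), np.split(y, split_points))
results = [debiased_block_estimate(Xk, yk) for Xk, yk in blocks]
beta_dac, total_info = combine(*zip(*results))

# Centralized benchmark: least squares on the full dataset.
beta_central = np.linalg.lstsq(X, y, rcond=None)[0]
```

In this linear special case the combined estimate coincides with the centralized least-squares fit up to floating-point error, which mirrors the efficiency claim in the abstract. For a non-Gaussian GLM the agreement would only be approximate, and the per-block information would depend on the fitted coefficients. Under the same assumptions, the inverse of `total_info` scaled by the noise variance would serve as the joint covariance for simultaneous inference on a group of coefficients.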
