Distributed Quasi-Poisson regression algorithm for modeling multi-site count outcomes in distributed data networks.

Author information

Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.

Optum Labs at UnitedHealth Group, Minnetonka, MN, USA.

Publication information

J Biomed Inform. 2022 Jul;131:104097. doi: 10.1016/j.jbi.2022.104097. Epub 2022 May 25.

Abstract

BACKGROUND

Observational studies incorporating real-world data from multiple institutions facilitate study of rare outcomes or exposures and improve generalizability of results. Due to privacy concerns surrounding patient-level data sharing across institutions, methods for performing regression analyses distributively are desirable. Meta-analysis of institution-specific estimates is commonly used, but has been shown to produce biased estimates in certain settings. While distributed regression methods are increasingly available, methods for analyzing count outcomes are currently limited. Count data in practice are commonly subject to overdispersion, exhibiting greater variability than expected under a given statistical model.
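
To make the notion of overdispersion concrete, the following minimal sketch estimates a Pearson dispersion parameter for an intercept-only Poisson model. The counts are made-up illustrative values, not study data, and the function name is hypothetical. A value well above 1 suggests more variability than a Poisson model expects; a quasi-Poisson fit keeps the same coefficient estimates but scales the standard errors by the square root of this quantity.

```python
def pearson_dispersion(counts, n_params=1):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    For an intercept-only Poisson model the fitted mean is the sample mean.
    """
    n = len(counts)
    mu = sum(counts) / n
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - n_params)

# Hypothetical counts, chosen only to illustrate the calculation.
counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 12]
phi = pearson_dispersion(counts)
print(f"estimated dispersion: {phi:.2f}")
```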

OBJECTIVE

We propose a novel computational method, a one-shot distributed algorithm for quasi-Poisson regression (ODAP), to distributively model count outcomes while accounting for overdispersion.

METHODS

ODAP incorporates a surrogate likelihood approach to perform distributed quasi-Poisson regression without requiring patient-level data sharing, only requiring sharing of aggregate data from each participating institution. ODAP requires at most three rounds of non-iterative communication among institutions to generate coefficient estimates and corresponding standard errors. In simulations, we evaluate ODAP under several data scenarios possible in multi-site analyses, comparing ODAP and meta-analysis estimates in terms of error relative to pooled regression estimates, considered the gold standard. In a proof-of-concept real-world data analysis, we similarly compare ODAP and meta-analysis in terms of error relative to pooled estimation using data from the OneFlorida Clinical Research Consortium, modeling length of stay in COVID-19 patients as a function of various patient characteristics. In a second proof-of-concept analysis, using the same outcome and covariates, we incorporate data from the UnitedHealth Group Clinical Discovery Database together with the OneFlorida data in a distributed analysis to compare estimates produced by ODAP and meta-analysis.
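
The surrogate likelihood idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it uses the general gradient-corrected form from the surrogate likelihood literature (the lead site's own log-likelihood plus a linear term built from the difference between the network-wide and local gradients at an initial value), a two-parameter Poisson model, simulated data, a sample-size-weighted average of local estimates as the initial value, and site 0 as the lead site. All function names are hypothetical, and the dispersion and standard-error step is omitted.

```python
import math
import random

def simulate_site(n, beta, seed):
    """Simulate (x, y) with y ~ Poisson(exp(beta[0] + beta[1] * x))."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        limit = math.exp(-math.exp(beta[0] + beta[1] * x))
        # Knuth's multiplication method for a Poisson draw (fine for small means).
        y, p = 0, rng.random()
        while p > limit:
            y += 1
            p *= rng.random()
        rows.append((x, y))
    return rows

def score_and_info(rows, beta):
    """Average score vector and average information matrix [[a, b], [b, d]]."""
    g0 = g1 = a = b = d = 0.0
    for x, y in rows:
        mu = math.exp(beta[0] + beta[1] * x)
        g0 += y - mu
        g1 += (y - mu) * x
        a += mu
        b += mu * x
        d += mu * x * x
    n = len(rows)
    return (g0 / n, g1 / n), (a / n, b / n, d / n)

def maximize(rows, shift=(0.0, 0.0), start=(0.0, 0.0), iters=25):
    """Newton's method on the average Poisson log-likelihood plus shift . beta."""
    b0, b1 = start
    for _ in range(iters):
        (g0, g1), (a, b, d) = score_and_info(rows, (b0, b1))
        g0, g1 = g0 + shift[0], g1 + shift[1]
        det = a * d - b * b
        b0 += (d * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

def weighted_mean(vectors, sizes):
    """Sample-size-weighted average of 2-vectors."""
    total = sum(sizes)
    return tuple(sum(n * v[j] for v, n in zip(vectors, sizes)) / total for j in range(2))

sites = [simulate_site(400, (-0.5, 0.8), seed) for seed in (1, 2, 3)]
sizes = [len(rows) for rows in sites]

# Reference fit on pooled rows; not possible in a real distributed network.
pooled = maximize([row for rows in sites for row in rows])

# Step 1 (assumed): sites share local estimates, averaged into an initial value.
initial = weighted_mean([maximize(rows) for rows in sites], sizes)

# Step 2 (assumed): sites share their average gradients at the initial value.
gradients = [score_and_info(rows, initial)[0] for rows in sites]
global_gradient = weighted_mean(gradients, sizes)

# Lead site (site 0) maximizes its own log-likelihood plus a gradient correction.
shift = tuple(global_gradient[j] - gradients[0][j] for j in range(2))
surrogate = maximize(sites[0], shift=shift, start=initial)

print("pooled:   ", pooled)
print("surrogate:", surrogate)
```

Only coefficient vectors and gradient vectors cross site boundaries in this sketch; the individual rows never leave the site that holds them, which is the property the abstract describes as sharing aggregate data only.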

RESULTS

In simulations, ODAP exhibited negligible error relative to pooled regression estimates across all settings explored. Meta-analysis estimates, while largely unbiased, were increasingly variable as heterogeneity in the outcome increased across institutions. When baseline expected count was 0.2, relative error for meta-analysis was above 5% in 25% of iterations (250/1000), while the largest relative error for ODAP in any iteration was 3.59%. In our proof-of-concept analysis using only OneFlorida data, ODAP estimates were closer to pooled regression estimates than those produced by meta-analysis for all 15 covariates. In our distributed analysis incorporating data from both OneFlorida and the UnitedHealth Group Clinical Discovery Database, ODAP and meta-analysis estimates were largely similar, while some differences in estimates (as large as 13.8%) could be indicative of bias in meta-analytic estimates.
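
The comparison metric can be illustrated with a small sketch. It assumes relative error is the absolute difference from the pooled estimate divided by the absolute pooled estimate, and it uses fixed-effect inverse-variance weighting as one common form of meta-analysis; the abstract does not specify either detail. All numbers are invented for illustration and are not study results.

```python
def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance weighted average and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    combined = sum(w * b for w, b in zip(weights, estimates)) / total
    return combined, (1.0 / total) ** 0.5

def relative_error(estimate, reference):
    """Absolute error relative to a reference value such as the pooled estimate."""
    return abs(estimate - reference) / abs(reference)

# Invented site-level coefficient estimates and standard errors (not study data).
site_estimates = [0.50, 0.40]
site_std_errors = [0.10, 0.20]
pooled_reference = 0.50  # hypothetical pooled regression estimate

meta_estimate, meta_se = inverse_variance_meta(site_estimates, site_std_errors)
meta_rel_error = relative_error(meta_estimate, pooled_reference)
print(f"meta-analysis estimate: {meta_estimate:.3f}, relative error: {meta_rel_error:.1%}")
```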

CONCLUSIONS

ODAP performs privacy-preserving, communication-efficient distributed quasi-Poisson regression to analyze count outcomes using data stored within multiple institutions. Our method produces estimates nearly matching pooled regression estimates and sometimes more accurate than meta-analysis estimates, most notably in settings with relatively low counts and high outcome heterogeneity across institutions.
