Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics.

Author information

Camirand Lemyre Félix, Lévesque Simon, Domingue Marie-Pier, Herrmann Klaus, Ethier Jean-François

Affiliations

GRIIS, Université de Sherbrooke, 2500, Boul de l'Université, Sherbrooke, QC, J1K 2R1, Canada, 1-819-821-8000 ext 74977.

Département de mathématiques, Faculté des sciences, Université de Sherbrooke, Sherbrooke, QC, Canada.

Publication information

JMIR Med Inform. 2024 Nov 14;12:e53622. doi: 10.2196/53622.

Abstract

BACKGROUND

Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks.

OBJECTIVE

This study aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data, (2) describing the methods applicable to generalized linear models (GLMs) and assessing their underlying distributional assumptions, and (3) adapting existing methods to make them fully usable in health settings.

METHODS

A scoping review methodology was used for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in health settings. Statistical theory was used to adapt methods and derive the properties of the resulting estimators.

RESULTS

From the review, 41 articles were selected, and 6 approaches for conducting standard GLM-based statistical analysis were extracted. However, these approaches assumed that data were evenly sized and identically distributed across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information-sharing requirements and operational complexity.
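To make the idea of exchanging summaries rather than pooled records concrete, the following is a minimal sketch of one well-known technique for fitting a GLM on horizontally partitioned data: distributed Newton-Raphson for logistic regression, in which each node shares only a gradient vector and a Hessian matrix per iteration. It is not claimed to be one of the 6 approaches identified in the review, nor the authors' adapted procedure. The function names (`local_summaries`, `distributed_logistic`), the synthetic data, and the node sizes are all assumptions made for illustration.

```python
import numpy as np

def local_summaries(X, y, beta):
    """Computed on one node: gradient and Hessian of the logistic
    log-likelihood at the current coefficients (hypothetical helper)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
    grad = X.T @ (y - p)                        # score vector
    hess = (X * (p * (1.0 - p))[:, None]).T @ X # observed information
    return grad, hess

def distributed_logistic(nodes, n_iter=25, tol=1e-10):
    """Coordinator: sums node-level summaries and takes Newton steps.
    `nodes` is a list of (X, y) pairs; row-level data never leave a node
    in a real deployment, only (grad, hess) would be transmitted."""
    beta = np.zeros(nodes[0][0].shape[1])
    for _ in range(n_iter):
        summaries = [local_summaries(X, y, beta) for X, y in nodes]
        grad = sum(g for g, _ in summaries)
        hess = sum(h for _, h in summaries)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic example with deliberately uneven node sample sizes (assumed values).
rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0, -2.0])
nodes = []
for n in (50, 400, 1500):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
    y = (rng.random(n) < prob).astype(float)
    nodes.append((X, y))

beta_distributed = distributed_logistic(nodes)

# Reference fit on pooled data, treated as a single node.
X_all = np.vstack([X for X, _ in nodes])
y_all = np.concatenate([y for _, y in nodes])
beta_pooled = distributed_logistic([(X_all, y_all)])
```

Because the log-likelihood is a sum over observations, the summed gradients and Hessians equal those of the pooled data, so `beta_distributed` and `beta_pooled` agree up to floating-point error regardless of how unevenly the rows are split. This sketch fits a single common coefficient vector and does not address heterogeneous distributions across nodes.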

CONCLUSIONS

This study contributes to the field of health analytics by providing an overview of the methods that can be used with horizontally partitioned data, adapting these methods to the context of heterogeneous health data, and clarifying the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risk associated with sharing summary statistics.
