CEDAR: communication efficient distributed analysis for regressions.

Affiliations

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.

Publication information

Biometrics. 2023 Sep;79(3):2357-2369. doi: 10.1111/biom.13786. Epub 2022 Nov 7.

DOI:10.1111/biom.13786
PMID:36305019
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10133408/

Abstract

Electronic health records (EHRs) offer great promise for advancing precision medicine and, at the same time, present significant analytical challenges. In particular, patient-level data in EHRs often cannot be shared across institutions (data sources) because of government regulations and/or institutional policies. As a result, there is growing interest in distributed learning over multiple EHR databases without sharing patient-level data. To tackle such challenges, we propose a novel communication-efficient method that aggregates the optimal estimates of external sites by turning the problem into a missing data problem. In addition, we propose incorporating posterior samples of remote sites, which can provide partial information on the missing quantities and improve the efficiency of parameter estimates while having the differential privacy property, thus reducing the risk of information leakage. The proposed approach allows for proper statistical inference without sharing the raw patient-level data. We provide a theoretical investigation of the asymptotic properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses in comparison with several recently developed methods.
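
To make the distributed setting concrete, the sketch below shows a much simpler baseline than the method in the abstract. It is not CEDAR: it has no missing-data formulation, no posterior samples, and no differential privacy. It only illustrates fitting a regression across sites that share summary statistics instead of patient-level rows. For ordinary least squares, each site can share its matrices X^T X and X^T y, and summing them reproduces the pooled fit. The function names, the three simulated sites, and the coefficient values are all assumptions made for illustration.

```python
import random

def site_summaries(X, y):
    """Summary statistics a site would share: X^T X and X^T y (no raw rows)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def aggregate(summaries):
    """Sum the per-site summaries and solve the pooled normal equations."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)

def simulate_site(n, beta, rng):
    """Simulated data for one hypothetical site: intercept plus one covariate."""
    X = [[1.0, rng.gauss(0, 1)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, 0.5) for row in X]
    return X, y

rng = random.Random(0)
true_beta = [1.0, 2.0]  # assumed coefficients, for illustration only
sites = [simulate_site(n, true_beta, rng) for n in (200, 300, 500)]

# Each site contributes only its summaries, never its rows.
distributed_beta = aggregate([site_summaries(X, y) for X, y in sites])

# Reference fit on the stacked data, which real sites could not compute.
X_all = [row for X, _ in sites for row in X]
y_all = [yi for _, y in sites for yi in y]
pooled_beta = aggregate([site_summaries(X_all, y_all)])

print(distributed_beta, pooled_beta)
```

The two estimates agree up to floating-point rounding because the normal equations are additive across sites. That exactness is specific to linear least squares. For other regressions, such as logistic models, the estimating equations do not reduce to a fixed set of additive summaries, so a single round of communication generally requires an approximation.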
