Decentralised, collaborative, and privacy-preserving machine learning for multi-hospital data.

Affiliations

Department of Computer Science, University of Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Canada; Vector Institute, Toronto, Canada.

Vector Institute, Toronto, Canada; CISPA Helmholtz Center for Information Security, Germany; Department of Electrical and Computer Engineering, University of Toronto, Canada.

Publication information

EBioMedicine. 2024 Mar;101:105006. doi: 10.1016/j.ebiom.2024.105006. Epub 2024 Feb 19.

DOI:10.1016/j.ebiom.2024.105006
PMID:38377795
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10884342/
Abstract

BACKGROUND

Machine learning (ML) has demonstrated great potential in medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions or jurisdictions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model on the private datasets held by each party, without directly sharing those datasets or compromising their privacy through the collaboration.

METHODS

In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). This framework offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets (i.e., no data centralization); (2) it safeguards patients' privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates the ML model training without relying on a centralized party/server.
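The abstract does not spell out the mechanism behind these three properties. The following is a minimal sketch only, assuming a differentially private gradient-aggregation scheme (per-example clipping plus Gaussian noise added locally) and a randomly chosen participant that coordinates each round instead of a fixed server. Both choices are assumptions made for illustration, not the paper's confirmed protocol. All function names and parameters (`clip`, `noisy_local_sum`, `training_round`, `max_norm`, `noise_std`) are hypothetical, and a real deployment would also need secure aggregation and privacy accounting, which are omitted here.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_local_sum(per_example_grads, max_norm, noise_std, rng):
    """Clip each per-example gradient, sum them, and add Gaussian noise.

    This runs inside one party; only the noisy sum is ever shared.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    return [t + rng.gauss(0.0, noise_std) for t in total]


def training_round(weights, party_grads, lr, max_norm, noise_std, rng):
    """One collaborative round without a fixed central server.

    A participant is picked at random to coordinate the round (assumed
    leader selection); it averages the noisy sums and updates the weights.
    """
    leader = rng.randrange(len(party_grads))
    updates = [noisy_local_sum(g, max_norm, noise_std, rng) for g in party_grads]
    n_examples = sum(len(g) for g in party_grads)
    avg = [sum(u[i] for u in updates) / n_examples for i in range(len(weights))]
    new_weights = [w - lr * a for w, a in zip(weights, avg)]
    return leader, new_weights


# Toy usage with made-up gradients for two hypothetical hospitals.
rng = random.Random(0)
party_grads = [
    [[3.0, 4.0], [0.3, 0.4]],  # hospital A: two per-example gradients
    [[-1.0, 0.0]],             # hospital B: one per-example gradient
]
leader, weights = training_round(
    [0.0, 0.0], party_grads, lr=0.1, max_norm=1.0, noise_std=0.5, rng=rng
)
print(leader, weights)
```

In this sketch, raw records stay with each party (property 1), clipping and noise bound how much any single record can influence what is shared (property 2), and the coordinating role moves between participants rather than sitting on a dedicated server (property 3).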

FINDINGS

We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. The ML models trained with the DeCaPH framework show a drop of less than 3.2% in model performance compared with those trained by the non-privacy-preserving collaborative framework. Meanwhile, the average vulnerability to privacy attacks of the models trained with DeCaPH decreased by up to 16%. In addition, models trained with DeCaPH outperform models trained solely on the private datasets of individual parties without collaboration by up to 70%, and models trained with the previous privacy-preserving collaborative training framework under the same privacy guarantee by up to 18.2%.

INTERPRETATION

We demonstrate that the ML models trained with the DeCaPH framework have an improved utility-privacy trade-off, showing that DeCaPH enables the models to perform well while preserving the privacy of the training data points. In addition, the ML models trained with the DeCaPH framework generally outperform those trained solely on the private datasets of individual parties, showing that DeCaPH enhances model generalizability.

FUNDING

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2020-06189 and DGECR-2020-00294), Canadian Institute for Advanced Research (CIFAR) AI Catalyst Grants, CIFAR AI Chair programs, Temerty Professor of AI Research and Education in Medicine, University of Toronto, Amazon, Apple, DARPA through the GARD project, Intel, Meta, the Ontario Early Researcher Award, and the Sloan Foundation. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
