Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation.

Affiliations

Computer & Mathematical Sciences, University of Toronto, Toronto, ON, Canada.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Publication information

J Med Internet Res. 2020 Nov 3;22(11):e18735. doi: 10.2196/18735.

DOI:10.2196/18735

PMID:33141090

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7671849/

Abstract

BACKGROUND

Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks.
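The double-counting problem described above can be illustrated with a toy example. This is a minimal sketch with made-up site names and patient identifiers (all hypothetical, not from the study): summing per-site counts overstates the number of distinct patients whenever a patient appears at more than one site.

```python
# Toy illustration of double counting in a federated count query.
# Site names and patient IDs are hypothetical.

def summed_count(site_matches):
    """What the hub gets when each site returns only an aggregate count."""
    return sum(len(ids) for ids in site_matches.values())

def distinct_count(site_matches):
    """The true answer, which requires pooling patient identifiers."""
    return len(set().union(*site_matches.values()))

site_matches = {
    "hospital_a": {"p1", "p2", "p3"},
    "hospital_b": {"p3", "p4"},        # p3 is also seen at hospital_a
    "hospital_c": {"p1", "p4", "p5"},  # p1 and p4 are seen elsewhere too
}

print(summed_count(site_matches))    # 8
print(distinct_count(site_matches))  # 5
```

Computing the distinct count directly needs the identifiers in one place, which is exactly what privacy constraints discourage; that tension is what the record-linkage methods mentioned above try to resolve.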

OBJECTIVE

This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed.

METHODS

We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques.
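To make the HyperLogLog idea concrete, the following is a minimal sketch of the textbook algorithm applied to a federated count, not the authors' implementation. It covers only the hashing step; the masking and homomorphic encryption layers are omitted. The register count, the shared salt, and all function names are assumptions chosen for illustration.

```python
import hashlib
import math

P = 10       # index bits (assumed); more bits means more accuracy, more data sent
M = 1 << P   # number of registers per sketch

def _hash64(patient_id, salt="shared-network-salt"):
    """Salted 64-bit hash of an identifier (the salt is a hypothetical shared secret)."""
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def make_sketch(patient_ids):
    """Each site builds its registers locally from its matching patients."""
    registers = [0] * M
    for pid in patient_ids:
        h = _hash64(pid)
        idx = h >> (64 - P)                      # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers

def merge(sketches):
    """The hub combines site sketches with a register-wise maximum."""
    return [max(values) for values in zip(*sketches)]

def estimate(registers):
    """Standard HyperLogLog estimate with the small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)  # linear counting for small cardinalities
    return raw

# Two hypothetical sites with 2,000 shared patients (10,000 distinct in total).
site_a = [f"patient-{i}" for i in range(0, 6000)]
site_b = [f"patient-{i}" for i in range(4000, 10000)]
combined = merge([make_sketch(site_a), make_sketch(site_b)])
print(round(estimate(combined)))
```

Because the merge is a register-wise maximum, a patient seen at several sites affects the combined sketch exactly once, so the estimate approximates the distinct count rather than the sum. In this textbook form, the register count is the tunable parameter: more registers reduce estimation error but send more information to the hub.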

RESULTS

In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. This was orders of magnitude better than other approaches, which guaranteed the exact answer.

CONCLUSIONS

Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.
