Privacy-Preserving Methods for Vertically Partitioned Incomplete Data.

Author affiliations

Emory, Atlanta, GA, USA.

University of Texas Health Science Center, Houston, TX, USA.

Publication information

AMIA Annu Symp Proc. 2021 Jan 25;2020:348-357. eCollection 2020.

PMID:33936407

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8075536/

Abstract

Distributed health data networks that use information from multiple sources have drawn substantial interest in recent years. However, missing data are prevalent in such networks and present significant analytical challenges. The current state-of-the-art methods for handling missing data require pooling data into a central repository before analysis, which may not be possible in a distributed health data network. In this paper, we propose a privacy-preserving distributed analysis framework for handling missing data when data are vertically partitioned. In this framework, each institution with a particular data source utilizes the local private data to calculate necessary intermediate aggregated statistics, which are then shared to build a global model for handling missing data. To evaluate our proposed methods, we conduct simulation studies that clearly demonstrate that the proposed privacy-preserving methods perform as well as the methods using the pooled data and outperform several naive methods. We further illustrate the proposed methods through the analysis of a real dataset. The proposed framework for handling vertically partitioned incomplete data is substantially more privacy-preserving than methods that require pooling of the data, since no individual-level data are shared, which can lower hurdles for collaboration across multiple institutions and build stronger public trust.
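To make the idea of building a global imputation model from locally computed statistics more concrete, here is a minimal sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes two hypothetical sites holding different columns (`X_a`, `X_b`) for the same patients, a partially missing outcome `y`, and a ridge-regression imputation model fitted in dual form, where the pooled linear Gram matrix is simply the sum of each site's local Gram matrix. All names, the penalty `lam`, and the simulated data are illustrative assumptions, and whether sharing an n x n Gram matrix meets a given privacy requirement is a separate question not addressed here.

```python
import numpy as np

def local_gram(X_site):
    """Run inside one institution: n x n linear Gram matrix of its own columns."""
    return X_site @ X_site.T

def impute_from_grams(grams, y, observed, lam=1.0):
    """Dual-form ridge imputation using only the summed site-level Gram matrices."""
    K = sum(grams)                      # equals X @ X.T for the column-stacked X
    obs = np.flatnonzero(observed)
    mis = np.flatnonzero(~observed)
    # Dual ridge coefficients fitted on rows where y is observed.
    alpha = np.linalg.solve(K[np.ix_(obs, obs)] + lam * np.eye(obs.size), y[obs])
    y_filled = y.copy()
    y_filled[mis] = K[np.ix_(mis, obs)] @ alpha
    return y_filled

# Simulated example (hypothetical data): same patients, different columns per site.
rng = np.random.default_rng(0)
n = 200
X_a = rng.normal(size=(n, 3))           # columns held by site A
X_b = rng.normal(size=(n, 2))           # columns held by site B
y_full = (X_a @ np.array([1.0, -2.0, 0.5])
          + X_b @ np.array([0.3, 1.5])
          + rng.normal(scale=0.1, size=n))
observed = rng.random(n) > 0.3          # True where y is observed
y = np.where(observed, y_full, np.nan)

y_dist = impute_from_grams([local_gram(X_a), local_gram(X_b)], y, observed, lam=1.0)

# Reference: the same ridge imputation computed on pooled individual-level data.
X = np.hstack([X_a, X_b])
Xo = X[observed]
beta = np.linalg.solve(Xo.T @ Xo + 1.0 * np.eye(X.shape[1]), Xo.T @ y[observed])
y_pooled = y.copy()
y_pooled[~observed] = X[~observed] @ beta
```

In this sketch the distributed and pooled imputations agree up to numerical error, because the dual and primal ridge solutions are algebraically identical. A single deterministic fill-in like this ignores imputation uncertainty, so a practical analysis would more likely build multiple imputation on top of such a fit.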




