Privacy-Preserving Methods for Vertically Partitioned Incomplete Data.

Affiliations

Emory, Atlanta, GA, USA.

University of Texas Health Science Center, Houston, TX, USA.

Publication information

AMIA Annu Symp Proc. 2021 Jan 25;2020:348-357. eCollection 2020.

Abstract

Distributed health data networks that use information from multiple sources have drawn substantial interest in recent years. However, missing data are prevalent in such networks and present significant analytical challenges. The current state-of-the-art methods for handling missing data require pooling data into a central repository before analysis, which may not be possible in a distributed health data network. In this paper, we propose a privacy-preserving distributed analysis framework for handling missing data when data are vertically partitioned. In this framework, each institution with a particular data source utilizes the local private data to calculate necessary intermediate aggregated statistics, which are then shared to build a global model for handling missing data. To evaluate our proposed methods, we conduct simulation studies that clearly demonstrate that the proposed privacy-preserving methods perform as well as the methods using the pooled data and outperform several naive methods. We further illustrate the proposed methods through the analysis of a real dataset. The proposed framework for handling vertically partitioned incomplete data is substantially more privacy-preserving than methods that require pooling of the data, since no individual-level data are shared, which can lower hurdles for collaboration across multiple institutions and build stronger public trust.
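
The abstract does not say which intermediate statistics the institutions exchange. As a rough illustration only, and not the authors' implementation, the sketch below assumes the shared statistic is a local Gram matrix (patient-by-patient inner products over each site's own variables), the kind of quantity used by dual-form methods for vertically partitioned data such as VERTIGO. The site names, values, and helper functions are hypothetical. It shows that summing the local matrices reproduces the Gram matrix of the pooled data without any site revealing its raw columns.

```python
# Illustrative sketch under stated assumptions; not the paper's method.
# Hypothetical setup: two sites hold different variables for the same 3 patients.

def local_gram(rows):
    """Return the n x n matrix of inner products between patients' feature vectors."""
    return [[sum(a * b for a, b in zip(r_i, r_j)) for r_j in rows] for r_i in rows]

def add_matrices(matrices):
    """Element-wise sum of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]

# Made-up data: site A holds two variables per patient, site B holds one.
site_a = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
site_b = [[0.5], [2.0], [1.0]]

# Each site computes its Gram matrix locally and shares only that matrix.
global_gram = add_matrices([local_gram(site_a), local_gram(site_b)])

# For verification only: pooling the columns centrally gives the same matrix.
pooled = [a + b for a, b in zip(site_a, site_b)]
assert global_gram == local_gram(pooled)
```

A model that depends on the data only through these inner products can then be fitted from `global_gram` alone. Whether such a matrix meets a given privacy requirement is a separate question that this sketch does not address.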
