On Anonymizing Medical Microdata with Large-Scale Missing Values - A Case Study with the FAERS Dataset.

Authors

Hsiao Mei-Hui, Lin Wen-Yang, Hsu Kuang-Yung, Shen Zih-Xun

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul;2019:6505-6508. doi: 10.1109/EMBC.2019.8857025.

Abstract

As big data analysis becomes one of the main driving forces for productivity and economic growth, concern about individual privacy disclosure increases as well, especially for applications that access medical or health data containing personal information. Most contemporary techniques for privacy-preserving data publishing rest on a simple assumption: the data of concern is complete, i.e., contains no missing values. This, however, is not the case in the real world. This paper presents our investigation of the effect of missing values on medical data privacy. In particular, we examined the US FAERS dataset, a public dataset of adverse drug events released by the US FDA. Following the presumption of the current anonymization paradigm that the data should contain no missing values, we investigated three intuitive strategies for anonymizing the FAERS dataset: including missing values, excluding them, or imputing them. Our results demonstrate that these intuitive strategies handle data with a massive amount of missing values poorly. Accordingly, we propose a new strategy, consolidation, together with a corresponding privacy protection model and anonymization algorithm. Experimental results show that our method can prevent privacy disclosure and sustain data utility for adverse drug reaction (ADR) signal detection.
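
The three intuitive strategies named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the column names, the toy records (invented, not FAERS data), mode-based imputation, and the use of the smallest group of identical quasi-identifier values as a privacy indicator. The abstract does not specify the paper's privacy model or its consolidation strategy, so neither is reproduced here.

```python
from collections import Counter

# Hypothetical quasi-identifier columns; not taken from the paper.
QI = ("age_group", "sex", "country")

# Toy records invented for illustration; None marks a missing value.
records = [
    {"age_group": "60-69", "sex": "F", "country": "US"},
    {"age_group": "60-69", "sex": "F", "country": "US"},
    {"age_group": "60-69", "sex": None, "country": "US"},
    {"age_group": None, "sex": "F", "country": None},
    {"age_group": "30-39", "sex": "M", "country": "JP"},
    {"age_group": "30-39", "sex": "M", "country": "JP"},
    {"age_group": "60-69", "sex": "F", "country": None},
]


def include_missing(rows):
    """Keep every record and treat a missing value as an ordinary value."""
    return [{c: ("MISSING" if r[c] is None else r[c]) for c in QI} for r in rows]


def exclude_missing(rows):
    """Drop every record that has at least one missing quasi-identifier."""
    return [dict(r) for r in rows if all(r[c] is not None for c in QI)]


def impute_missing(rows):
    """Fill each missing value with the most frequent observed value (assumed rule)."""
    modes = {}
    for c in QI:
        observed = [r[c] for r in rows if r[c] is not None]
        modes[c] = Counter(observed).most_common(1)[0][0]
    return [{c: (modes[c] if r[c] is None else r[c]) for c in QI} for r in rows]


def smallest_group(rows):
    """Size of the smallest set of records sharing identical quasi-identifiers."""
    groups = Counter(tuple(r[c] for c in QI) for r in rows)
    return min(groups.values()) if groups else 0


included = include_missing(records)
excluded = exclude_missing(records)
imputed = impute_missing(records)
```

On this toy table, including missing values keeps all seven records but leaves some of them alone in their group, excluding them keeps only four records, and imputing keeps all seven by overwriting missing entries with values that were never reported. This only shows the kind of trade-off at stake; it says nothing about the paper's measured results.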
