Differential privacy under dependent tuples – the case of genomic privacy.

Affiliations

Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey.

Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA.

Publication information

Bioinformatics. 2020 Mar 1;36(6):1696-1703. doi: 10.1093/bioinformatics/btz837.

Abstract

MOTIVATION

Rapid progress in genome sequencing has made genomic data widely available. Studying these data can help answer key questions about disease associations and human evolution. However, because of growing concerns about the privacy of participants' sensitive information, access to key results and data from genomic studies (such as genome-wide association studies) is restricted to trusted individuals. On the other hand, biomedical breakthroughs and discoveries depend on open access to genomic datasets. Privacy-preserving mechanisms offer a way to grant wider access to such data while protecting the individuals they describe. In particular, there has been growing interest in applying differential privacy (DP) when sharing summary statistics about genomic data. DP offers a mathematically rigorous way to limit the risk of membership inference when releasing statistical information about a dataset. However, standard DP does not account for dependence between tuples in the dataset, which can weaken the privacy guarantees it offers.
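As a minimal sketch of how DP is commonly applied to a genomic summary statistic, the Laplace mechanism below perturbs a count of minor-allele carriers at a single SNP. The genotype encoding (0, 1 or 2 copies of the minor allele), the choice of a carrier count as the query, and all function names are illustrative assumptions, not code from the paper. Under the usual assumption that tuples are independent, adding or removing one participant changes the count by at most 1, so Laplace noise of scale 1/ε gives ε-DP.

```python
import random


def carrier_count(genotypes):
    """Number of individuals carrying at least one minor allele (genotype 1 or 2)."""
    return sum(1 for g in genotypes if g > 0)


def laplace_sample(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_carrier_count(genotypes, epsilon, rng=None):
    """Release the carrier count with epsilon-DP via the Laplace mechanism.

    Assumes independent tuples: adding or removing one individual changes the
    count by at most 1, so the L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    sensitivity = 1.0
    return carrier_count(genotypes) + laplace_sample(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical genotypes at one SNP for five participants.
    cohort = [0, 1, 2, 0, 1]
    print(carrier_count(cohort))  # true count: 3
    print(private_carrier_count(cohort, epsilon=1.0, rng=random.Random(42)))
```

Smaller values of ε give larger noise and stronger protection; the released value is unbiased, so repeated releases average toward the true count, which is why the privacy budget must be tracked across queries.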

RESULTS

In this work, focusing on genomic datasets, we demonstrate this drawback of DP and propose techniques to mitigate it. First, using a real-world genomic dataset, we show that an inference attack on differentially private query results is feasible when the adversary exploits correlations between entries in the dataset. The results reveal how large the vulnerability becomes when the dataset contains dependent tuples. Specifically, by exploiting correlations between the genomes of family members, an adversary can infer sensitive genomic data about a user from the differentially private results of a query. Second, we propose a mechanism for privacy-preserving sharing of statistics from genomic datasets that attains privacy guarantees while accounting for the dependence between tuples. Evaluating the mechanism on different genomic datasets, we show empirically that it achieves up to 50% better privacy than traditional DP-based solutions.
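Why dependent tuples weaken DP can be illustrated with a simple worst-case calculation. If one participant's genotype is perfectly correlated with those of up to k tuples in the dataset (the participant plus relatives), changing that participant can shift a count query by up to k, so noise calibrated for sensitivity 1 only guarantees kε for that person (the standard group-privacy bound). The sketch below compares that naive calibration with a conservative baseline that raises the sensitivity to k. The parameter `max_related`, the perfect-correlation assumption, and all function names are hypothetical; this group-privacy baseline is not the mechanism proposed in the paper.

```python
import random


def carrier_count(genotypes):
    """Number of individuals carrying at least one minor allele (genotype 1 or 2)."""
    return sum(1 for g in genotypes if g > 0)


def laplace_sample(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def effective_epsilon(epsilon, max_related):
    """Worst-case privacy loss for one person when noise assumes independent tuples.

    Hypothetical bound: if the person's genotype can move the query by up to
    `max_related`, group privacy gives max_related * epsilon.
    """
    return epsilon * max_related


def dependent_carrier_count(genotypes, epsilon, max_related, rng=None):
    """Carrier count with Laplace noise scaled for up to `max_related` dependent tuples.

    Conservative group-privacy baseline (not the paper's mechanism): the
    sensitivity is raised from 1 to `max_related`, so the noise scale becomes
    max_related / epsilon.
    """
    if epsilon <= 0 or max_related < 1:
        raise ValueError("need epsilon > 0 and max_related >= 1")
    if rng is None:
        rng = random.Random()
    return carrier_count(genotypes) + laplace_sample(max_related / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical trio (mother, father, child) plus two unrelated participants.
    cohort = [1, 2, 1, 0, 0]
    eps = 0.5
    print(effective_epsilon(eps, max_related=3))  # 1.5: naive calibration leaks 3x the nominal epsilon
    print(dependent_carrier_count(cohort, eps, max_related=3, rng=random.Random(7)))
```

Scaling the noise by the largest correlated group restores the nominal guarantee but costs utility, since the variance grows with the square of `max_related`; mechanisms that model the actual strength of the correlations can add less noise than this worst-case baseline.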

AVAILABILITY AND IMPLEMENTATION

https://github.com/nourmadhoun/Differential-privacy-genomic-inference-attack.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
