Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks.

Authors

Raisaro Jean Louis, Tramèr Florian, Ji Zhanglong, Bu Diyue, Zhao Yongan, Carey Knox, Lloyd David, Sofia Heidi, Baker Dixie, Flicek Paul, Shringarpure Suyash, Bustamante Carlos, Wang Shuang, Jiang Xiaoqian, Ohno-Machado Lucila, Tang Haixu, Wang XiaoFeng, Hubaux Jean-Pierre

Affiliations

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

Health Science Department of Biomedical Informatics, University of California San Diego, San Diego, CA, USA.

Publication information

J Am Med Inform Assoc. 2017 Jul 1;24(4):799-805. doi: 10.1093/jamia/ocw167.

DOI:10.1093/jamia/ocw167

PMID:28339683

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5881894/

Abstract

The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or "beacon") is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertain and in accordance with the GA4GH policy and standards.

While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual's whole genome sequence), the individual's membership in a beacon can be inferred through repeated queries for variants present in the individual's genome.

In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.
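The query model and two of the mitigation ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyBeacon`, the parameters `min_carriers` and `budget`, the helper `count_yes`, and the sample data are all hypothetical. The sketch assumes that "obscuring rare alleles" can be approximated by answering "yes" only when enough individuals carry the allele, and that the per-user, per-genome budget can be approximated by a plain counter; the accounting used in the paper may differ.

```python
from collections import defaultdict


class ToyBeacon:
    """Toy beacon answering yes/no allele-presence queries (illustrative only)."""

    def __init__(self, genomes, min_carriers=1, budget=None):
        # genomes: dict mapping a person id to a set of (chrom, pos, allele) tuples.
        self.carriers = defaultdict(set)
        for person, variants in genomes.items():
            for variant in variants:
                self.carriers[variant].add(person)
        # Assumption: a "yes" requires at least this many carriers, which hides
        # alleles carried by fewer people when set above 1.
        self.min_carriers = min_carriers
        # Assumption: each (user, person) pair may contribute to at most
        # `budget` "yes" answers; None disables the budget.
        self.budget = budget
        self.used = defaultdict(int)

    def query(self, user, chrom, pos, allele):
        carriers = self.carriers.get((chrom, pos, allele), set())
        if self.budget is not None:
            # Ignore genomes whose budget for this user is already spent.
            carriers = {p for p in carriers if self.used[(user, p)] < self.budget}
        answer = len(carriers) >= self.min_carriers
        if answer:
            # Charge every genome that contributed to the "yes".
            for person in carriers:
                self.used[(user, person)] += 1
        return answer


def count_yes(beacon, user, variants):
    """Adversary's view: number of the target's known variants the beacon confirms."""
    return sum(beacon.query(user, *v) for v in sorted(variants))


GENOMES = {
    "A": {("1", 100, "T"), ("1", 200, "G"), ("2", 50, "C")},
    "B": {("1", 100, "T"), ("2", 75, "A")},
}

target = GENOMES["A"]  # the adversary is assumed to hold this genome
print(count_yes(ToyBeacon(GENOMES), "u1", target))                  # unprotected
print(count_yes(ToyBeacon(GENOMES, min_carriers=2), "u1", target))  # rare alleles hidden
print(count_yes(ToyBeacon(GENOMES, budget=2), "u1", target))        # per-genome budget
```

In the unprotected case every variant of the target is confirmed, which is the signal a membership-inference attack relies on. Both toy protections reduce the number of confirmations, at the cost of returning "no" for alleles that are in fact present.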

Similar articles

Privacy Risks from Genomic Data-Sharing Beacons.

Am J Hum Genet. 2015 Nov 5;97(5):631-46. doi: 10.1016/j.ajhg.2015.09.010. Epub 2015 Oct 29.

Re-identification of individuals in genomic data-sharing beacons via allele inference.

Bioinformatics. 2019 Feb 1;35(3):365-371. doi: 10.1093/bioinformatics/bty643.

Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services.

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):39. doi: 10.1186/s12920-017-0282-1.

The effect of kinship in re-identification attacks against genomic data sharing beacons.

Bioinformatics. 2020 Dec 30;36(Suppl_2):i903-i910. doi: 10.1093/bioinformatics/btaa821.

Beacon Reconstruction Attack: Reconstruction of genomes in genomic data-sharing beacons using summary statistics.

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf273.

Aftermath of Bustamante attack on genomic beacon service.

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):43. doi: 10.1186/s12920-017-0278-x.

Real-time Protection of Genomic Data Sharing in Beacon Services.

AMIA Jt Summits Transl Sci Proc. 2018 May 18;2017:45-54. eCollection 2018.

Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons.

Proc Priv Enhanc Technol. 2021;2021(3):28-48. doi: 10.2478/popets-2021-0036. Epub 2021 Apr 26.

Beacon v2 and Beacon networks: A "lingua franca" for federated data discovery in biomedical genomics, and beyond.

Hum Mutat. 2022 Jun;43(6):791-799. doi: 10.1002/humu.24369. Epub 2022 Apr 8.

Cited by

Beacon Reconstruction Attack: Reconstruction of genomes in genomic data-sharing beacons using summary statistics.

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf273.

FedGMMAT: Federated generalized linear mixed model association tests.

PLoS Comput Biol. 2024 Jul 24;20(7):e1012142. doi: 10.1371/journal.pcbi.1012142. eCollection 2024 Jul.

Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review.

JMIR Bioinform Biotechnol. 2024 May 27;5:e54332. doi: 10.2196/54332.

Future-proofing genomic data and consent management: a comprehensive review of technology innovations.

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae021.

The BioRef Infrastructure, a Framework for Real-Time, Federated, Privacy-Preserving, and Personalized Reference Intervals: Design, Development, and Application.

J Med Internet Res. 2023 Oct 18;25:e47254. doi: 10.2196/47254.

COLLAGENE enables privacy-aware federated and collaborative genomic data analysis.

Genome Biol. 2023 Sep 11;24(1):204. doi: 10.1186/s13059-023-03039-z.

Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics.

Genome Res. 2023 Jul;33(7):1113-1123. doi: 10.1101/gr.277674.123. Epub 2023 May 22.

Sociotechnical safeguards for genomic data privacy.

Nat Rev Genet. 2022 Jul;23(7):429-445. doi: 10.1038/s41576-022-00455-y. Epub 2022 Mar 4.

Recent Developments in Privacy-Preserving Mining of Clinical Data.

ACM IMS Trans Data Sci. 2021 Nov;2(4). doi: 10.1145/3447774.

Computational tools for genomic data de-identification: facilitating data protection law compliance.

Nat Commun. 2021 Nov 29;12(1):6949. doi: 10.1038/s41467-021-27219-2.
