Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach.

Author information

Wan Zhiyu, Vorobeychik Yevgeniy, Xia Weiyi, Clayton Ellen Wright, Kantarcioglu Murat, Malin Bradley

Affiliation

Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA.

Publication information

Am J Hum Genet. 2017 Feb 2;100(2):316-322. doi: 10.1016/j.ajhg.2016.12.002. Epub 2017 Jan 5.

DOI:10.1016/j.ajhg.2016.12.002

PMID:28065469

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5294764/

Abstract

Emerging scientific endeavors are creating big data repositories with data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource, the Sequence and Phenotype Integration Exchange (SPHINX), which holds genomic summary data from over 8,000 individuals, and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.
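
The idea of tailoring protections to a recipient with bounded resources can be illustrated with a toy leader-follower game. The sketch below is not the model from the paper: the `Policy` class, the payoff formulas, and every number are hypothetical and chosen only for illustration. The data sharer moves first by picking a release policy; the recipient then attacks only if the expected profit is positive; the sharer picks the policy that maximizes its own payoff given that response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical release policy (all values are illustrative)."""
    name: str
    utility: float    # benefit to the sharer of releasing data this way
    reid_prob: float  # chance that an attack re-identifies a record


def adversary_attacks(policy, gain, attack_cost, penalty, detect_prob):
    """Follower's best response: attack only if expected profit is positive."""
    expected_profit = (policy.reid_prob * gain
                       - attack_cost
                       - detect_prob * penalty)
    return expected_profit > 0


def sharer_payoff(policy, loss, **adversary):
    """Leader's payoff: utility minus expected loss if the recipient attacks."""
    attacked = adversary_attacks(policy, **adversary)
    expected_loss = policy.reid_prob * loss if attacked else 0.0
    return policy.utility - expected_loss


def best_policy(policies, loss, **adversary):
    """Pick the policy with the highest payoff, anticipating the response."""
    return max(policies, key=lambda p: sharer_payoff(p, loss, **adversary))


# Hypothetical policies and parameters, not taken from the paper.
POLICIES = [
    Policy("release all", utility=100.0, reid_prob=0.30),
    Policy("suppress rare variants", utility=80.0, reid_prob=0.05),
    Policy("release nothing", utility=0.0, reid_prob=0.0),
]

# A weak legal deterrent versus a strong one (e.g. a data use agreement).
weak = best_policy(POLICIES, loss=300.0, gain=150.0,
                   attack_cost=10.0, penalty=200.0, detect_prob=0.1)
strong = best_policy(POLICIES, loss=300.0, gain=150.0,
                     attack_cost=10.0, penalty=500.0, detect_prob=0.1)
print(weak.name)
print(strong.name)
```

Under these assumed numbers, a weak penalty makes an attack on the full release profitable, so the sharer prefers to suppress some variants; a stronger penalty deters the attack and the full release becomes the best choice. A worst-case analysis would instead assume the attack always happens and would favor heavier suppression in both cases.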
