Exploring the use of Artificial Genomes for Genome-wide Association Studies through the lens of Utility and Privacy.

Authors

Wang Xinyue, Min Sitao, Vaidya Jaideep

Affiliation

Rutgers University, Newark, NJ.

Publication

AMIA Annu Symp Proc. 2025 May 22;2024:1196-1205. eCollection 2024.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12099349/

Abstract

Collaborative genome-wide association studies (GWAS) have the potential to uncover rare genetic variant-trait associations by leveraging larger datasets and diverse population samples. Despite this potential, privacy concerns and cumbersome review processes for data validation and collaborator selection hinder their broader implementation. Advances in generative models present a possible solution by generating synthetic datasets that closely resemble real genomic data, thus enhancing privacy and expediting the review process. This study assesses the capability of deep generative models to produce artificial genomic data for GWAS applications. We evaluate two state-of-the-art models on real-world datasets, identifying significant limitations in their ability to generate high-quality artificial genomes. Furthermore, we demonstrate that prevailing privacy measures, mainly based on membership inference attacks, are inadequate for providing insightful privacy evaluations. Our findings highlight the critical challenges and suggest future directions for the effective use of artificial genomes in GWAS.
