Wang Xinyue, Min Sitao, Vaidya Jaideep
Rutgers University, Newark, NJ.
AMIA Annu Symp Proc. 2025 May 22;2024:1196-1205. eCollection 2024.
Collaborative genome-wide association studies (GWAS) have the potential to uncover rare genetic variant-trait associations by leveraging larger datasets and more diverse population samples. Despite this potential, privacy concerns and cumbersome review processes for data validation and collaborator selection hinder their broader adoption. Advances in generative models offer a possible solution: generating synthetic datasets that closely resemble real genomic data, thereby enhancing privacy and expediting the review process. This study assesses the capability of deep generative models to produce artificial genomic data for GWAS applications. We evaluate two state-of-the-art models on real-world datasets and identify significant limitations in their ability to generate high-quality artificial genomes. Furthermore, we demonstrate that prevailing privacy measures, which are mainly based on membership inference attacks, are inadequate for providing insightful privacy evaluations. Our findings highlight the critical challenges and suggest future directions for the effective use of artificial genomes in GWAS.