Ghasemian Maryam, Gerido Lynette Hammond, Ayday Erman
Case Western Reserve University, Cleveland, OH.
bioRxiv. 2024 Sep 24:2024.09.20.614092. doi: 10.1101/2024.09.20.614092.
As genomic research advances, sharing genomic data and research outcomes has become increasingly important for fostering collaboration and accelerating scientific discovery. However, such data sharing must be balanced against the need to protect the privacy of the individuals whose genetic information is used. This paper presents a bidirectional framework for evaluating the privacy risks of data shared in genomic research papers, whether as summary statistics or as research datasets, with a particular focus on re-identification risks such as membership inference attacks (MIAs). The framework consists of a structured workflow that begins with a questionnaire designed to capture researchers' (authors') self-reported data sharing practices and privacy protection measures. Responses are compared with the National Institutes of Health (NIH) genomic data sharing policy to calculate the risk of re-identification for the study (paper). Gaps in compliance help identify potential vulnerabilities and encourage researchers to strengthen their privacy measures before submitting their research for publication. The paper also demonstrates the framework on published genomic research, used as case study scenarios, to emphasize the importance of bidirectional frameworks in supporting trustworthy open science and genomic data sharing practices.
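The questionnaire-to-risk workflow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the questionnaire items, their integer weights, and the `assess` function are hypothetical, and they are not taken from the paper or from the text of the NIH policy. The sketch only shows the general shape of the idea, in which each unmet safeguard counts as a compliance gap and contributes to a normalized risk score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    key: str       # hypothetical questionnaire field name
    question: str  # wording shown to the author
    weight: int    # hypothetical risk points added when the safeguard is missing


# Hypothetical items and weights, chosen only for illustration.
ITEMS = [
    Item("controlled_access", "Is individual-level data behind controlled access?", 4),
    Item("summary_stats_limited", "Are released summary statistics limited or perturbed?", 3),
    Item("consent_documented", "Is participant consent for sharing documented?", 2),
    Item("identifiers_removed", "Were direct identifiers removed?", 1),
]


def assess(responses: dict[str, bool]) -> tuple[float, list[str]]:
    """Return (risk score in [0, 1], list of unmet items).

    An unanswered item is treated as unmet, which is a conservative
    assumption of this sketch.
    """
    gaps = [item.key for item in ITEMS if not responses.get(item.key, False)]
    total = sum(item.weight for item in ITEMS)
    missing = sum(item.weight for item in ITEMS if item.key in gaps)
    return missing / total, gaps


if __name__ == "__main__":
    score, gaps = assess({
        "controlled_access": False,
        "summary_stats_limited": True,
        "consent_documented": True,
        "identifiers_removed": True,
    })
    print(f"risk={score:.2f}, gaps={gaps}")
```

In a design like this, the list of gaps matters as much as the score, because it tells the author which safeguards to address before submission.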