Aziz Md Momin Al, Ghasemi Reza, Waliullah Md, Mohammed Noman
Department of Computer Science, University of Manitoba, Winnipeg, Canada.
Department of Mathematics, Faculty of Sciences, Bu-Ali Sina University, Hamedan, Iran.
BMC Med Genomics. 2017 Jul 26;10(Suppl 2):43. doi: 10.1186/s12920-017-0278-x.
To meet the pressing need for a federated ecosystem to hold global genomic and clinical data, the Global Alliance for Genomics and Health (GA4GH) has created an international website called the Beacon service, which lets a researcher find out beforehand whether a specific dataset is useful for his or her research. This simple web service is quite useful because it allows queries such as whether a certain position of a target chromosome has a specific nucleotide. However, the increased integration of individuals' genomic data into clinical practice and research has raised serious privacy concerns. Although the answer to such a query in the Beacon network is only yes or no, it has serious privacy implications, as demonstrated in recent work by Shringarpure and Bustamante. In their attack model, the authors showed that the presence of an individual in any dataset can be determined with a limited number of queries.
We propose two lightweight algorithms, based on randomized response, that retain the usefulness of a genomic beacon service while preserving the privacy of its participants. We also elaborate on the strengths and weaknesses of the attack by Shringarpure and Bustamante by explaining some of its statistical and mathematical models using a real-world genomic database, and we extend their experimental simulations to different adversarial assumptions and parameters.
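As an illustration of the general idea only, the following minimal sketch applies textbook randomized response to a beacon's yes/no answer. It is not the two algorithms proposed in the paper: the `Beacon` class, the `p_truth` parameter, and all function names are hypothetical and chosen for this example. With probability `p_truth` the true answer is returned, and otherwise a fair coin decides, which gives a privacy parameter of ln((1 + p_truth) / (1 - p_truth)) for a single answer bit.

```python
import math
import random


class Beacon:
    """Toy beacon (hypothetical): holds the (chromosome, position, allele)
    triples observed in a dataset and answers presence queries."""

    def __init__(self, variants):
        self._variants = set(variants)

    def query(self, chrom, pos, allele):
        # Exact yes/no answer, as an unprotected beacon would return it.
        return (chrom, pos, allele) in self._variants


def randomized_response(true_answer, p_truth, rng):
    """Return the true answer with probability p_truth,
    otherwise the outcome of a fair coin flip."""
    if rng.random() < p_truth:
        return true_answer
    return rng.random() < 0.5


def epsilon_for(p_truth):
    """Privacy parameter of the mechanism above for one answer bit.
    Requires 0 <= p_truth < 1."""
    return math.log((1 + p_truth) / (1 - p_truth))


def private_query(beacon, chrom, pos, allele, p_truth, rng):
    """Beacon query whose answer is perturbed by randomized response."""
    return randomized_response(beacon.query(chrom, pos, allele), p_truth, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    beacon = Beacon({("1", 12345, "A"), ("2", 67890, "T")})
    print(private_query(beacon, "1", 12345, "A", p_truth=0.8, rng=rng))
    print(round(epsilon_for(0.8), 3))
```

A lower `p_truth` gives stronger privacy but more wrong answers for legitimate researchers, which is the privacy and utility tradeoff at stake. Note that in this sketch an attacker who repeats the same query can average the noise away, so a deployed service would presumably need to fix the noisy answer per query; that step is left out here.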
We experimentally evaluated the solutions against the original attack model with different parameters to better understand the privacy and utility tradeoffs that the two methods provide. The statistical analysis also elaborates on different aspects of the prior attack, which leads to better risk management for the participants in a beacon service.
The differentially private, lightweight solutions discussed here make the attack much more difficult to carry out while maintaining the fundamental motivation of the beacon database network.