Sei Yuichi, Ohsuga Akihiko
Annu Int Conf IEEE Eng Med Biol Soc. 2017 Jul;2017:3884-3889. doi: 10.1109/EMBC.2017.8037705.
In recent years, privacy protection in genome-wide association studies (GWAS) has become increasingly important. GWAS aims to identify single-nucleotide polymorphisms (SNPs) associated with diseases such as cancer and diabetes, and the chi-squared test can be used for this purpose. However, recent studies have reported that publishing the p-values, or the corresponding chi-squared values, of the analyzed SNPs can leak private information. Several methods have been proposed to anonymize chi-squared values using differential privacy, the de facto privacy metric in the cryptographic community. However, these methods are applicable only to small contingency tables; for larger tables, they lose much of the useful information. We propose two novel anonymization methods, Rand-Chi and RandChiDist, and evaluate them experimentally on real data sets.
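To make the setting concrete, here is a minimal sketch. It computes the Pearson chi-squared statistic for a hypothetical 2×3 case-control genotype table (rows: cases and controls; columns: genotypes AA, Aa, aa) and then perturbs the statistic with the generic Laplace mechanism of differential privacy. This is not Rand-Chi or RandChiDist, which the abstract does not describe. The function names, the example counts, and the `sensitivity` argument are illustrative assumptions. In particular, the sensitivity of the chi-squared statistic must be derived separately for the table shape and margins in use; the value passed below is only a placeholder.

```python
import random
from typing import Sequence


def chi_squared(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-squared statistic for an r x c contingency table.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n <= 0 or min(row_totals) <= 0 or min(col_totals) <= 0:
        raise ValueError("all row and column totals must be positive")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale).

    The difference of two i.i.d. exponentials with mean `scale` is Laplace.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_chi_squared(table, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Generic Laplace mechanism applied to the chi-squared statistic.

    `sensitivity` is an assumption supplied by the caller, not a derived bound.
    """
    return chi_squared(table) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical genotype counts: columns are AA, Aa, aa.
    table = [[10, 20, 30],   # cases
             [30, 20, 10]]   # controls
    print(chi_squared(table))  # 20.0
    # sensitivity=1.0 is a placeholder for illustration only.
    print(noisy_chi_squared(table, sensitivity=1.0, epsilon=1.0,
                            rng=random.Random(0)))
```

Smaller `epsilon` gives stronger privacy but larger noise. For a fixed privacy budget, the released statistic therefore becomes less useful as the sensitivity grows, which is the utility loss the abstract attributes to existing methods on larger contingency tables.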