IEEE/ACM Trans Comput Biol Bioinform. 2018 Jul-Aug;15(4):1333-1343. doi: 10.1109/TCBB.2017.2709740.
Individuals (and their family members) share (partial) genomic data on public platforms. However, by combining the special characteristics of genomic data, background knowledge available on the Web, and the family relationships between individuals, it is possible to infer the hidden parts of shared (and unshared) genomes. Existing work in this field considers simple correlations in the genome (as well as Mendel's law and the partial genomes of a victim and the victim's family members). In this paper, we improve on existing work on inference attacks against genomic privacy. We mainly consider complex correlations in the genome by using an observable Markov model and a recombination model between the haplotypes. We also utilize phenotype information about the victims. We propose an efficient message passing algorithm that incorporates all of the aforementioned background information into the inference. We show that the proposed framework improves inference while requiring significantly less information than existing work.
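As a rough illustration of how kinship and genomic correlation can be combined in a single inference step, the sketch below multiplies two "messages" about one hidden SNP of a victim: one derived from the parents' genotypes via Mendel's law, and one derived from an observed neighboring SNP via a first-order transition table. This is a toy example under stated assumptions, not the algorithm of the paper: the function names, the 0/1/2 genotype coding, and the numbers in `TRANSITION` are all hypothetical, and the two sources of evidence are assumed to be conditionally independent given the hidden SNP.

```python
from fractions import Fraction

# Genotypes are coded as the number of minor alleles: 0, 1, or 2 (an assumption).
GENOTYPES = (0, 1, 2)


def transmit(parent):
    """Probability that a parent with this genotype passes on the minor allele."""
    return Fraction(parent, 2)


def mendel_message(father, mother):
    """Distribution over the child's genotype given both parents (Mendel's law)."""
    pf, pm = transmit(father), transmit(mother)
    return {
        0: (1 - pf) * (1 - pm),
        1: pf * (1 - pm) + (1 - pf) * pm,
        2: pf * pm,
    }


def markov_message(neighbor, transition):
    """Distribution over the hidden SNP given the observed neighboring SNP."""
    return dict(zip(GENOTYPES, transition[neighbor]))


def combine(*messages):
    """Multiply incoming messages pointwise and normalize to get a belief."""
    belief = {g: Fraction(1) for g in GENOTYPES}
    for message in messages:
        for g in GENOTYPES:
            belief[g] *= message[g]
    total = sum(belief.values())
    if total == 0:
        raise ValueError("the messages are mutually inconsistent")
    return {g: belief[g] / total for g in GENOTYPES}


# Hypothetical table P(hidden SNP | neighboring SNP); the numbers are illustrative only.
TRANSITION = {
    0: (Fraction(8, 10), Fraction(15, 100), Fraction(5, 100)),
    1: (Fraction(2, 10), Fraction(6, 10), Fraction(2, 10)),
    2: (Fraction(5, 100), Fraction(15, 100), Fraction(8, 10)),
}

# Both parents are heterozygous; the victim's neighboring SNP was observed as 2.
posterior = combine(mendel_message(1, 1), markov_message(2, TRANSITION))
```

With heterozygous parents alone, Mendel's law leaves the hidden genotype fairly uncertain (1/4, 1/2, 1/4); adding the correlated neighbor shifts most of the belief toward genotype 2. A full message passing scheme would repeat this kind of product over many SNPs and family members rather than at a single variable.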