Kale Gulce, Ayday Erman, Tastan Oznur
Department of Computer Engineering, Bilkent University, Ankara, Turkey.
Bioinformatics. 2018 Jan 15;34(2):181-189. doi: 10.1093/bioinformatics/btx568.
Rapid and low-cost sequencing of genomes has enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data are often shared in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship.
We define two routes through which kinship privacy can leak and propose a technique that protects kinship privacy against these risks while maximizing the utility of the shared data. The method systematically identifies the minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of masked positions is minimized subject to privacy constraints that ensure familial relationships are not revealed. We evaluate the proposed technique on real genomic data. The results indicate that concurrently sharing data from a parent and an offspring poses a high risk to kinship privacy, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on both the level of privacy risk and the utility of the shared data.
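The optimization described above (mask as few positions as possible while keeping a privacy constraint satisfied) can be illustrated with a toy sketch. This is not the authors' formulation or code (see the linked repository for that). It assumes a hypothetical attacker score, `match_score`, defined as the sum of non-negative per-position weights over unmasked positions where two genotypes (coded 0/1/2) agree, and a hypothetical threshold `tau` that the score must not exceed. Under this single additive constraint, masking the largest contributing weights first yields a minimum-size masking set.

```python
from typing import Sequence, Set


def match_score(a: Sequence[int], b: Sequence[int],
                weights: Sequence[float], masked: Set[int]) -> float:
    """Hypothetical relatedness score: summed weights of unmasked positions
    where genotypes a and b agree (a toy stand-in for a kinship test)."""
    return sum(w for i, (x, y, w) in enumerate(zip(a, b, weights))
               if i not in masked and x == y)


def min_mask_positions(a: Sequence[int], b: Sequence[int],
                       weights: Sequence[float], tau: float) -> Set[int]:
    """Return a smallest set of positions to mask so that match_score <= tau.

    Because the score is a sum of non-negative terms and we minimize the
    number of masked terms, removing the largest contributors first is
    optimal for this single-constraint toy problem.
    """
    if tau < 0 or any(w < 0 for w in weights):
        raise ValueError("tau and all weights must be non-negative")
    masked: Set[int] = set()
    score = match_score(a, b, weights, masked)
    # Only positions with matching genotypes contribute to the score.
    contributing = sorted((i for i in range(len(a)) if a[i] == b[i]),
                          key=lambda i: weights[i], reverse=True)
    for i in contributing:
        if score <= tau:
            break
        masked.add(i)
        score -= weights[i]
    return masked


if __name__ == "__main__":
    shared = [0, 1, 2, 1, 0]      # genome already in the database
    newcomer = [0, 1, 2, 0, 0]    # genome of a newly arriving participant
    w = [0.5, 1.0, 2.0, 1.5, 0.2]  # hypothetical per-position weights
    print(sorted(min_mask_positions(shared, newcomer, w, tau=1.0)))
```

With several previously shared relatives, each pair adds its own constraint and the greedy rule is no longer guaranteed optimal; an integer program over binary mask variables is the natural generalization of this sketch.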
https://github.com/tastanlab/Kinship-Privacy.
Supplementary data are available at Bioinformatics online.