A utility maximizing and privacy preserving approach for protecting kinship in genomic databases.

Authors

Kale Gulce, Ayday Erman, Tastan Oznur

Affiliation

Department of Computer Engineering, Bilkent University, Ankara, Turkey.

Publication

Bioinformatics. 2018 Jan 15;34(2):181-189. doi: 10.1093/bioinformatics/btx568.

Abstract

MOTIVATION

Rapid, low-cost genome sequencing has enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data is shared in public databases. Although the identities of participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship.

RESULTS

We define two routes through which kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of the shared data. The method systematically identifies minimal portions of genomic data to mask as new participants are added to the database. Choosing the positions to hide is cast as an optimization problem in which the number of masked positions is minimized subject to privacy constraints that ensure familial relationships are not revealed. We evaluate the proposed technique on real genomic data. Results indicate that concurrently sharing data from a parent and an offspring poses a high risk to kinship privacy, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on the level of privacy risk and on the utility of the shared data.
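
To make the masking idea concrete, the following is a minimal sketch and not the authors' implementation (their code is in the repository listed under AVAILABILITY AND IMPLEMENTATION). It assumes a single privacy constraint: a simple moment-based kinship estimate between a newly arriving genome and one genome already in the database must not exceed a chosen threshold. The estimator, the threshold, and the function names `snp_contributions`, `kinship_estimate` and `positions_to_mask` are all illustrative assumptions; the abstract does not specify the kinship metric or the solver. Genotypes are coded as minor-allele counts (0, 1 or 2).

```python
from typing import List, Sequence


def snp_contributions(g_a: Sequence[int], g_b: Sequence[int],
                      freqs: Sequence[float]) -> List[float]:
    """Per-SNP terms of a moment-based kinship estimator (assumed metric).

    Each term is (a - 2p)(b - 2p) / (4p(1 - p)), where a and b are
    minor-allele counts and p is the minor-allele frequency (0 < p < 1).
    """
    return [(a - 2 * p) * (b - 2 * p) / (4 * p * (1 - p))
            for a, b, p in zip(g_a, g_b, freqs)]


def kinship_estimate(g_a: Sequence[int], g_b: Sequence[int],
                     freqs: Sequence[float]) -> float:
    """Average of the per-SNP terms over all shared positions."""
    terms = snp_contributions(g_a, g_b, freqs)
    return sum(terms) / len(terms)


def positions_to_mask(g_new: Sequence[int], g_existing: Sequence[int],
                      freqs: Sequence[float], threshold: float) -> List[int]:
    """Return indices to hide so the estimate over the rest is <= threshold.

    Hypothetical helper: positions with the largest contribution are masked
    first. For this single pairwise constraint that yields the fewest masked
    positions, because for any mask size m the m largest terms leave the
    lowest remaining average.
    """
    terms = snp_contributions(g_new, g_existing, freqs)
    order = sorted(range(len(terms)), key=lambda k: terms[k], reverse=True)
    total = sum(terms)
    masked: List[int] = []
    for k in order:
        remaining = len(terms) - len(masked)
        if total / remaining <= threshold:
            break  # constraint already satisfied on the unmasked positions
        masked.append(k)
        total -= terms[k]
    return sorted(masked)


# Toy data (made up for illustration): six SNPs, all with frequency 0.5.
new_genome = [2, 2, 0, 1, 0, 2]
db_genome = [2, 2, 0, 1, 2, 0]
maf = [0.5] * 6
to_hide = positions_to_mask(new_genome, db_genome, maf, threshold=0.1)
```

With the toy data, the estimate over all six positions is 1/6, so one position has to be hidden to meet a threshold of 0.1. The optimization described above is broader than this sketch: it handles the two leak routes the authors define and the sequential arrival of family members, neither of which is modeled here.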

AVAILABILITY AND IMPLEMENTATION

https://github.com/tastanlab/Kinship-Privacy.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
