Robust fingerprinting of genomic databases.

Author information

Department of Electrical, Computer, and System Engineering, Case Western Reserve University, Cleveland, OH 44106, USA.

Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i143-i152. doi: 10.1093/bioinformatics/btac243.

Abstract

MOTIVATION

Database fingerprinting has been widely used to discourage unauthorized redistribution of data by providing a means to identify the source of data leakages. However, no existing fingerprinting scheme aims to provide liability guarantees when genomic databases are shared. We are therefore motivated to fill this gap by devising a vanilla fingerprinting scheme specifically for genomic databases. Moreover, malicious recipients of a genomic database may compromise the embedded fingerprint (i.e. distort the steganographic marks that make up the embedded fingerprint bit-string) by launching effective correlation attacks, which leverage the intrinsic correlations among genomic data (e.g. Mendel's law and linkage disequilibrium). We therefore also augment the vanilla scheme with mitigation techniques to achieve robust fingerprinting of genomic databases against correlation attacks.
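To make the idea of an embedded fingerprint bit-string concrete, here is a minimal sketch of a generic least-significant-bit fingerprinting scheme with majority-vote extraction. It is not the authors' scheme (see the linked repository for that). The function names, the keyed-hash fingerprint, the 0/1/2 SNP coding, the rule that maps an invalid value 3 to 1, and the marking fraction are all assumptions made for illustration.

```python
import hashlib
import random


def make_fingerprint(owner_key: str, recipient_id: str, n_bits: int) -> list[int]:
    """Toy recipient-specific bit-string from a hash (assumes n_bits <= 256)."""
    digest = hashlib.sha256(f"{owner_key}|{recipient_id}".encode()).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(n_bits)]


def _positions(owner_key, n_rows, n_cols, n_bits, fraction):
    """Replay the owner's pseudorandom choices: (row, col, fingerprint bit index, mask)."""
    rng = random.Random(owner_key)  # same key gives the same sequence of choices
    for i in range(n_rows):
        for j in range(n_cols):
            if rng.random() < fraction:
                yield i, j, rng.randrange(n_bits), rng.randrange(2)


def embed(db, fingerprint, owner_key, fraction=0.2):
    """Return a copy of db whose selected entries carry masked fingerprint bits."""
    marked = [row[:] for row in db]
    n_bits = len(fingerprint)
    for i, j, k, mask in _positions(owner_key, len(db), len(db[0]), n_bits, fraction):
        mark = fingerprint[k] ^ mask
        value = (marked[i][j] & ~1) | mark       # overwrite the least significant bit
        marked[i][j] = 1 if value == 3 else value  # assumption: keep values in {0, 1, 2}
    return marked


def extract(marked, n_bits, owner_key, fraction=0.2):
    """Recover each fingerprint bit by majority vote over the entries that carry it."""
    votes = [[0, 0] for _ in range(n_bits)]
    for i, j, k, mask in _positions(owner_key, len(marked), len(marked[0]), n_bits, fraction):
        votes[k][(marked[i][j] & 1) ^ mask] += 1
    return [int(ones > zeros) for zeros, ones in votes]


# Toy database: 200 individuals x 50 SNPs, each coded as 0, 1 or 2.
data_rng = random.Random(1)
db = [[data_rng.randrange(3) for _ in range(50)] for _ in range(200)]

fp = make_fingerprint("owner-secret", "recipient_1", 16)
marked = embed(db, fp, "owner-secret")
recovered = extract(marked, 16, "owner-secret")
```

Because every fingerprint bit is spread over many entries, an attacker has to alter many entries to flip one bit by majority vote. A correlation attack, as described above, aims to find the marked entries more efficiently by looking for values that are inconsistent with the expected correlations.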

RESULTS

Via experiments using a real-world genomic database, we first show that correlation attacks against fingerprinting schemes for genomic databases are very powerful. In particular, the correlation attacks can distort more than half of the fingerprint bits while causing only a small utility loss (e.g. in database accuracy and in the consistency of SNP-phenotype associations measured via P-values). Next, we experimentally show that the correlation attacks can be effectively mitigated by our proposed mitigation techniques. We validate that the attacker can hardly compromise a large portion of the fingerprint bits even if it pays a higher cost in terms of degradation of the database utility. For example, with around 24% loss in accuracy and 20% loss in the consistency of SNP-phenotype associations, the attacker can only distort about 30% of the fingerprint bits, which is insufficient for it to avoid being accused. We also show that the proposed mitigation techniques preserve the utility of the shared genomic databases, e.g. they lead to only around 3% loss in accuracy.
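The following sketch illustrates why distorting about 30% of the fingerprint bits may not be enough to escape accusation. It assumes, hypothetically, that the owner accuses the recipient whose fingerprint best matches the bit-string extracted from the leaked copy; the abstract does not specify the accusation rule. The fingerprint length, the number of recipients, and all names are illustrative.

```python
import random


def random_bits(rng, n):
    """Draw an n-bit fingerprint uniformly at random (toy stand-in)."""
    return [rng.randrange(2) for _ in range(n)]


def flip_fraction(bits, fraction, rng):
    """Flip round(fraction * len(bits)) randomly chosen bits."""
    flipped = bits[:]
    for i in rng.sample(range(len(bits)), round(fraction * len(bits))):
        flipped[i] ^= 1
    return flipped


def match_rate(a, b):
    """Fraction of positions at which two bit-strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def most_likely_source(leaked, fingerprints):
    """Assumed accusation rule: pick the recipient with the highest match rate."""
    return max(fingerprints, key=lambda rid: match_rate(leaked, fingerprints[rid]))


rng = random.Random(0)
fingerprints = {f"recipient_{k}": random_bits(rng, 256) for k in range(20)}

# The attacker (recipient_7) manages to distort 30% of its fingerprint bits.
leaked = flip_fraction(fingerprints["recipient_7"], 0.30, rng)

suspect = most_likely_source(leaked, fingerprints)
leaker_match = match_rate(leaked, fingerprints["recipient_7"])
```

Under these assumptions the leaked copy still agrees with the attacker's fingerprint on roughly 70% of bits, whereas an unrelated random fingerprint is expected to agree on about 50%, so the attacker remains the best match.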

AVAILABILITY AND IMPLEMENTATION

https://github.com/xiutianxi/robust-genomic-fp-github.
