Intel Labs, Intel Corporation, Santa Clara, California, USA.
BMC Med Genomics. 2024 Nov 19;17(1):273. doi: 10.1186/s12920-024-02037-9.
Forensic analysis relies heavily on DNA analysis techniques, notably the analysis of autosomal Single Nucleotide Polymorphisms (SNPs), to speed up the identification of unknown suspects through genomic database searches. However, because an individual's genome sequence is unique, it qualifies as Personally Identifiable Information (PII). It is therefore subject to stringent privacy regulations that can impede data access and analysis and restrict which parties may handle the data. Homomorphic Encryption (HE) is a promising solution because it allows complex functions to be computed on encrypted data without decrypting it. HE permits PII to be processed as soon as it is collected and encrypted, for example at a crime scene. It also allows multiple entities and artificial intelligence services to process the data.
This study introduces HE-based privacy-preserving methods for SNP DNA analysis that compute kinship scores for a set of genome queries while keeping the data encrypted. We present three approaches, one unsupervised and two supervised, all of which performed strongly in the iDASH 2023 Track 1 competition.
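The abstract does not specify the scoring functions. The following is therefore only a hypothetical plaintext illustration of why kinship-style scoring can suit HE, not any of the three methods described. It assumes genotypes are coded as 0, 1, or 2 (the count of alternate alleles per SNP). It uses an illustrative squared-difference similarity, with the made-up names `similarity` and `score_matrix`. That similarity needs only subtraction, multiplication, and addition by plaintext constants. Schemes such as CKKS, which Microsoft SEAL implements, support these operations on ciphertexts. The encryption itself is omitted here.

```python
# Hypothetical plaintext sketch (not the authors' method): genotypes are
# assumed to be coded 0/1/2 = number of alternate alleles at each SNP.

def similarity(query, entry):
    """Mean per-SNP similarity 1 - (g_q - g_d)^2 / 4, which lies in [0, 1].

    Only subtraction, multiplication, and addition (plus scaling by plaintext
    constants) are used, i.e. operations an HE scheme can evaluate on
    ciphertexts; here everything runs on plaintext lists.
    """
    if len(query) != len(entry):
        raise ValueError("query and entry must cover the same SNPs")
    total = 0.0
    for gq, gd in zip(query, entry):
        d = gq - gd                  # genotype difference in {-2, ..., 2}
        total += 1.0 - d * d / 4.0   # 1 for identical genotypes, 0 for opposite homozygotes
    return total / len(query)

def score_matrix(queries, database):
    """Similarity of every query against every database entry."""
    return [[similarity(q, e) for e in database] for q in queries]

queries = [[0, 1, 2, 1]]
database = [[0, 1, 2, 1], [2, 1, 0, 1]]
print(score_matrix(queries, database))  # [[1.0, 0.5]]
```

A real pipeline would still need a decision step, such as a threshold or a learned classifier for the supervised variants, on top of raw similarities like these.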
Our HE-based methods predict 400 kinship scores from an encrypted database of 2000 entries within seconds. They take advantage of Intel AVX vector extensions, the Intel HEXL library, and the Microsoft SEAL HE library. All three methods achieve an area under the receiver operating characteristic curve (auROC) between 96% and 100% while maintaining 128-bit security. These findings show that HE can both protect genomic data privacy and support accurate DNA analysis.
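As a reminder of what the reported auROC measures: it equals the probability that a randomly chosen true match receives a higher score than a randomly chosen non-match, with ties counting one half. The sketch below computes it directly from that pairwise (Mann-Whitney) definition. The scores and labels are made-up toy values, not competition data, and the function name `auroc` is illustrative.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    scores: predicted kinship scores (higher = more likely a relative)
    labels: 1 for a true relative match, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: one of the six positive/negative pairs is mis-ordered.
print(auroc([0.9, 0.4, 0.35, 0.6, 0.1], [1, 1, 0, 0, 0]))  # 5/6
```

The quadratic pairwise loop is fine for illustration. For large evaluation sets, a sort-based rank computation gives the same value in O(n log n).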
The results demonstrate that HE-based solutions can be computationally practical for protecting genomic privacy while screening candidate matches for further genealogy analysis in Forensic Genetic Genealogy (FGG).