Kim Miran, Lauter Kristin
BMC Med Inform Decis Mak. 2015;15 Suppl 5(Suppl 5):S3. doi: 10.1186/1472-6947-15-S5-S3. Epub 2015 Dec 21.
The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing to the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution to this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data.
We present evaluation algorithms for secure computation of the minor allele frequencies and the χ² statistic in a genome-wide association study setting. We also describe how to privately compute the Hamming distance and the approximate Edit distance between encrypted DNA sequences. Finally, we compare the performance of two practical homomorphic encryption schemes: the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig.
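For orientation, the two statistics named above can be written out on unencrypted data. The following is a plaintext reference sketch only, not the homomorphic evaluation algorithm of the paper: the function names, the genotype-count input format and the choice of the standard allelic 2×2 Pearson χ² test are assumptions made for illustration. It does show why such statistics are considered low-degree: both are ratios of low-degree polynomials in the genotype counts, so a cloud could in principle evaluate the polynomial parts and leave the final division to the key holder.

```python
# Plaintext reference sketch (hypothetical helper names, no encryption involved).
# Genotype counts for a biallelic site are given as (n_AA, n_Aa, n_aa).

def allele_counts(n_AA, n_Aa, n_aa):
    """Return (count of allele A, count of allele a)."""
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa


def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of the less common allele among all 2N alleles."""
    count_A, count_a = allele_counts(n_AA, n_Aa, n_aa)
    return min(count_A, count_a) / (count_A + count_a)


def allelic_chi2(case_genotypes, control_genotypes):
    """Pearson chi-square statistic for the 2x2 table (case/control x allele)."""
    a, b = allele_counts(*case_genotypes)      # case alleles: A, a
    c, d = allele_counts(*control_genotypes)   # control alleles: A, a
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2       # polynomial in the counts
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator             # division could be deferred


if __name__ == "__main__":
    cases = (10, 20, 70)       # made-up counts for illustration
    controls = (30, 40, 30)
    print(minor_allele_frequency(*cases))
    print(allelic_chi2(cases, controls))
```

The counts in the usage lines are invented for the example and are unrelated to the dataset reported in the paper.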
The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate Edit distance for DNA sequences of size 5K and figure out the differences between them.
The performance numbers for BGV are better than those for YASHE when homomorphically evaluating deep circuits (such as the Hamming distance algorithm or the approximate Edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as the minor allele frequencies or the χ² test statistic in a case-control study.
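To make the notion of circuit depth concrete, the sketch below computes a Hamming distance on plaintext DNA strings using only additions and multiplications, the operations a homomorphic scheme provides. The 2-bit nucleotide encoding and the function names are assumptions chosen for illustration; the abstract does not state the encoding used in the paper. Each position comparison multiplies bit-level results together, so the per-position test already has multiplicative depth 2 before any further processing.

```python
# Plaintext sketch of Hamming distance as an arithmetic circuit over bits.
# The 2-bit encoding below is an assumption for illustration only.
ENCODING = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def xor_bit(x, y):
    """XOR of two bits using one multiplication: x + y - 2xy."""
    return x + y - 2 * x * y


def differs(p, q):
    """1 if the two encoded nucleotides differ, else 0 (OR of the bit XORs)."""
    d0 = xor_bit(p[0], q[0])
    d1 = xor_bit(p[1], q[1])
    return d0 + d1 - d0 * d1  # second multiplication level


def hamming_distance(s, t):
    """Number of positions at which two equal-length DNA strings differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(differs(ENCODING[a], ENCODING[b]) for a, b in zip(s, t))


if __name__ == "__main__":
    print(hamming_distance("ACGT", "ACGA"))
```

In an encrypted setting each bit would be a ciphertext and the same additions and multiplications would be applied homomorphically; the final sum adds no multiplicative depth.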