Zhang Yihua, Blanton Marina, Almashaqbeh Ghada
BMC Med Inform Decis Mak. 2015;15 Suppl 5(Suppl 5):S4. doi: 10.1186/1472-6947-15-S5-S4. Epub 2015 Dec 21.
The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing genomic data poses challenges because the data are highly sensitive. To address these challenges, the iDASH research center organized a competition on secure distributed processing of genomic data.
In this work we propose techniques for securely computing on real-life genomic data: minor allele frequency and chi-squared statistics, and the distance between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest.
We provide implementation results for our techniques, which are based on secret sharing. The results demonstrate the practicality of the suggested protocols, and we also report the performance improvements due to our optimization techniques.
This work describes the techniques, findings, and experimental results that we developed and obtained as part of the iDASH 2015 research competition on securing real-life genomic computations, and it shows that securely computing with genomic data is feasible in practice.
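To make the idea of computing a minor allele frequency over secret-shared data concrete, the following is a minimal illustrative sketch and not the authors' protocol. It assumes plain additive secret sharing modulo a prime; the modulus `P`, the three-party setting, and all function names are hypothetical choices made for this example. Only the allele count is aggregated on shares; the final division is done in the clear after reconstruction, whereas a full protocol would typically keep later steps on shares as well.

```python
import secrets

# Assumed prime modulus for the share arithmetic (illustrative choice).
P = 2**61 - 1


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value mod P."""
    return sum(shares) % P


def add_shares(a, b):
    """Add two shared values party by party, with no communication."""
    return [(x + y) % P for x, y in zip(a, b)]


def secure_allele_count(local_counts, n_parties=3):
    """Sum per-owner allele counts while each count stays secret-shared."""
    total = [0] * n_parties
    for count in local_counts:
        total = add_shares(total, share(count, n_parties))
    # Only the aggregate is opened, never an individual owner's count.
    return reconstruct(total)


def minor_allele_frequency(allele_count, total_alleles):
    """Frequency of the less common of two alleles at a site."""
    freq = allele_count / total_alleles
    return min(freq, 1 - freq)


# Hypothetical input: three data owners hold counts of one allele at a site.
count = secure_allele_count([12, 30, 7])
maf = minor_allele_frequency(count, 200)
```

Addition of additive shares is local to each party, which is why aggregate counts are cheap in this setting. Comparisons and sorting, such as those in a mergesort-based protocol, require interaction between the parties and are not covered by this sketch.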