Sousa João Sá, Lefebvre Cédric, Huang Zhicong, Raisaro Jean Louis, Aguilar-Melchor Carlos, Killijian Marc-Olivier, Hubaux Jean-Pierre
Laboratory for Communications and Applications - LCA 1, École Polytechnique Fédérale de Lausanne, Route Cantonale, Lausanne, 1015, Switzerland.
Laboratory for Analysis and Architecture of Systems - LAAS-CNRS, Université Toulouse, 7 Avenue du Colonel Roche, Toulouse, 31400, France.
BMC Med Genomics. 2017 Jul 26;10(Suppl 2):46. doi: 10.1186/s12920-017-0275-0.
Cloud computing is becoming the preferred solution for efficiently dealing with the increasing amount of genomic data. Yet outsourcing the storage and processing of sensitive information, such as genomic data, raises important privacy and security concerns. This calls for new, sophisticated techniques that protect data from untrusted cloud providers while still enabling researchers to obtain useful information.
We present a novel privacy-preserving algorithm for fully outsourcing the storage of large genomic data files to a public cloud and enabling researchers to efficiently search for variants of interest. To protect data and query confidentiality from possible leakage, our solution exploits optimal encoding for genomic variants and combines it with homomorphic encryption and private information retrieval. The algorithm is implemented in C++ and was evaluated on real data as part of the 2016 iDash Genome Privacy-Protection Challenge.
Results show that our solution outperforms the state-of-the-art solutions and enables researchers to search over millions of encrypted variants in a few seconds.
Contrary to the prior belief that sophisticated privacy-enhancing technologies (PETs) are impractical in real operational settings, our solution demonstrates that, in the case of genomic data, PETs are very efficient enablers.
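
To illustrate how homomorphic encryption and private information retrieval can be combined, here is a minimal sketch in Python. It is not the authors' C++ implementation, and the abstract does not specify their scheme or encoding. The sketch assumes a textbook Paillier cryptosystem with toy-sized primes, a hypothetical integer packing `encode_variant`, and hypothetical helper names (`make_query`, `answer_query`). The client sends an encrypted one-hot selection vector, and the server combines it with the stored records without learning which index was requested.

```python
import math
import secrets

# ASSUMPTION: toy parameters for illustration only. Both values are known
# Mersenne primes; a real deployment would need a modulus of about 2048 bits.
P, Q = 2**31 - 1, 2**61 - 1

# ASSUMPTION: hypothetical 2-bit code for the alternate base.
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}


def keygen(p=P, q=Q):
    """Return (public modulus n, private key) for textbook Paillier with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (n, lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m = 1 + m*n (mod n^2), so no exponentiation is needed for g^m.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(priv, c):
    """Recover the plaintext from ciphertext c."""
    n, lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def encode_variant(chrom, pos, alt):
    """Hypothetical packing of a variant into one integer (pos < 2**28)."""
    return (chrom << 30) | (pos << 2) | BASES[alt]


def make_query(n, size, index):
    """Client side: encrypted one-hot vector that hides the requested index."""
    return [encrypt(n, 1 if i == index else 0) for i in range(size)]


def answer_query(n, db, query):
    """Server side: homomorphically compute sum(sel_i * db_i) on ciphertexts."""
    n2 = n * n
    acc = 1
    for c, record in zip(query, db):
        acc = acc * pow(c, record, n2) % n2
    return acc


if __name__ == "__main__":
    n, priv = keygen()
    db = [encode_variant(1, 12345, "A"), encode_variant(7, 55249071, "T")]
    reply = answer_query(n, db, make_query(n, len(db), 1))
    print(decrypt(priv, reply) == db[1])
```

In this sketch the server performs one modular exponentiation per stored record, so the work grows linearly with the database size; the query hides the index because every entry of the selection vector is a fresh, randomized ciphertext.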