Secure large-scale genome data storage and query.

Affiliations

Heinz College, Carnegie Mellon University, United States.

Computer Science, University of Manitoba, Canada.

Publication information

Comput Methods Programs Biomed. 2018 Oct;165:129-137. doi: 10.1016/j.cmpb.2018.08.007. Epub 2018 Aug 16.

Abstract

BACKGROUND AND OBJECTIVE

Cloud computing plays a vital role in big data science with its scalable and cost-efficient architecture. Large-scale genome data storage and computation would benefit from the latest cloud computing infrastructures, which can save cost and speed up discoveries. However, because of privacy and security concerns, data owners are often disinclined to put sensitive data in a public cloud environment without enforcing some protective measures. An ideal solution is to develop a secure genome database that supports encrypted data deposition and query.

METHODS

Nevertheless, it is a challenging task to make such a system fast and scalable enough to handle real-world demands while also providing data security. In this paper, we propose a novel mechanism to support secure count queries on an open-source graph database (Neo4j) and evaluate its performance on a real-world dataset of around 735,317 Single Nucleotide Polymorphisms (SNPs). In particular, we propose a new tree indexing method that offers constant time complexity (proportional to the tree depth); indexing was the bottleneck of existing approaches.
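To make the idea of a count query whose cost scales with tree depth concrete, here is a minimal plaintext sketch. It is not the paper's scheme: it uses no encryption and no Neo4j, and the class `PrefixCountIndex`, its methods, and the toy genotype records are all hypothetical names chosen for illustration. It assumes each tree level corresponds to one SNP position and that each node stores how many records pass through it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: number of records passing through it, plus children."""
    count: int = 0
    children: dict = field(default_factory=dict)


class PrefixCountIndex:
    """Hypothetical plaintext prefix tree over per-record genotype sequences."""

    def __init__(self):
        self.root = Node()

    def insert(self, genotypes):
        # Walk one level per SNP position, incrementing counts along the path.
        node = self.root
        node.count += 1
        for g in genotypes:
            node = node.children.setdefault(g, Node())
            node.count += 1

    def count_prefix(self, prefix):
        # Cost is one dictionary lookup per level, so it depends on the
        # length of the queried prefix, not on the number of records.
        node = self.root
        for g in prefix:
            node = node.children.get(g)
            if node is None:
                return 0
        return node.count


# Toy records (made up for illustration): genotypes at three SNP positions.
index = PrefixCountIndex()
for record in [
    ("AG", "CT", "AA"),
    ("AG", "CT", "GG"),
    ("AG", "TT", "AA"),
    ("GG", "CT", "AA"),
]:
    index.insert(record)

print(index.count_prefix(("AG", "CT")))
```

In this sketch a query walks a single root-to-node path, which is why its running time is bounded by the tree depth. Supporting arbitrary (non-prefix) SNP combinations and operating over encrypted values, as the abstract describes, would need machinery that this example does not attempt.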

RESULTS

The proposed method significantly improves query execution time compared with existing techniques. It takes less than one minute to execute an arbitrary count query on a dataset of 212 GB, while the best-known algorithm takes around 7 min.

CONCLUSIONS

The outlined framework and experimental results show the applicability of graph databases for securely storing large-scale genome data in an untrusted environment. Furthermore, the cryptosystem and security assumptions outlined here are well suited to such use cases and can be generalized in future work.
