Accurate filtering of privacy-sensitive information in raw genomic data.

Affiliations

SnT - Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg.

LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal.

Publication information

J Biomed Inform. 2018 Jun;82:1-12. doi: 10.1016/j.jbi.2018.04.006. Epub 2018 Apr 13.

DOI:10.1016/j.jbi.2018.04.006

PMID:29660494

Abstract

Sequencing thousands of human genomes has enabled breakthroughs in many areas, among them precision medicine, the study of rare diseases, and forensics. However, the mass collection of such sensitive data entails enormous risks if it is not protected to the highest standards. In this article, we take the position that post-alignment privacy protection is not enough and that data should be protected automatically as early as possible in the genomics workflow, ideally immediately after it is produced. We show that a previous approach for filtering short reads does not extend to long reads, and we present a novel filtering approach that classifies raw genomic data (i.e., data whose location and content are not yet determined) into privacy-sensitive information (i.e., information more affected by a successful privacy attack) and non-privacy-sensitive information. Such a classification allows the fine-grained and automated adjustment of protective measures to mitigate the possible consequences of exposure, in particular when relying on public clouds. We present the first filter that can be applied to reads of any length, making it usable with any recent or future sequencing technology. The filter is accurate in the sense that it detects all known sensitive nucleotides except those located in highly variable regions (fewer than 10 nucleotides remain undetected per genome, instead of 100,000 in previous work). It produces far fewer false positives than previously known methods (10% instead of 60%) and can detect sensitive nucleotides despite sequencing errors (86% detected instead of 56% with 2% of mutations). Finally, practical experiments demonstrate high performance in terms of both throughput and memory consumption.
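To make the idea of classifying the nucleotides of an unaligned read concrete, the following is a minimal sketch only. It is not the filter described in the article, whose design the abstract does not specify. It assumes a hypothetical dictionary of k-mers taken from known sensitive sequences and flags every nucleotide of a read that is covered by a dictionary k-mer. The names `build_dictionary`, `sensitive_mask`, and `split_read`, and the value `K = 4`, are illustrative assumptions. Because it relies on exact set lookups, this sketch does not tolerate sequencing errors.

```python
from typing import Iterable, List, Set, Tuple

K = 4  # hypothetical k-mer length, kept small for readability


def build_dictionary(sensitive_sequences: Iterable[str], k: int = K) -> Set[str]:
    """Collect every k-mer occurring in a known sensitive sequence (assumed input)."""
    kmers: Set[str] = set()
    for seq in sensitive_sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def sensitive_mask(read: str, dictionary: Set[str], k: int = K) -> List[bool]:
    """Flag each nucleotide covered by at least one dictionary k-mer.

    The read is scanned with a sliding window, so its length does not matter.
    """
    mask = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in dictionary:
            for j in range(i, i + k):
                mask[j] = True
    return mask


def split_read(read: str, mask: List[bool]) -> Tuple[str, str]:
    """Return (public, private) views of the read, hiding the other class with 'N'."""
    public = "".join("N" if m else c for c, m in zip(read, mask))
    private = "".join(c if m else "N" for c, m in zip(read, mask))
    return public, private


if __name__ == "__main__":
    dictionary = build_dictionary(["ACGTA"])
    read = "TTACGTATT"
    public, private = split_read(read, sensitive_mask(read, dictionary))
    print(public)
    print(private)
```

In this sketch the public view could be handed to less protected infrastructure while the private view stays in a trusted environment, which mirrors the fine-grained adjustment of protective measures that the abstract mentions.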

Similar articles

Accurate filtering of privacy-sensitive information in raw genomic data.

J Biomed Inform. 2018 Jun;82:1-12. doi: 10.1016/j.jbi.2018.04.006. Epub 2018 Apr 13.

DNA-SeAl: Sensitivity Levels to Optimize the Performance of Privacy-Preserving DNA Alignment.

IEEE J Biomed Health Inform. 2020 Mar;24(3):907-915. doi: 10.1109/JBHI.2019.2914952. Epub 2019 Jun 28.

Secure count query on encrypted genomic data.

J Biomed Inform. 2018 May;81:41-52. doi: 10.1016/j.jbi.2018.03.003. Epub 2018 Mar 15.

Privacy preserving processing of genomic data: A survey.

J Biomed Inform. 2015 Aug;56:103-11. doi: 10.1016/j.jbi.2015.05.022. Epub 2015 Jun 6.

Are privacy-enhancing technologies for genomic data ready for the clinic? A survey of medical experts of the Swiss HIV Cohort Study.

J Biomed Inform. 2018 Mar;79:1-6. doi: 10.1016/j.jbi.2017.12.013. Epub 2018 Jan 10.

GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies.

BMC Genomics. 2018 May 9;19(Suppl 2):89. doi: 10.1186/s12864-018-4460-0.

Data Sanitization to Reduce Private Information Leakage from Functional Genomics.

Cell. 2020 Nov 12;183(4):905-917.e16. doi: 10.1016/j.cell.2020.09.036.

Fast and accurate mapping of Complete Genomics reads.

Methods. 2015 Jun;79-80:3-10. doi: 10.1016/j.ymeth.2014.10.012. Epub 2014 Oct 22.

MedCo: Enabling Secure and Privacy-Preserving Exploration of Distributed Clinical and Genomic Data.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1328-1341. doi: 10.1109/TCBB.2018.2854776. Epub 2018 Jul 13.

PriLive: privacy-preserving real-time filtering for next-generation sequencing.

Bioinformatics. 2018 Jul 15;34(14):2376-2383. doi: 10.1093/bioinformatics/bty128.

Cited by

Morton Filter-Based Security Mechanism for Healthcare System in Cloud Computing.

Healthcare (Basel). 2021 Nov 15;9(11):1551. doi: 10.3390/healthcare9111551.

Privacy-preserving storage of sequenced genomic data.

BMC Genomics. 2021 Oct 2;22(1):712. doi: 10.1186/s12864-021-07996-2.
