A Sequence Obfuscation Method for Protecting Personal Genomic Privacy.

Authors

Wan Shibiao, Wang Jieqiong

Affiliations

Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, TN, United States.

Department of Radiology, University of Pennsylvania, Philadelphia, PA, United States.

Publication information

Front Genet. 2022 Apr 13;13:876686. doi: 10.3389/fgene.2022.876686. eCollection 2022.

DOI:10.3389/fgene.2022.876686
PMID:35495121
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9043694/
Abstract

With the technological advances of recent decades, sequencing a person's whole genome has become feasible and affordable. As a result, individual genomic sequences are produced and collected on a large scale for genetic medical diagnosis and cancer drug discovery, which simultaneously poses serious challenges to the protection of personal genomic privacy. Methods that keep personal genomic data both usable and confidential are urgently needed. Existing genomic privacy-protection methods either take a long time to encrypt the data or recover it with low accuracy. To tackle these problems, this paper proposes a sequence similarity-based obfuscation method, IterMegaBLAST, for fast and reliable protection of personal genomic privacy. Specifically, given a sequence randomly selected from a dataset of genomic sequences, we first use MegaBLAST to find its most similar sequence in the dataset. These two aligned sequences form a cluster, for which an obfuscated sequence is generated by a DNA generalization lattice scheme. These steps are repeated until all of the sequences in the dataset are clustered and their obfuscated sequences are generated. Experimental results on benchmark datasets demonstrate that, under the same degree of anonymity, IterMegaBLAST significantly outperforms existing state-of-the-art approaches in both utility accuracy and time complexity.
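The iterative pair-and-generalize loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Two stand-ins are assumptions made here for self-containment: a position-wise match count replaces the MegaBLAST similarity search, and the standard IUPAC two-base ambiguity codes replace the DNA generalization lattice. The sequences are assumed to be pre-aligned, equal in length, and even in number, since the abstract does not say how a leftover sequence is handled. The function names `similarity`, `generalize`, and `iterative_obfuscate` are hypothetical.

```python
import random

# IUPAC ambiguity codes for one or two bases (stand-in for the paper's
# DNA generalization lattice, which is not specified in the abstract).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def similarity(a, b):
    """Count matching positions (a toy stand-in for a MegaBLAST score)."""
    return sum(x == y for x, y in zip(a, b))


def generalize(a, b):
    """Merge two aligned, equal-length sequences into one obfuscated sequence."""
    return "".join(IUPAC[frozenset((x, y))] for x, y in zip(a, b))


def iterative_obfuscate(seqs, rng=None):
    """Pair each randomly picked sequence with its most similar remaining
    sequence and give both members of the pair the same generalized sequence."""
    if len(seqs) % 2:
        # Assumption of this sketch: leftover handling is left out.
        raise ValueError("this sketch expects an even number of sequences")
    rng = rng or random.Random(0)
    remaining = dict(enumerate(seqs))
    obfuscated = [None] * len(seqs)
    while remaining:
        i = rng.choice(sorted(remaining))      # randomly selected query
        query = remaining.pop(i)
        j = max(remaining, key=lambda k: similarity(query, remaining[k]))
        mate = remaining.pop(j)                # most similar sequence
        obfuscated[i] = obfuscated[j] = generalize(query, mate)
    return obfuscated
```

With the input `["ACGT", "ACGA", "TTGC", "TTGG"]`, the two closest pairs are clustered and each pair shares one generalized sequence, so every released sequence corresponds to two individuals.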

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac39/9043694/a70fee0830b4/fgene-13-876686-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac39/9043694/fa63e804db15/fgene-13-876686-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac39/9043694/6fce389d404d/fgene-13-876686-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac39/9043694/d6684ae03081/fgene-13-876686-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac39/9043694/353c31d36616/fgene-13-876686-g005.jpg
