• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

基因组数据编辑距离的安全近似值。

Secure approximation of edit distance on genomic data.

作者信息

Aziz Md Momin Al, Alhadidi Dima, Mohammed Noman

机构信息

Department of Computer Science, University of Manitoba, Winnipeg, Canada.

Zayed University, Dubai, United Arab Emirates.

出版信息

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):41. doi: 10.1186/s12920-017-0279-9.

DOI:10.1186/s12920-017-0279-9
PMID:28786362
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5547448/
Abstract

BACKGROUND

Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext.

METHODS

In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations.

RESULTS

Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method.

CONCLUSION

The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets.

摘要

背景

编辑距离是一种成熟的度量标准,通过计算将一个字符串转换为另一个字符串所需的最少操作数来量化两个字符串的差异程度。它被应用于人类基因组序列相似性领域,因为它能够满足需求并有助于更好地诊断疾病。然而,除了由于基因组序列长度大导致的计算复杂性外,这些序列的隐私性也非常重要。由于这些基因组序列是唯一的且能够识别个体,所以不能以明文形式共享。

方法

在本文中,我们提出了两种不同的近似方法来安全地计算基因组序列之间的编辑距离。我们使用分片、私有集合交集方法、带状比对算法和混淆电路来实现这些方法。我们通过实验对这些方法进行评估,并讨论其优缺点。

结果

实验结果表明,我们的第一种近似方法速度快,与现有技术相比具有相似的准确性。然而,对于较长的基因组序列,现有技术和我们提出的第一种方法都无法达到良好的准确性。另一方面,我们的第二种近似方法在此类数据集上能够达到更高的准确性。然而,第二种方法比第一种方法相对较慢。

结论

所提出的算法总体上准确、高效,并且由于它们在不同类型的数据集上具有互补特性(运行时间与准确性),可以单独应用或联合应用。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/22afa02176b6/12920_2017_279_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/c2b3dcc7786d/12920_2017_279_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/7f6ad2598a93/12920_2017_279_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/aa2329bac3c5/12920_2017_279_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/55db23024104/12920_2017_279_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/50cce3beb29c/12920_2017_279_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/0c16306d4a71/12920_2017_279_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/ed41c14b4a86/12920_2017_279_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/bc5f965107dc/12920_2017_279_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/937999bef2ad/12920_2017_279_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/22afa02176b6/12920_2017_279_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/c2b3dcc7786d/12920_2017_279_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/7f6ad2598a93/12920_2017_279_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/aa2329bac3c5/12920_2017_279_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/55db23024104/12920_2017_279_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/50cce3beb29c/12920_2017_279_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/0c16306d4a71/12920_2017_279_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/ed41c14b4a86/12920_2017_279_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/bc5f965107dc/12920_2017_279_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/937999bef2ad/12920_2017_279_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/22afa02176b6/12920_2017_279_Fig10_HTML.jpg

相似文献

1
Secure approximation of edit distance on genomic data.基因组数据编辑距离的安全近似值。
BMC Med Genomics. 2017 Jul 26;10(Suppl 2):41. doi: 10.1186/s12920-017-0279-9.
2
String correction using the Damerau-Levenshtein distance.使用 Damerau-Levenshtein 距离进行字符串校正。
BMC Bioinformatics. 2019 Jun 6;20(Suppl 11):277. doi: 10.1186/s12859-019-2819-0.
3
Secure Similar Patients Query on Encrypted Genomic Data.对加密基因组数据进行安全的相似患者查询。
IEEE J Biomed Health Inform. 2019 Nov;23(6):2611-2618. doi: 10.1109/JBHI.2018.2881086. Epub 2018 Nov 13.
4
Linear space string correction algorithm using the Damerau-Levenshtein distance.基于 Damerau-Levenshtein 距离的线性空间字符串校正算法。
BMC Bioinformatics. 2020 Dec 9;21(Suppl 1):4. doi: 10.1186/s12859-019-3184-8.
5
Application of latent semantic indexing to evaluate the similarity of sets of sequences without multiple alignments character-by-character.潜在语义索引在不逐字符进行多重比对的情况下评估序列集相似性中的应用。
Genet Mol Res. 2007 Oct 5;6(4):983-99.
6
Private queries on encrypted genomic data.关于加密基因组数据的私密查询
BMC Med Genomics. 2017 Jul 26;10(Suppl 2):45. doi: 10.1186/s12920-017-0276-z.
7
Methods of privacy-preserving genomic sequencing data alignments.隐私保护基因组测序数据比对方法。
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab151.
8
A general edit distance between RNA structures.RNA结构之间的一般编辑距离。
J Comput Biol. 2002;9(2):371-88. doi: 10.1089/10665270252935511.
9
A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering.一种通过傅里叶变换衡量DNA序列相似性及其在层次聚类中的应用
J Theor Biol. 2014 Oct 21;359:18-28. doi: 10.1016/j.jtbi.2014.05.043. Epub 2014 Jun 6.
10
Efficient sequential and parallel algorithms for finding edit distance based motifs.用于查找基于编辑距离的基序的高效顺序和并行算法。
BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):465. doi: 10.1186/s12864-016-2789-9.

引用本文的文献

1
Secure Comparisons of Single Nucleotide Polymorphisms Using Secure Multiparty Computation: Method Development.使用安全多方计算进行单核苷酸多态性的安全比较:方法开发
JMIR Bioinform Biotechnol. 2023 Jul 18;4:e44700. doi: 10.2196/44700.
2
The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition.从 iDASH 竞赛中观察到的基因组数据分析和共享的不断发展的隐私和安全问题。
J Am Med Inform Assoc. 2022 Nov 14;29(12):2182-2190. doi: 10.1093/jamia/ocac165.
3
Advances in the computational analysis of SARS-COV2 genome.

本文引用的文献

1
Private and Efficient Query Processing on Outsourced Genomic Databases.外包基因组数据库上的私密且高效的查询处理
IEEE J Biomed Health Inform. 2017 Sep;21(5):1466-1472. doi: 10.1109/JBHI.2016.2625299. Epub 2016 Nov 4.
2
Efficient privacy-preserving string search and an application in genomics.高效的隐私保护字符串搜索及其在基因组学中的应用。
Bioinformatics. 2016 Jun 1;32(11):1652-61. doi: 10.1093/bioinformatics/btw050. Epub 2016 Mar 2.
3
NIH's genomic data sharing policy: timing and tradeoffs.NIH 的基因组数据共享政策:时机与权衡。
严重急性呼吸综合征冠状病毒2(SARS-CoV-2)基因组的计算分析进展
Nonlinear Dyn. 2021;106(2):1525-1555. doi: 10.1007/s11071-021-06836-y. Epub 2021 Aug 27.
4
Computational analysis of the SARS-CoV-2 and other viruses based on the Kolmogorov's complexity and Shannon's information theories.基于柯尔莫哥洛夫复杂性和香农信息论对严重急性呼吸综合征冠状病毒2(SARS-CoV-2)及其他病毒进行的计算分析。
Nonlinear Dyn. 2020;101(3):1731-1750. doi: 10.1007/s11071-020-05771-8. Epub 2020 Jul 4.
5
A community effort to protect genomic data sharing, collaboration and outsourcing.一项旨在保护基因组数据共享、合作与外包的社区行动。
NPJ Genom Med. 2017 Oct 27;2:33. doi: 10.1038/s41525-017-0036-1. eCollection 2017.
6
Privacy-preserving techniques of genomic data-a survey.基因组数据隐私保护技术综述。
Brief Bioinform. 2019 May 21;20(3):887-895. doi: 10.1093/bib/bbx139.
Trends Genet. 2015 Feb;31(2):55-7. doi: 10.1016/j.tig.2014.12.006. Epub 2015 Jan 22.
4
Preserving personal autonomy in a genomic testing era.
Genet Med. 2013 May;15(5):408-9. doi: 10.1038/gim.2013.24.
5
Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays.使用高密度单核苷酸多态性(SNP)基因分型微阵列解析对高度复杂混合物贡献微量DNA的个体。
PLoS Genet. 2008 Aug 29;4(8):e1000167. doi: 10.1371/journal.pgen.1000167.
6
Fast optimal alignment.快速最优比对
Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):175-9. doi: 10.1093/nar/12.1part1.175.