基因组数据编辑距离的安全近似值。

Secure approximation of edit distance on genomic data.

作者信息

Aziz Md Momin Al, Alhadidi Dima, Mohammed Noman

机构信息

Department of Computer Science, University of Manitoba, Winnipeg, Canada.

Zayed University, Dubai, United Arab Emirates.

出版信息

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):41. doi: 10.1186/s12920-017-0279-9.

DOI:10.1186/s12920-017-0279-9

PMID:28786362

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5547448/

Abstract

BACKGROUND

Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext.

METHODS

In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations.

RESULTS

Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method.

CONCLUSION

The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets.

摘要

背景

编辑距离是一种成熟的度量标准，通过计算将一个字符串转换为另一个字符串所需的最少操作数来量化两个字符串的差异程度。它被应用于人类基因组序列相似性领域，因为它能够满足需求并有助于更好地诊断疾病。然而，除了由于基因组序列长度大导致的计算复杂性外，这些序列的隐私性也非常重要。由于这些基因组序列是唯一的且能够识别个体，所以不能以明文形式共享。

方法

在本文中，我们提出了两种不同的近似方法来安全地计算基因组序列之间的编辑距离。我们使用分片、私有集合交集方法、带状比对算法和混淆电路来实现这些方法。我们通过实验对这些方法进行评估，并讨论其优缺点。

结果

实验结果表明，我们的第一种近似方法速度快，与现有技术相比具有相似的准确性。然而，对于较长的基因组序列，现有技术和我们提出的第一种方法都无法达到良好的准确性。另一方面，我们的第二种近似方法在此类数据集上能够达到更高的准确性。然而，第二种方法比第一种方法相对较慢。

结论

所提出的算法总体上准确、高效，并且由于它们在不同类型的数据集上具有互补特性（运行时间与准确性），可以单独应用或联合应用。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6817/5547448/c2b3dcc7786d/12920_2017_279_Fig1_HTML.jpg

相似文献

Secure approximation of edit distance on genomic data.基因组数据编辑距离的安全近似值。

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):41. doi: 10.1186/s12920-017-0279-9.

String correction using the Damerau-Levenshtein distance.使用 Damerau-Levenshtein 距离进行字符串校正。

BMC Bioinformatics. 2019 Jun 6;20(Suppl 11):277. doi: 10.1186/s12859-019-2819-0.

IEEE J Biomed Health Inform. 2019 Nov;23(6):2611-2618. doi: 10.1109/JBHI.2018.2881086. Epub 2018 Nov 13.

Linear space string correction algorithm using the Damerau-Levenshtein distance.基于 Damerau-Levenshtein 距离的线性空间字符串校正算法。

BMC Bioinformatics. 2020 Dec 9;21(Suppl 1):4. doi: 10.1186/s12859-019-3184-8.

Application of latent semantic indexing to evaluate the similarity of sets of sequences without multiple alignments character-by-character.潜在语义索引在不逐字符进行多重比对的情况下评估序列集相似性中的应用。

Genet Mol Res. 2007 Oct 5;6(4):983-99.

Private queries on encrypted genomic data.关于加密基因组数据的私密查询

BMC Med Genomics. 2017 Jul 26;10(Suppl 2):45. doi: 10.1186/s12920-017-0276-z.

Methods of privacy-preserving genomic sequencing data alignments.隐私保护基因组测序数据比对方法。

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab151.

A general edit distance between RNA structures.RNA结构之间的一般编辑距离。

J Comput Biol. 2002;9(2):371-88. doi: 10.1089/10665270252935511.

A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering.一种通过傅里叶变换衡量DNA序列相似性及其在层次聚类中的应用

J Theor Biol. 2014 Oct 21;359:18-28. doi: 10.1016/j.jtbi.2014.05.043. Epub 2014 Jun 6.

Efficient sequential and parallel algorithms for finding edit distance based motifs.用于查找基于编辑距离的基序的高效顺序和并行算法。

BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):465. doi: 10.1186/s12864-016-2789-9.

引用本文的文献

Secure Comparisons of Single Nucleotide Polymorphisms Using Secure Multiparty Computation: Method Development.使用安全多方计算进行单核苷酸多态性的安全比较：方法开发

JMIR Bioinform Biotechnol. 2023 Jul 18;4:e44700. doi: 10.2196/44700.

The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition.从 iDASH 竞赛中观察到的基因组数据分析和共享的不断发展的隐私和安全问题。

J Am Med Inform Assoc. 2022 Nov 14;29(12):2182-2190. doi: 10.1093/jamia/ocac165.

Advances in the computational analysis of SARS-COV2 genome.严重急性呼吸综合征冠状病毒2（SARS-CoV-2）基因组的计算分析进展

Nonlinear Dyn. 2021;106(2):1525-1555. doi: 10.1007/s11071-021-06836-y. Epub 2021 Aug 27.

Computational analysis of the SARS-CoV-2 and other viruses based on the Kolmogorov's complexity and Shannon's information theories.基于柯尔莫哥洛夫复杂性和香农信息论对严重急性呼吸综合征冠状病毒2（SARS-CoV-2）及其他病毒进行的计算分析。

Nonlinear Dyn. 2020;101(3):1731-1750. doi: 10.1007/s11071-020-05771-8. Epub 2020 Jul 4.

A community effort to protect genomic data sharing, collaboration and outsourcing.一项旨在保护基因组数据共享、合作与外包的社区行动。

NPJ Genom Med. 2017 Oct 27;2:33. doi: 10.1038/s41525-017-0036-1. eCollection 2017.

Privacy-preserving techniques of genomic data-a survey.基因组数据隐私保护技术综述。

Brief Bioinform. 2019 May 21;20(3):887-895. doi: 10.1093/bib/bbx139.

本文引用的文献

Private and Efficient Query Processing on Outsourced Genomic Databases.外包基因组数据库上的私密且高效的查询处理

IEEE J Biomed Health Inform. 2017 Sep;21(5):1466-1472. doi: 10.1109/JBHI.2016.2625299. Epub 2016 Nov 4.

Efficient privacy-preserving string search and an application in genomics.高效的隐私保护字符串搜索及其在基因组学中的应用。

Bioinformatics. 2016 Jun 1;32(11):1652-61. doi: 10.1093/bioinformatics/btw050. Epub 2016 Mar 2.

NIH's genomic data sharing policy: timing and tradeoffs.NIH 的基因组数据共享政策：时机与权衡。

Trends Genet. 2015 Feb;31(2):55-7. doi: 10.1016/j.tig.2014.12.006. Epub 2015 Jan 22.

Preserving personal autonomy in a genomic testing era.

Genet Med. 2013 May;15(5):408-9. doi: 10.1038/gim.2013.24.

Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays.使用高密度单核苷酸多态性（SNP）基因分型微阵列解析对高度复杂混合物贡献微量DNA的个体。

PLoS Genet. 2008 Aug 29;4(8):e1000167. doi: 10.1371/journal.pgen.1000167.

Fast optimal alignment.快速最优比对

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):175-9. doi: 10.1093/nar/12.1part1.175.

文献检索

告别复杂PubMed语法，用中文像聊天一样搜索，搜遍4000万医学文献。AI智能推荐，让科研检索更轻松。

立即免费搜索

文件翻译

保留排版，准确专业，支持PDF/Word/PPT等文件格式，支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述，25分钟生成高质量综述，智能提取关键信息，辅助科研写作。

立即免费体验

基因组数据编辑距离的安全近似值。

Secure approximation of edit distance on genomic data.

作者信息

机构信息

出版信息

BACKGROUND

METHODS

RESULTS

CONCLUSION

背景

方法

结果

结论

相似文献

引用本文的文献

本文引用的文献

文献检索

文件翻译

深度研究

Suppr 超能文献

相似文献

引用本文的文献

本文引用的文献