Molecular-level similarity search brings computing to DNA data storage.

Affiliations

Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.

Microsoft Research, Redmond, WA, USA.

Publication information

Nat Commun. 2021 Aug 6;12(1):4764. doi: 10.1038/s41467-021-24991-z.

DOI: 10.1038/s41467-021-24991-z

PMID: 34362913

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8346626/

Abstract

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.
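
The query-as-probe idea in the abstract can be illustrated with a small in silico toy. This is only a sketch under loud assumptions, not the authors' method: the paper learns its image-to-sequence encoding, whereas the sketch below swaps in random-hyperplane hashing (two sign bits per nucleotide), and it approximates hybridization by the fraction of positions where the probe's reverse complement matches the target. All names (`make_hyperplanes`, `encode`, `binding_score`, `search`) are hypothetical and chosen for illustration.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def make_hyperplanes(dim, n_bases, seed=0):
    """Draw random hyperplanes, two per nucleotide (2 bits -> 1 base).

    Assumption: a stand-in for the learned encoder described in the abstract.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2 * n_bases)]


def encode(features, hyperplanes):
    """Map a feature vector to a DNA string via sign bits of random projections."""
    bits = [
        1 if sum(f * h for f, h in zip(features, plane)) >= 0 else 0
        for plane in hyperplanes
    ]
    # Each consecutive pair of bits selects one of the four bases.
    return "".join(BASES[2 * bits[i] + bits[i + 1]] for i in range(0, len(bits), 2))


def reverse_complement(seq):
    """Return the strand that would pair with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def binding_score(probe, target):
    """Crude proxy for hybridization: fraction of target positions the probe pairs with.

    Assumption: real binding depends on thermodynamics, not a simple match count.
    """
    paired = reverse_complement(probe)
    return sum(a == b for a, b in zip(paired, target)) / len(target)


def search(query_features, database, hyperplanes, top_k=3):
    """Rank stored sequences by how strongly the query probe would 'bind' them."""
    probe = reverse_complement(encode(query_features, hyperplanes))
    scored = [(binding_score(probe, seq), name) for name, seq in database.items()]
    return sorted(scored, reverse=True)[:top_k]
```

For example, after building `database = {name: encode(vec, planes) for name, vec in vectors.items()}` with a shared `planes = make_hyperplanes(dim, n_bases)`, calling `search(query_vec, database, planes)` returns `(score, name)` pairs, highest score first. Feature vectors that point in similar directions share more sign bits, so their sequences match at more positions. That is the property the learned encoding in the paper is trained to provide for real hybridization.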

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4cb1/8346626/0b1c1b97acda/41467_2021_24991_Fig1_HTML.jpg
