Molecular-level similarity search brings computing to DNA data storage.

Affiliations

Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.

Microsoft Research, Redmond, WA, USA.

Publication information

Nat Commun. 2021 Aug 6;12(1):4764. doi: 10.1038/s41467-021-24991-z.

Abstract

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.
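The idea of a query probe that binds preferentially to targets encoding similar images can be illustrated with a small in silico toy. The sketch below is not the authors' learned encoder or their experimental protocol. It assumes a hypothetical hand-written encoding (`encode`) that quantizes each feature value into one of four bases, and a crude binding proxy (`binding_score`) that counts complementary positions between a probe and a target. All function names, feature vectors, and file names are illustrative assumptions.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def encode(features):
    """Toy encoding (an assumption, not the paper's learned encoder):
    quantize each feature value in [0, 1) into one of four bases."""
    return "".join(BASES[min(3, max(0, int(f * 4)))] for f in features)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def binding_score(probe, target):
    """Crude hybridization proxy: fraction of positions at which the probe
    is complementary to the target when the two are aligned antiparallel.
    Assumes probe and target have the same length."""
    expected = reverse_complement(target)
    matches = sum(p == e for p, e in zip(probe, expected))
    return matches / len(target)


def search(query_features, database, top_k=2):
    """Rank stored items by how well the query probe would 'bind' to them."""
    # The probe is the reverse complement of the query's encoded sequence,
    # so it pairs perfectly with a target that encodes identical features.
    probe = reverse_complement(encode(query_features))
    scored = [
        (binding_score(probe, encode(features)), name)
        for name, features in database.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical feature vectors standing in for image features.
database = {
    "cat_1": [0.10, 0.90, 0.30, 0.60, 0.20, 0.80],
    "cat_2": [0.15, 0.85, 0.30, 0.76, 0.20, 0.80],
    "car_1": [0.90, 0.10, 0.80, 0.10, 0.90, 0.30],
}
query = [0.12, 0.88, 0.28, 0.62, 0.22, 0.79]

print(search(query, database))
```

In this toy, nearby feature vectors map to sequences that differ at few positions, so the query probe scores highest against the most similar stored items and no file name is needed. The abstract says only that the encoding is learned; how real hybridization behavior is accounted for is outside what this sketch models.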

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4cb1/8346626/0b1c1b97acda/41467_2021_24991_Fig1_HTML.jpg
