Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.
Microsoft Research, Redmond, WA, USA.
Nat Commun. 2021 Aug 6;12(1):4764. doi: 10.1038/s41467-021-24991-z.
As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error-correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches cannot perform more complex computations over the stored data, such as similarity search: for example, finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach is learning an image-to-sequence encoding that ensures queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.
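The idea of queries acting as hybridization probes that preferentially bind to targets encoding similar images can be illustrated with a small in silico sketch. This is not the method of the paper: the learned image-to-sequence encoder is replaced here by a toy random-hyperplane hash, and binding is approximated by the fraction of probe bases that pair with the target, a crude stand-in for real hybridization thermodynamics. All function names (`make_hyperplanes`, `encode`, `binding_score`, `search`) and the 4-dimensional feature vectors are hypothetical and chosen only for illustration.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def make_hyperplanes(dim, n_bases, seed=0):
    """Draw two random hyperplanes per base (2 sign bits select 1 of 4 nucleotides)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2 * n_bases)]


def encode(features, hyperplanes):
    """Map a feature vector to a DNA string.

    ASSUMPTION: a toy locality-sensitive hash standing in for the learned
    encoder described in the abstract. Nearby vectors tend to share sign
    bits and therefore tend to share bases.
    """
    bits = [
        1 if sum(f * h for f, h in zip(features, plane)) >= 0 else 0
        for plane in hyperplanes
    ]
    return "".join(BASES[2 * bits[i] + bits[i + 1]] for i in range(0, len(bits), 2))


def reverse_complement(seq):
    """Return the strand that would pair perfectly with seq."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def binding_score(probe, target):
    """Fraction of positions at which the probe pairs with the target.

    ASSUMPTION: a crude proxy for hybridization yield that ignores
    thermodynamics, secondary structure, and partial alignments.
    """
    paired = reverse_complement(probe)
    return sum(a == b for a, b in zip(paired, target)) / len(target)


def search(query_features, database, hyperplanes, k=3):
    """Rank stored target sequences by how well the query probe binds to them."""
    probe = reverse_complement(encode(query_features, hyperplanes))
    scored = [(binding_score(probe, seq), name) for name, seq in database.items()]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bases=20, seed=1)
    # Hypothetical image feature vectors; a real system would derive these from images.
    images = {
        "cat_1": [0.9, 0.1, -0.8, 0.4],
        "cat_2": [1.0, 0.0, -1.0, 0.5],
        "car_1": [-0.7, 0.9, 0.6, -0.2],
    }
    database = {name: encode(vec, planes) for name, vec in images.items()}
    for score, name in search([1.0, 0.05, -0.9, 0.45], database, planes):
        print(name, round(score, 2))
```

In this sketch the database never needs to be looked up by key: the probe is derived from the query's features alone, and retrieval depends only on how strongly it pairs with each stored sequence, which mirrors the contrast the abstract draws with filename-style retrieval.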