Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98195, USA.
Microsoft, Redmond, WA, 98052, USA.
Nat Commun. 2020 Jan 30;11(1):616. doi: 10.1038/s41467-020-14319-8.
Synthetic DNA is gaining momentum as a potential medium for archival data storage. In this process, digital information is translated into sequences of nucleotides, and the resulting synthetic DNA strands are then stored for later retrieval. Here, we demonstrate reliable file recovery with PCR-based random access when as few as ten copies per sequence are stored, on average. This results in a density of about 17 exabytes/gram, nearly two orders of magnitude greater than prior work has shown. We successfully retrieve the same data in a complex pool of over 10^10 unique sequences per microliter, with no evidence that we have begun to approach complexity limits. Finally, we also investigate the effects of file size and sequencing coverage on successful file retrieval and look for systematic DNA strand dropout. These findings substantiate the robustness and high data density of the process examined here.
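The link between copies stored per sequence and physical density can be illustrated with a rough back-of-the-envelope sketch. The function names, the figure of roughly 330 g/mol per single-stranded nucleotide, and the strand length and payload size in the usage example are assumptions chosen for illustration; they are not parameters reported in this abstract, and the sketch is not the authors' calculation.

```python
# Rough estimate of physical storage density for DNA data storage.
# All numeric inputs in the usage example are hypothetical.

AVOGADRO = 6.02214076e23          # molecules per mole (exact SI value)
SSDNA_G_PER_MOL_PER_NT = 330.0    # assumption: approximate average mass of one ssDNA nucleotide


def strand_mass_grams(strand_length_nt):
    """Approximate mass in grams of a single ssDNA molecule of the given length."""
    return strand_length_nt * SSDNA_G_PER_MOL_PER_NT / AVOGADRO


def density_bytes_per_gram(strand_length_nt, payload_bytes_per_strand, copies_per_sequence):
    """Bytes stored per gram of DNA when each unique sequence is kept in
    `copies_per_sequence` physical copies, on average."""
    if strand_length_nt <= 0 or payload_bytes_per_strand <= 0 or copies_per_sequence <= 0:
        raise ValueError("all arguments must be positive")
    mass_per_sequence = copies_per_sequence * strand_mass_grams(strand_length_nt)
    return payload_bytes_per_strand / mass_per_sequence


if __name__ == "__main__":
    # Hypothetical strand design: 150 nt strands carrying 20 payload bytes each.
    for copies in (10, 1000):
        density = density_bytes_per_gram(150, 20, copies)
        print(f"{copies} copies per sequence: {density / 1e18:.2f} exabytes/gram")
```

Under these assumptions density scales inversely with the number of copies stored, so lowering the average copy number by a factor of 100 raises density by two orders of magnitude.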