Reading and writing digital data in DNA.

Affiliations

Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland.

Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA.

Publication information

Nat Protoc. 2020 Jan;15(1):86-101. doi: 10.1038/s41596-019-0244-5. Epub 2019 Nov 29.

DOI:10.1038/s41596-019-0244-5

PMID:31784718

Abstract

Because of its longevity and enormous information density, DNA is considered a promising data storage medium. In this work, we provide instructions for archiving digital information in the form of DNA and for subsequently retrieving it from the DNA. In principle, information can be represented in DNA by simply mapping the digital information to DNA and synthesizing it. However, imperfections in synthesis, sequencing, storage and handling of the DNA induce errors within the molecules, making error-free information storage challenging. The procedure discussed here enables error-free storage by protecting the information using error-correcting codes. Specifically, in this protocol, we provide the technical details and precise instructions for translating digital information to DNA sequences, physically handling the biomolecules, storing them and subsequently re-obtaining the information by sequencing the DNA. Along with the protocol, we provide computer code that automatically encodes digital information to DNA sequences and decodes the information back from DNA to a digital file. The required software is provided in a GitHub repository. The protocol relies on commercial DNA synthesis and DNA sequencing via Illumina dye sequencing, and requires 1-2 h of preparation time, 1/2 d for sequencing preparation and 2-4 h for data analysis. This protocol focuses on storage scales of ~100 kB to 15 MB, offering an ideal starting point for small experiments. It can be augmented to enable higher data volumes and random access to the data and also allows for future sequencing and synthesis technologies, by changing the parameters of the encoder/decoder to account for the corresponding error rates.
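
The "simple mapping" the abstract mentions can be illustrated with a minimal sketch. It assumes a plain 2-bits-per-nucleotide scheme (A=00, C=01, G=10, T=11); the function names `bytes_to_dna` and `dna_to_bytes` are hypothetical, and this is not the encoder from the protocol's repository, which additionally applies error-correcting codes. The last lines show why such protection is needed: without it, a single substituted base silently changes the decoded data.

```python
# Illustrative only: a naive 2-bits-per-base mapping with NO error correction.
# The mapping and function names are assumptions, not the protocol's encoder.
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; expects a sequence made of A, C, G, T only."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError on any character outside BASES
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    payload = b"DNA"
    strand = bytes_to_dna(payload)
    print(strand)
    assert dna_to_bytes(strand) == payload  # lossless round trip

    # Simulate one substitution error (first base replaced by T).
    corrupted = "T" + strand[1:]
    print(dna_to_bytes(corrupted) != payload)  # the error goes undetected
```

A real pipeline would also have to split the data into short indexed strands and add redundancy so that substitutions, insertions, deletions and lost strands can be corrected, which is the role of the error-correcting codes described in the abstract.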

Similar articles

Portable and Error-Free DNA-Based Data Storage.

Sci Rep. 2017 Jul 10;7(1):5011. doi: 10.1038/s41598-017-05188-1.

A Characterization of the DNA Data Storage Channel.

Sci Rep. 2019 Jul 4;9(1):9663. doi: 10.1038/s41598-019-45832-6.

A highly parallel strategy for storage of digital information in living cells.

BMC Biotechnol. 2018 Oct 17;18(1):64. doi: 10.1186/s12896-018-0476-4.

Random access in large-scale DNA data storage.

Nat Biotechnol. 2018 Mar;36(3):242-248. doi: 10.1038/nbt.4079. Epub 2018 Feb 19.

Multiple errors correction for position-limited DNA sequences with GC balance and no homopolymer for DNA-based data storage.

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac484.

In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA).

BMC Bioinformatics. 2023 Apr 21;24(1):160. doi: 10.1186/s12859-023-05264-6.

DNA Bloom Filter enables anti-contamination and file version control for DNA-based data storage.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae125.

NOREC4DNA: using near-optimal rateless erasure codes for DNA storage.

BMC Bioinformatics. 2021 Aug 17;22(1):406. doi: 10.1186/s12859-021-04318-x.

DNA-DISK: Automated end-to-end data storage via enzymatic single-nucleotide DNA synthesis and sequencing on digital microfluidics.

Proc Natl Acad Sci U S A. 2024 Aug 20;121(34):e2410164121. doi: 10.1073/pnas.2410164121. Epub 2024 Aug 15.

Cited by

Exploring the intersection of natural sciences and information technology via entropy and randomness.

Nat Commun. 2025 Jul 29;16(1):6969. doi: 10.1038/s41467-025-62353-1.

Photonic microspheres for high-capacity DNA data storage: Robust, straightforward, and scalable random access via nonfading indexes.

Sci Adv. 2025 Jun 20;11(25):eadw2613. doi: 10.1126/sciadv.adw2613. Epub 2025 Jun 18.

Exploring potential biosafety implications in DNA information storage.

Biosaf Health. 2025 Mar 27;7(2):132-139. doi: 10.1016/j.bsheal.2025.03.006. eCollection 2025 Apr.

A Programmed DNA Dynamic Assembly-Guided Molecular Amplifier for Authentic Information Decryption.

Adv Sci (Weinh). 2025 Jun;12(22):e2409586. doi: 10.1002/advs.202409586. Epub 2025 May 19.

Towards next-generation DNA encryption via an expanded genetic system.

Natl Sci Rev. 2024 Dec 23;12(4):nwae469. doi: 10.1093/nsr/nwae469. eCollection 2025 Apr.

The "biomolecular humanities"? New challenges and perspectives.

iScience. 2025 Jan 13;28(2):111679. doi: 10.1016/j.isci.2024.111679. eCollection 2025 Feb 21.

Robust multi-read reconstruction from noisy clusters using deep neural network for DNA storage.

Comput Struct Biotechnol J. 2024 Mar 1;23:1076-1087. doi: 10.1016/j.csbj.2024.02.019. eCollection 2024 Dec.

A generative adversarial network for multiple reads reconstruction in DNA storage.

Sci Rep. 2024 Dec 30;14(1):32071. doi: 10.1038/s41598-024-83806-5.

Parallel molecular data storage by printing epigenetic bits on DNA.

Nature. 2024 Oct;634(8035):824-832. doi: 10.1038/s41586-024-08040-5. Epub 2024 Oct 23.

Overcoming the High Error Rate of Composite DNA Letters-Based Digital Storage through Soft-Decision Decoding.

Adv Sci (Weinh). 2024 Aug;11(30):e2402951. doi: 10.1002/advs.202402951. Epub 2024 Jun 14.
