Ren Yubin, Zhang Yi, Liu Yawei, Wu Qinglin, Su Juanjuan, Wang Fan, Chen Dong, Fan Chunhai, Liu Kai, Zhang Hongjie
Department of Chemistry, Tsinghua University, Beijing, 100084, China.
State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, 130022, China.
Small Methods. 2022 Apr;6(4):e2101335. doi: 10.1002/smtd.202101335. Epub 2022 Feb 10.
Information storage based on DNA molecules provides a promising solution with the advantages of low energy consumption, high storage efficiency, and long lifespan. However, there are only four natural nucleotides, so DNA storage is limited to 2 bits per nucleotide. Here, artificial nucleotides are introduced into DNA data storage to achieve a coding efficiency higher than 2 bits per nucleotide. To accommodate the characteristics of DNA synthesis and sequencing, two high-reliability encoding systems suitable for four, six, and eight nucleotides are developed: the RaptorQ-Arithmetic-LZW-RS (RALR) and RaptorQ-Arithmetic-Base64-RS (RABR) systems. The two concatenated encoding systems correct the loss of DNA sequences, correct errors within DNA sequences, reduce homopolymers, and control the content of specific nucleotides. With error correction and without arithmetic compression, the average coding efficiencies of the RALR system using four, six, and eight nucleotides reach 1.27, 1.61, and 1.85 bits per nucleotide, respectively, while those of the RABR system are up to 1.50, 2.00, and 2.35 bits per nucleotide, respectively. The coding efficiency, versatility, and tunability of the developed artificial DNA systems might provide significant guidance for high-reliability and high-density data storage.
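The 2 bits per nucleotide limit follows from the alphabet size: a position that can hold one of n equally likely symbols carries at most log2(n) bits. The short Python sketch below works out that bound for four, six, and eight nucleotides and compares it with the average efficiencies quoted in the abstract. It is an illustration only, not the authors' code; the names `capacity_bits`, `reported`, and `fraction_of_capacity` are hypothetical.

```python
import math


def capacity_bits(alphabet_size: int) -> float:
    """Upper bound on bits per nucleotide for an alphabet of this size."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    return math.log2(alphabet_size)


# Average coding efficiencies quoted in the abstract (bits per nucleotide),
# with error correction and without arithmetic compression.
reported = {
    "RALR": {4: 1.27, 6: 1.61, 8: 1.85},
    "RABR": {4: 1.50, 6: 2.00, 8: 2.35},
}


def fraction_of_capacity(system: str, alphabet_size: int) -> float:
    """Reported efficiency divided by the log2(n) upper bound."""
    return reported[system][alphabet_size] / capacity_bits(alphabet_size)


if __name__ == "__main__":
    for system, row in reported.items():
        for n, bits in row.items():
            bound = capacity_bits(n)
            share = fraction_of_capacity(system, n)
            print(f"{system} n={n}: {bits:.2f} of {bound:.3f} bits/nt ({share:.1%})")
```

The bound is 2 bits for four nucleotides, about 2.585 bits for six, and 3 bits for eight. The gap between each reported figure and its bound reflects the redundancy spent on error correction and sequence constraints.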