Liu Deruilin, Xu Demin, Shi Liuxin, Zhang Jiayuan, Bi Kewei, Luo Bei, Liu Chen, Li Yuxiang, Fan Guangyi, Wang Wen, Ping Zhi
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
BGI Research, Shenzhen, 518083, China.
GigaByte. 2025 Jan 24;2025:gigabyte147. doi: 10.46471/gigabyte.147. eCollection 2025.
DNA is a promising next-generation data storage medium. Recent theoretical work has proposed that non-natural or modified bases can serve as extra molecular letters to increase information density. However, this strategy is hard to put into practice because non-natural DNA sequences are difficult to synthesize and structurally complex. Here, we describe R+, a practical DNA data storage transcoding scheme based on a molecular alphabet expanded with 5-methylcytosine (5mC). We validated it experimentally by encoding one representative file into several DNA fragments of 1.3 to 1.6 kbp for nanopore sequencing. Our results show average data recovery rates of 98.97% with a reference and 86.91% without one. Our work validates the practicability of 5mC in DNA storage systems, with a potentially wide range of applications.
R+ is implemented in Python, and the code is available under an MIT license at https://github.com/Incpink-Liu/DNA-storage-R_plus.
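To illustrate why an extra molecular letter raises information density, the following is a minimal sketch, not the R+ transcoding scheme itself. It assumes a hypothetical five-letter alphabet in which "M" stands in for 5mC, and uses a plain radix conversion; the names `ALPHABET`, `bits_per_base`, `encode`, and `decode` are illustrative and do not come from the R+ repository. The theoretical capacity of an alphabet of size n is log2(n) bits per base, so five letters give about 2.32 bits per base against 2 bits for the four natural bases.

```python
import math

# Hypothetical alphabet: the four natural bases plus "M" as a stand-in for 5mC.
ALPHABET = "ACGTM"


def bits_per_base(alphabet_size: int) -> float:
    """Theoretical upper bound on information per base for a given alphabet size."""
    return math.log2(alphabet_size)


def encode(data: bytes, alphabet: str = ALPHABET) -> str:
    """Toy radix conversion: read the bytes as one integer, write it in base len(alphabet)."""
    base = len(alphabet)
    # Prepend a 0x01 sentinel byte so leading zero bytes in the payload survive.
    value = int.from_bytes(b"\x01" + data, "big")
    letters = []
    while value:
        value, digit = divmod(value, base)
        letters.append(alphabet[digit])
    return "".join(reversed(letters))


def decode(seq: str, alphabet: str = ALPHABET) -> bytes:
    """Inverse of encode: rebuild the integer, then drop the sentinel byte."""
    base = len(alphabet)
    value = 0
    for letter in seq:
        value = value * base + alphabet.index(letter)
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return raw[1:]


if __name__ == "__main__":
    payload = b"DNA storage"
    four = encode(payload, "ACGT")
    five = encode(payload, ALPHABET)
    print(f"4 letters: {len(four)} bases, {bits_per_base(4):.2f} bits/base")
    print(f"5 letters: {len(five)} bases, {bits_per_base(5):.2f} bits/base")
    assert decode(five) == payload
```

A practical scheme would also need to respect synthesis and sequencing constraints (for example homopolymer runs and the reliability of 5mC detection), which this sketch ignores.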