Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, 100049, China.
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; College of Mathematics and Information Science, Hebei University, Baoding, 071002, China.
Comput Biol Med. 2023 Oct;165:107404. doi: 10.1016/j.compbiomed.2023.107404. Epub 2023 Aug 28.
DNA data storage is a promising technology that combines computer simulation and synthetic biology to offer high-density, reliable storage of digital information. Storing massive amounts of data in a small quantity of DNA without losing the original data is challenging, because nonspecific hybridization errors occur frequently and severely affect the reliability of the stored data. This study proposes a novel biologically optimized encoding model for DNA data storage (BO-DNA) to address this reliability problem. BO-DNA uses a new rule-based mapping method to avoid data loss when transcoding binary data into nucleotides. A customized optimization algorithm based on a tent chaotic map is applied to maximize the lower bounds, which helps minimize nonspecific hybridization errors. The robustness of BO-DNA is evaluated against four biological constraints to confirm the reliability of the newly generated DNA sequences. In experiments, various medical images were successfully encoded and decoded, with lower bounds improved by 12%-59% and constraint-satisfying DNA sequences achieving an average density of 1.77 bits/nt. These results demonstrate that BO-DNA offers substantial advantages for constructing reliable DNA data storage.
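The abstract does not spell out the optimizer or the four constraints. The sketch below is therefore hypothetical and is not the BO-DNA algorithm. It assumes, as is common in DNA coding work, that the "lower bound" is the size of a codeword set of length `n` whose members satisfy a set of constraints. Three constraints are assumed here: a GC-content constraint (exactly half G/C), a no-runlength constraint (no identical adjacent bases) and a minimum pairwise Hamming distance `d`. A tent chaotic map drives the candidate order for a greedy search, and the largest set found is kept. All function names and parameters (`tent_map`, `build_code`, `mu`, `x0`, `iters`) are illustrative.

```python
"""Hypothetical sketch: search for a large DNA codeword set under assumed
constraints, using a tent chaotic map to reorder candidates.
Not the authors' BO-DNA method; constraints and parameters are assumptions."""
from itertools import product

BASES = "ACGT"


def tent_map(x: float, mu: float = 1.99) -> float:
    """One tent-map step on [0, 1]; mu slightly below 2 avoids the
    floating-point collapse to 0 that mu = 2 exhibits."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def gc_ok(seq: str) -> bool:
    """Assumed GC-content constraint: exactly half the bases are G or C."""
    return sum(b in "GC" for b in seq) * 2 == len(seq)


def no_runs(seq: str) -> bool:
    """Assumed no-runlength constraint: no two identical adjacent bases."""
    return all(a != b for a, b in zip(seq, seq[1:]))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_code(n: int, d: int, x0: float = 0.3, iters: int = 20) -> list[str]:
    """Greedy codeword selection repeated over `iters` chaotic orderings.

    Each round gives every candidate a key from the tent-map trajectory,
    scans candidates in key order, and keeps a word only if it is at
    Hamming distance >= d from every word already chosen. The largest
    set over all rounds (the assumed "lower bound") is returned.
    """
    candidates = ["".join(p) for p in product(BASES, repeat=n)]
    candidates = [c for c in candidates if gc_ok(c) and no_runs(c)]
    best: list[str] = []
    x = x0
    for _ in range(iters):
        keyed = []
        for c in candidates:
            x = tent_map(x)          # advance the chaotic trajectory
            keyed.append((x, c))
        keyed.sort()                 # chaotic scan order for this round
        code: list[str] = []
        for _, c in keyed:
            if all(hamming(c, w) >= d for w in code):
                code.append(c)
        if len(code) > len(best):
            best = code
    return best


if __name__ == "__main__":
    words = build_code(6, 3)
    print(len(words), words[:5])
```

A purely greedy scan depends heavily on candidate order. Reshuffling that order with a deterministic chaotic sequence is one simple way to explore alternatives, but the actual BO-DNA optimizer may differ substantially.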