Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, 116622, Dalian, China.
School of Computer Science and Technology, Dalian University of Technology, 116024, Dalian, China.
Comput Biol Med. 2023 Sep;164:107244. doi: 10.1016/j.compbiomed.2023.107244. Epub 2023 Jul 11.
The exponential growth of global data has led to a shortage of data storage capacity. DNA storage is a promising alternative because of its high storage density and long retention time. However, the DNA storage process is subject to unavoidable errors, which can increase redundancy within clusters during data reading and in turn reduce read accuracy. This paper proposes a dynamically updated hash index (DUHI) clustering method for DNA storage, which clusters sequences by constructing a dynamic core index set and using hash lookup. The proposed method is evaluated in terms of overall reliability and through visualization. The results show that the DUHI clustering method can reduce the redundancy of more than 10% of the sequences within a cluster and increase the sequence reconstruction rate to more than 99%. Therefore, our method addresses the high-redundancy problem that follows DNA sequence clustering, improves the accuracy of data reading, and promotes the development of DNA storage.
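The general idea of clustering reads through a hash index that is updated as clusters grow can be illustrated with a minimal sketch. This is not the DUHI algorithm itself, whose details are not given in the abstract. It assumes that k-mers serve as index keys and that a read joins the cluster with which it shares the most indexed k-mers; the names `kmers` and `cluster_reads` and the parameters `k` and `min_shared` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=4, min_shared=3):
    """Greedy clustering with a hash index that grows as reads are assigned.

    Assumption (not from the paper): a read joins the cluster that owns the
    most of its k-mers, provided at least `min_shared` k-mers match.
    """
    index = {}     # k-mer -> id of the first cluster that claimed it
    clusters = []  # clusters[cid] is the list of reads in that cluster
    for read in reads:
        grams = kmers(read, k)
        # Hash lookup: count how many of this read's k-mers each cluster owns.
        votes = Counter(index[g] for g in grams if g in index)
        best = votes.most_common(1)
        if best and best[0][1] >= min_shared:
            cid = best[0][0]
        else:
            # No sufficiently similar cluster: start a new one.
            cid = len(clusters)
            clusters.append([])
        clusters[cid].append(read)
        # Dynamic update: unseen k-mers of this read now point to its cluster.
        for g in grams:
            index.setdefault(g, cid)
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTTGCA", "ACGTACGATGCA", "TTTTGGGGCCCC", "TTTTGGGCCCCC"]
    for cid, members in enumerate(cluster_reads(reads)):
        print(cid, members)
```

Because each assigned read contributes its previously unseen k-mers to the index, later noisy copies can still match a cluster even when they differ from the first read that founded it.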