An Encoding Table Corresponding to ASCII Codes for DNA Data Storage and a New Error Correction Method HMSA.

Publication information

IEEE Trans Nanobioscience. 2024 Apr;23(2):344-354. doi: 10.1109/TNB.2024.3356522. Epub 2024 Mar 28.

Abstract

DNA storage stands out from other storage media for its high capacity, eco-friendliness, long lifespan, high stability, low energy consumption, and low data maintenance costs. To standardize the DNA encoding system, maintain consistency in character representation and transmission, and link binary, base, and character representations, this paper combines the encoding method with ASCII code to construct an ASCII-DNA encoding table. The encoding method can encode not only plain text but also audio and video information, and it satisfies the GC content constraint and the homopolymer constraint, with an encoding density of 1.4 bits/nt. In particular, when encoding text, it skips the binary conversion step entirely, which reduces encoding complexity and increases the encoding density to 1.6 bits/nt. To address errors in sequences, this paper, inspired by heuristic algorithms, proposes a new error correction method (HMSA) that combines minimum Hamming distance, multiple sequence alignment, and the encoding scheme. It can correct not only substitution, insertion, and deletion errors in reads but also consecutive errors in reads. It greatly improves the utilization of reads and avoids wasting resources. Simulation results show that the recovery rate of reads increases with the number of times a sequence is sequenced. When the number of erroneous bases in a 150 nt sequence reaches 5 nt, the error correction rate can exceed 96% with the base sequence sequenced only 10 times, regardless of whether the errors are consecutive. Additionally, the HMSA error correction method is applicable to all coding schemes of the lookup code table type.
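The abstract names the ingredients but gives no algorithmic detail, so the following is only a minimal Python sketch of three of them: a GC content check, a homopolymer run check, and a minimum Hamming distance lookup against a code table. It is not the HMSA method itself. The function names, the 5 nt codeword length, and `TOY_TABLE` are all assumptions made for illustration, and the table is not the paper's ASCII-DNA encoding table. The lookup handles substitutions only; insertions and deletions, which the paper addresses with multiple sequence alignment, are out of scope here.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of one repeated base in a non-empty sequence."""
    return max(len(list(run)) for _, run in groupby(seq))


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_codeword(fragment: str, table: dict[str, str]) -> str:
    """Return the codeword in the table closest to fragment in Hamming distance."""
    return min(table, key=lambda codeword: hamming_distance(fragment, codeword))


# Hypothetical 5 nt codewords mapped to characters (illustrative only,
# NOT the ASCII-DNA table from the paper).
TOY_TABLE = {"ACGTA": "A", "CATGC": "B", "GTCAG": "C"}

read_fragment = "ACGTT"  # "ACGTA" with its last base substituted
codeword = nearest_codeword(read_fragment, TOY_TABLE)
decoded = TOY_TABLE[codeword]
```

Here `read_fragment` differs from `"ACGTA"` in one position and from the other two codewords in every position, so the lookup recovers `"ACGTA"`. A lookup like this only works when the codewords are far enough apart from one another; how the paper's table achieves that is not stated in the abstract.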

