HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints.

Author affiliations

Department of Computer Science, The University of Texas at Austin, Austin, TX 78712;

Department of Integrative Biology, The University of Texas at Austin, Austin, TX 78712.

Publication information

Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18489-18496. doi: 10.1073/pnas.2004821117. Epub 2020 Jul 16.

DOI:10.1073/pnas.2004821117

PMID:32675237

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7414044/

Abstract

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
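The sequence constraints named in the abstract (no excess repeats, windowed GC content neither too high nor too low) can be illustrated with a small checker. This is only a sketch of what such constraints mean for a finished strand, not the HEDGES implementation, which the abstract describes as incorporating constraints into the code itself. The function names and the default thresholds (`max_run=3`, `window=12`, `gc_min=0.35`, `gc_max=0.65`) are assumptions chosen for illustration and are not taken from the paper.

```python
def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    longest = 0
    run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def windowed_gc_fractions(seq: str, window: int) -> list:
    """Return the GC fraction of every full-length sliding window over seq."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])
    fractions = [count / window]
    # Slide the window one base at a time, updating the count incrementally.
    for i in range(window, len(seq)):
        count += is_gc[i] - is_gc[i - window]
        fractions.append(count / window)
    return fractions


def satisfies_constraints(seq: str, max_run: int = 3, window: int = 12,
                          gc_min: float = 0.35, gc_max: float = 0.65) -> bool:
    """Check a strand against illustrative (assumed) repeat and GC limits."""
    if max_homopolymer_run(seq) > max_run:
        return False
    return all(gc_min <= f <= gc_max
               for f in windowed_gc_fractions(seq, window))


if __name__ == "__main__":
    for strand in ("ACGTACGTACGTACGT", "AAAAACGTACGTACGT", "ATATATATATATAT"):
        print(strand, satisfies_constraints(strand))
```

With these assumed thresholds, the first strand passes, the second fails on a five-base homopolymer run, and the third fails because its windowed GC content is zero.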

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34cf/7414044/f326a3bf55b0/pnas.2004821117fig01.jpg
