HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints.

Affiliations

Department of Computer Science, The University of Texas at Austin, Austin, TX 78712;

Department of Integrative Biology, The University of Texas at Austin, Austin, TX 78712.

Publication information

Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18489-18496. doi: 10.1073/pnas.2004821117. Epub 2020 Jul 16.

DOI:10.1073/pnas.2004821117
PMID:32675237
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7414044/
Abstract

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
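
The sequence constraints named in the abstract (no excess repeats, windowed GC content neither too high nor too low) can be made concrete with a small checker. The sketch below is illustrative only: the function names, the run limit of 4, the 12-base window, and the 35-65% GC bounds are all assumptions chosen for the example, not values from the paper. It tests a finished strand after the fact and does not show how HEDGES itself incorporates constraints into encoding.

```python
def max_homopolymer_run(seq):
    """Return the length of the longest run of identical bases in seq."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def windowed_gc_fractions(seq, window):
    """Return the GC fraction of every length-`window` substring of seq.

    Uses a sliding count so the cost is linear in len(seq).
    Returns an empty list if the strand is shorter than the window.
    """
    if window <= 0 or window > len(seq):
        return []
    gc = sum(base in "GC" for base in seq[:window])
    fractions = [gc / window]
    for i in range(window, len(seq)):
        # Add the base entering the window, drop the base leaving it.
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        fractions.append(gc / window)
    return fractions


def satisfies_constraints(seq, max_run=4, window=12, gc_min=0.35, gc_max=0.65):
    """Check a strand against hypothetical repeat and GC limits.

    All thresholds are placeholder assumptions, not values from the paper.
    A strand shorter than `window` is only checked for homopolymer runs.
    """
    if max_homopolymer_run(seq) > max_run:
        return False
    return all(gc_min <= f <= gc_max for f in windowed_gc_fractions(seq, window))
```

For example, `satisfies_constraints("ACGT" * 4)` passes, because every 12-base window holds exactly six G or C bases and no base repeats, while a strand beginning with six consecutive `A` bases fails the run limit.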

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34cf/7414044/f326a3bf55b0/pnas.2004821117fig01.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34cf/7414044/eb1d4388cd91/pnas.2004821117fig02.jpg

Similar articles

1
HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints.
Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18489-18496. doi: 10.1073/pnas.2004821117. Epub 2020 Jul 16.
2
Indel-correcting DNA barcodes for high-throughput sequencing.
Proc Natl Acad Sci U S A. 2018 Jul 3;115(27):E6217-E6226. doi: 10.1073/pnas.1802640115. Epub 2018 Jun 20.
3
Error-correcting codes and information in biology.
Biosystems. 2019 Oct;184:103987. doi: 10.1016/j.biosystems.2019.103987. Epub 2019 Jul 8.
4
Efficient DNA-based data storage using shortmer combinatorial encoding.
Sci Rep. 2024 Apr 2;14(1):7731. doi: 10.1038/s41598-024-58386-z.
5
A highly parallel strategy for storage of digital information in living cells.
BMC Biotechnol. 2018 Oct 17;18(1):64. doi: 10.1186/s12896-018-0476-4.
6
DNA-Aeon provides flexible arithmetic coding for constraint adherence and error correction in DNA storage.
Nat Commun. 2023 Feb 6;14(1):628. doi: 10.1038/s41467-023-36297-3.
7
Is there an error correcting code in the base sequence in DNA?
Biophys J. 1996 Sep;71(3):1539-44. doi: 10.1016/S0006-3495(96)79356-6.
8
Multiple errors correction for position-limited DNA sequences with GC balance and no homopolymer for DNA-based data storage.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac484.
9
Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage.
BMC Bioinformatics. 2023 Mar 23;24(1):111. doi: 10.1186/s12859-023-05237-9.
10
Overcoming the High Error Rate of Composite DNA Letters-Based Digital Storage through Soft-Decision Decoding.
Adv Sci (Weinh). 2024 Aug;11(30):e2402951. doi: 10.1002/advs.202402951. Epub 2024 Jun 14.

Cited by

1
Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model.
Sci Rep. 2025 Jul 1;15(1):20920. doi: 10.1038/s41598-025-05717-3.
2
INNSE: Invertible neural network-based DNA image storage with self-correction encoding.
Comput Struct Biotechnol J. 2025 Jun 6;27:2492-2502. doi: 10.1016/j.csbj.2025.06.003. eCollection 2025.
3
Sequence analysis and decoding with extra low-quality reads for DNA data storage.
Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf335.
4
Self-documenting plasmids.
Trends Biotechnol. 2025 Apr 7. doi: 10.1016/j.tibtech.2025.03.010.
5
DNA storage: The future direction for medical cold data storage.
Synth Syst Biotechnol. 2025 Mar 14;10(2):677-695. doi: 10.1016/j.synbio.2025.03.006. eCollection 2025 Jun.
6
Pragmatic soft-decision data readout of encoded large DNA.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf102.
7
Robust multi-read reconstruction from noisy clusters using deep neural network for DNA storage.
Comput Struct Biotechnol J. 2024 Mar 1;23:1076-1087. doi: 10.1016/j.csbj.2024.02.019. eCollection 2024 Dec.
8
Nanopore decoding with speed and versatility for data storage.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf006.
9
DNA palette code for time-series archival data storage.
Natl Sci Rev. 2024 Sep 10;12(1):nwae321. doi: 10.1093/nsr/nwae321. eCollection 2025 Jan.
10
DP-ID: Interleaving and Denoising to Improve the Quality of DNA Storage Image.
Interdiscip Sci. 2024 Nov 22. doi: 10.1007/s12539-024-00671-6.

References cited by this article

1
Molecular digital data storage using DNA.
Nat Rev Genet. 2019 Aug;20(8):456-466. doi: 10.1038/s41576-019-0125-3.
2
Indel-correcting DNA barcodes for high-throughput sequencing.
Proc Natl Acad Sci U S A. 2018 Jul 3;115(27):E6217-E6226. doi: 10.1073/pnas.1802640115. Epub 2018 Jun 20.
3
Random access in large-scale DNA data storage.
Nat Biotechnol. 2018 Mar;36(3):242-248. doi: 10.1038/nbt.4079. Epub 2018 Feb 19.
4
Nanopore sequencing and assembly of a human genome with ultra-long reads.
Nat Biotechnol. 2018 Apr;36(4):338-345. doi: 10.1038/nbt.4060. Epub 2018 Jan 29.
5
Portable and Error-Free DNA-Based Data Storage.
Sci Rep. 2017 Jul 10;7(1):5011. doi: 10.1038/s41598-017-05188-1.
6
DNA Fountain enables a robust and efficient storage architecture.
Science. 2017 Mar 3;355(6328):950-954. doi: 10.1126/science.aaj2038.
7
Robust chemical preservation of digital information on DNA in silica with error-correcting codes.
Angew Chem Int Ed Engl. 2015 Feb 16;54(8):2552-5. doi: 10.1002/anie.201411378. Epub 2015 Feb 4.
8
Effects of GC bias in next-generation-sequencing data on de novo genome assembly.
PLoS One. 2013 Apr 29;8(4):e62856. doi: 10.1371/journal.pone.0062856. Print 2013.
9
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA.
Nature. 2013 Feb 7;494(7435):77-80. doi: 10.1038/nature11875. Epub 2013 Jan 23.
10
Next-generation digital information storage in DNA.
Science. 2012 Sep 28;337(6102):1628. doi: 10.1126/science.1226355. Epub 2012 Aug 16.