HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints.

Affiliations

Department of Computer Science, The University of Texas at Austin, Austin, TX 78712;

Department of Integrative Biology, The University of Texas at Austin, Austin, TX 78712.

Publication information

Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18489-18496. doi: 10.1073/pnas.2004821117. Epub 2020 Jul 16.

Abstract

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
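The sequence constraints named in the abstract (limits on repeats and on windowed GC content) can be made concrete with a small checker. The following is a minimal sketch, not the HEDGES implementation: it only tests a finished strand against such constraints and says nothing about how HEDGES enforces them during encoding. The function names and the default thresholds (`max_run=4`, `window=12`, `gc_min=0.35`, `gc_max=0.65`) are illustrative assumptions, not values taken from the paper.

```python
from __future__ import annotations


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def windowed_gc_fractions(seq: str, window: int) -> list[float]:
    """Return the GC fraction of every full-length sliding window over seq."""
    if window <= 0 or len(seq) < window:
        return []
    is_gc = [base in "GC" for base in seq]
    count = sum(is_gc[:window])
    fractions = [count / window]
    # Slide the window one base at a time, updating the GC count incrementally.
    for i in range(window, len(seq)):
        count += is_gc[i] - is_gc[i - window]
        fractions.append(count / window)
    return fractions


def satisfies_constraints(
    seq: str,
    max_run: int = 4,      # assumed limit on homopolymer run length
    window: int = 12,      # assumed GC window size, in bases
    gc_min: float = 0.35,  # assumed lower bound on windowed GC fraction
    gc_max: float = 0.65,  # assumed upper bound on windowed GC fraction
) -> bool:
    """Check a strand against the illustrative run-length and GC limits."""
    if max_homopolymer_run(seq) > max_run:
        return False
    return all(gc_min <= f <= gc_max for f in windowed_gc_fractions(seq, window))
```

Under these assumed thresholds, a balanced strand such as `"ACGTACGTACGTACGT"` passes, while a strand with a six-base run of `A` or an all-AT strand is rejected.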

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34cf/7414044/f326a3bf55b0/pnas.2004821117fig01.jpg
