Cooperative sequence clustering and decoding for DNA storage system with fountain codes.

Authors

Jeong Jaeho, Park Seong-Joon, Kim Jae-Won, No Jong-Seon, Jeon Ha Hyeon, Lee Jeong Wook, No Albert, Kim Sunghwan, Park Hosung

Affiliations

Department of Electrical and Computer Engineering, Seoul National University, Institute of New Media and Communications (INMC), Seoul 08826, South Korea.

Department of Electronic Engineering, Gyeongsang National University, Engineering Research Institute, Jinju 52828, South Korea.

Publication information

Bioinformatics. 2021 Oct 11;37(19):3136-3143. doi: 10.1093/bioinformatics/btab246.

DOI: 10.1093/bioinformatics/btab246
PMID: 33904574

Abstract

MOTIVATION

In DNA storage systems, there is a tradeoff between writing and reading costs. Increasing the code rate of the error-correcting code may reduce the writing cost, but data retrieval then requires more sequence reads. The sequencing and decoding processes could potentially be improved so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. Previous research treated clustering, alignment and decoding as separate stages, but we believe that using the information from all of these processes together may improve decoding performance. Actual DNA synthesis and sequencing experiments should be performed, because simulations cannot be relied on to cover all error possibilities in practical circumstances.

RESULTS

For DNA storage systems using a fountain code and a Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process around the cooperation of its key components: Hamming-distance-based clustering, discarding of abnormal sequence reads, RS error correction as well as error detection, and quality-score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced the data successfully with an Illumina MiSeq instrument. Compared to Erlich's research, the proposed decoding method additionally incorporates sequence reads with minor errors, which had previously been discarded, and was thus able to make use of 10.6-11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5-8.9% reduction in the reading cost. Channel characteristics, including sequence coverage and read-length distributions, are provided as well.
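
To make the cooperation between these components more concrete, the following is a minimal Python sketch of three of them: setting aside abnormal reads, quality-score-based ordering, and greedy Hamming-distance clustering followed by a majority-vote consensus. It is not the authors' implementation (that is in the repository listed under Availability). The function names, the use of read length as the only abnormality test, the Phred+33 quality encoding, and the max_dist threshold are all assumptions made for illustration, and the RS correction/detection step is omitted.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 is assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def cluster_reads(reads, oligo_len, max_dist):
    """Greedy Hamming-distance clustering (illustrative, hypothetical helper).

    reads: list of (sequence, quality_string) pairs.
    Reads whose length differs from oligo_len are returned separately as
    abnormal; a real pipeline may use further criteria.
    """
    normal = [r for r in reads if len(r[0]) == oligo_len]
    abnormal = [r for r in reads if len(r[0]) != oligo_len]
    # Visit high-quality reads first so that they become cluster centres.
    normal.sort(key=lambda r: mean_quality(r[1]), reverse=True)
    clusters = []  # each cluster is a list of sequences; element 0 is the centre
    for seq, _ in normal:
        for members in clusters:
            if hamming(seq, members[0]) <= max_dist:
                members.append(seq)
                break
        else:
            # No existing centre is close enough: start a new cluster.
            clusters.append([seq])
    return clusters, abnormal


def consensus(members):
    """Per-position majority vote across the sequences of one cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*members))
```

In a full decoder, each consensus sequence would then be passed to an RS decoder, and sequences that fail its error detection would be withheld from the fountain decoder. Visiting reads in quality order is one plausible way to let higher-confidence sequences anchor the clusters; the paper's exact ordering rule may differ.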

AVAILABILITY AND IMPLEMENTATION

The raw data files and the source code of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
