Sequence analysis and decoding with extra low-quality reads for DNA data storage.

Author information

Park Jiyeon, Jeon Ha Hyeon, Lee Jeong Wook, Park Hosung

Affiliations

Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, South Korea.

Department of Chemical Engineering, POSTECH, Pohang 37673, South Korea.

Publication information

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf335.

DOI:10.1093/bioinformatics/btaf335

PMID:40493760

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12187058/

Abstract

MOTIVATION

Error detection/correction codes play an important role in reducing writing and/or reading costs in DNA data storage. Sequence analysis algorithms also have a crucial effect on error correction, but they have been executed independently of the decoding of error correction codes. In conventional sequence analysis, low-quality reads are usually discarded. For DNA data storage, low-quality reads can be used constructively for sequence analysis with the assistance of error detection/correction codes.

RESULTS

We obtained low-quality reads that failed to pass the chastity filter in Illumina next-generation sequencing (NGS). We confirmed the effectiveness of these extra low-quality reads by providing their error statistics and performing decoding with them. To exploit the extra reads efficiently, we proposed a sequence clustering algorithm for variable-length reads and a consensus algorithm based on probabilistic majority and error detection. The proposed methods reduced the reading cost by 6.83% on average and by up to 19.67% while maintaining the writing cost.
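The abstract does not spell out how the consensus step works, so the following is only a minimal sketch of one way a quality-weighted majority vote could be combined with an error-detection check. It is not the authors' implementation. It assumes the reads in a cluster are already aligned to equal length (the paper handles variable-length reads), that each base carries a Phred quality score, and that some validity check is available. The names `rank_bases`, `consensus_with_detection`, and `is_valid` are hypothetical.

```python
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def rank_bases(reads):
    """Rank candidate bases at each position by quality-weighted votes.

    `reads` is a list of (sequence, qualities) pairs. Assumption: all
    sequences in the cluster are already aligned and of equal length.
    """
    length = len(reads[0][0])
    ranked = []
    for i in range(length):
        votes = defaultdict(float)
        for seq, quals in reads:
            # A confident base call counts for almost 1, a noisy one for less.
            votes[seq[i]] += 1.0 - phred_to_error_prob(quals[i])
        # Descending weight; ties broken alphabetically for determinism.
        ranked.append(sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])))
    return ranked


def consensus_with_detection(reads, is_valid):
    """Return a consensus that passes `is_valid`, or None if none is found.

    `is_valid` is a hypothetical stand-in for the error-detection code
    check (for example a CRC or parity test on the decoded payload).
    """
    ranked = rank_bases(reads)
    best = [r[0][0] for r in ranked]
    candidate = "".join(best)
    if is_valid(candidate):
        return candidate
    # Positions that have a runner-up base, least confident first.
    doubtful = sorted(
        (i for i, r in enumerate(ranked) if len(r) > 1),
        key=lambda i: ranked[i][0][1] - ranked[i][1][1],
    )
    for i in doubtful:
        trial = best.copy()
        trial[i] = ranked[i][1][0]  # swap in the runner-up base
        trial_seq = "".join(trial)
        if is_valid(trial_seq):
            return trial_seq
    return None
```

In this sketch, one high-quality read can outvote two low-quality reads at a position, so low-quality reads contribute evidence without dominating it. When the first consensus fails the check, a single substitution at the least confident position is tried before giving up.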

AVAILABILITY AND IMPLEMENTATION

https://github.com/PParkJy/SAD-DNAstorage (10.5281/zenodo.15571858).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9adb/12187058/28ca9f293bce/btaf335f1.jpg
