Sequence analysis and decoding with extra low-quality reads for DNA data storage.

Author information

Park Jiyeon, Jeon Ha Hyeon, Lee Jeong Wook, Park Hosung

Affiliations

Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, South Korea.

Department of Chemical Engineering, POSTECH, Pohang 37673, South Korea.

Publication information

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf335.

Abstract

MOTIVATION

Error detection/correction codes play an important role in reducing writing and/or reading costs in DNA data storage. Sequence analysis algorithms also have a crucial effect on error correction, but they have been executed independently of the decoding of error correction codes. In conventional sequence analysis, low-quality reads are usually discarded. For DNA data storage, however, low-quality reads can be used constructively in sequence analysis with the assistance of error detection/correction codes.

RESULTS

We obtained low-quality reads that failed to pass the chastity filter in Illumina NGS sequencing. We confirmed the effectiveness of these extra low-quality reads by providing their error statistics and performing decoding with them. We proposed a sequence clustering algorithm for variable-length reads and a consensus algorithm based on probabilistic majority and error detection to exploit the extra reads efficiently. The proposed methods reduced the reading cost by 6.83% on average and by up to 19.67% while maintaining the writing cost.
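The two ideas named above, clustering reads of varying lengths and a consensus that combines probabilistic majority with error detection, can be illustrated with a minimal Python sketch. This is not the authors' SAD-DNAstorage implementation, and every design choice is an assumption made for illustration: each strand carries a CRC-32 (Python's `zlib.crc32`) over its payload as the error-detection code, bytes map to bases at 2 bits per base, reads are clustered greedily by Levenshtein distance to a cluster representative, and "probabilistic majority" is approximated by weighting each base call with its Phred-derived probability of being correct, 1 - 10^(-Q/10). All helper names (`greedy_cluster`, `weighted_consensus`, `decode_cluster`, and so on) are hypothetical.

```python
import zlib
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration; not the SAD-DNAstorage API.
BASES = "ACGT"  # assumed 2-bit mapping: A=00, C=01, G=10, T=11
Read = Tuple[str, List[int]]  # (base sequence, per-base Phred scores)


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(BASES[(b >> s) & 3] for b in data for s in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; len(seq) must be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        v = 0
        for ch in seq[i:i + 4]:
            v = (v << 2) | BASES.index(ch)
        out.append(v)
    return bytes(out)


def encode_strand(payload: bytes) -> str:
    """Append a CRC-32 (assumed error-detection code) and convert to bases."""
    return bytes_to_dna(payload + zlib.crc32(payload).to_bytes(4, "big"))


def crc_ok(seq: str) -> bool:
    """Error detection: True if the trailing CRC-32 matches the payload."""
    if len(seq) % 4 or len(seq) < 16:
        return False
    data = dna_to_bytes(seq)
    return zlib.crc32(data[:-4]).to_bytes(4, "big") == data[-4:]


def levenshtein(a: str, b: str) -> int:
    """Edit distance, so reads of different lengths can be compared."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def greedy_cluster(reads: List[Read], max_dist: int) -> List[List[int]]:
    """Put each read in the first cluster whose representative (first member)
    is within max_dist edits; otherwise start a new cluster."""
    clusters: List[List[int]] = []
    for idx, (seq, _) in enumerate(reads):
        for members in clusters:
            if levenshtein(seq, reads[members[0]][0]) <= max_dist:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def phred_to_prob(q: int) -> float:
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10 ** (-q / 10)


def weighted_consensus(reads: List[Read], length: int) -> Optional[str]:
    """Per-position vote weighted by P(correct); only reads of the designed
    length vote (a simplification of handling variable-length reads)."""
    voters = [(s, q) for s, q in reads if len(s) == length]
    if not voters:
        return None
    out = []
    for pos in range(length):
        score = dict.fromkeys(BASES, 0.0)
        for seq, quals in voters:
            score[seq[pos]] += phred_to_prob(quals[pos])
        out.append(max(BASES, key=score.__getitem__))
    return "".join(out)


def decode_cluster(reads: List[Read], length: int) -> Optional[bytes]:
    """Return the payload if the consensus (or, failing that, any single read)
    passes the CRC; None means the cluster is left as an erasure."""
    candidates = [weighted_consensus(reads, length)]
    candidates += [s for s, _ in reads if len(s) == length]
    for cand in candidates:
        if cand is not None and crc_ok(cand):
            return dna_to_bytes(cand)[:-4]
    return None
```

In this sketch a low-quality read is never discarded up front: it contributes a down-weighted vote, and the CRC decides whether the resulting consensus (or, as a fallback, any single read) can be trusted. A cluster that yields nothing is returned as `None`, i.e. treated as an erasure for an outer code. Voting only among reads of the designed length is a simplification; reads with insertions or deletions still join clusters here but do not vote.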

AVAILABILITY AND IMPLEMENTATION

https://github.com/PParkJy/SAD-DNAstorage (10.5281/zenodo.15571858).
