A compression method for DNA.

Affiliation

College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China.

Publication information

PLoS One. 2020 Nov 25;15(11):e0238220. doi: 10.1371/journal.pone.0238220. eCollection 2020.

Abstract

The development of high-throughput sequencing technology has generated huge amounts of DNA data. Many general-purpose compression algorithms, such as LZ77, are not ideal for compressing DNA data. Building on Nour and Sharawi's method, we propose a new, lossless, reference-free method that improves compression performance. The original sequences are converted into eight intermediate files and six final files, and the LZ77 algorithm is then used to compress the six final files. The results show that, on average, compression time decreases by 83% and decompression time decreases by 54%. The compression rate is almost the same as that of Nour and Sharawi's method, which is the fastest method so far. Moreover, our method applies to a wider range of inputs than Nour and Sharawi's method. Compared with some of the most advanced compression tools currently available, such as XM and FCM-Mx, our method compresses much faster, reducing compression time by more than 90% on average.
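The abstract does not describe the layout of the intermediate or final files, so the following is only a minimal sketch of the general pattern it names: transform the sequence first, then hand the result to an LZ77-based compressor. It is not the authors' method. The 2-bit base code, the function names, and the use of Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) are all assumptions made for illustration, and the sketch handles only the four bases A, C, G, and T.

```python
import zlib

# Hypothetical 2-bit code for the four bases (an assumption; the paper's
# actual file layout is not given in the abstract).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack an ACGT-only string into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking reads bases in order.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, n_bases):
    """Invert pack_2bit; n_bases trims the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n_bases])


def compress(seq):
    """Transform, then compress with DEFLATE (LZ77 + Huffman) via zlib."""
    return len(seq), zlib.compress(pack_2bit(seq), 9)


def decompress(n_bases, blob):
    """Recover the original sequence losslessly."""
    return unpack_2bit(zlib.decompress(blob), n_bases)
```

The sequence length is stored next to the compressed bytes because the last packed byte may contain padding. A real tool would also need to handle other symbols (for example N), line breaks, and headers, none of which this sketch attempts.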

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7441/7688149/146d60a89fe1/pone.0238220.g001.jpg

Similar articles

1
A compression method for DNA.
PLoS One. 2020 Nov 25;15(11):e0238220. doi: 10.1371/journal.pone.0238220. eCollection 2020.
2
Modified HuffBit Compress Algorithm - An Application of R.
J Integr Bioinform. 2018 Feb 22;15(3):20170057. doi: 10.1515/jib-2017-0057.
3
Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis.
Bioinformatics. 2018 Feb 15;34(4):558-567. doi: 10.1093/bioinformatics/btx639.
4
FaStore: a space-saving solution for raw sequencing data.
Bioinformatics. 2018 Aug 15;34(16):2748-2756. doi: 10.1093/bioinformatics/bty205.
5
LFQC: a lossless compression algorithm for FASTQ files.
Bioinformatics. 2015 Oct 15;31(20):3276-81. doi: 10.1093/bioinformatics/btv384. Epub 2015 Jun 20.
6
smallWig: parallel compression of RNA-seq WIG files.
Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.
7
Advances in high throughput DNA sequence data compression.
J Bioinform Comput Biol. 2016 Jun;14(3):1630002. doi: 10.1142/S0219720016300021. Epub 2015 Dec 20.
8
Compression of next-generation sequencing quality scores using memetic algorithm.
BMC Bioinformatics. 2014;15 Suppl 15(Suppl 15):S10. doi: 10.1186/1471-2105-15-S15-S10. Epub 2014 Dec 3.
9
AFRESh: an adaptive framework for compression of reads and assembled sequences with random access functionality.
Bioinformatics. 2017 May 15;33(10):1464-1472. doi: 10.1093/bioinformatics/btx001.
10
ERGC: an efficient referential genome compression algorithm.
Bioinformatics. 2015 Nov 1;31(21):3468-75. doi: 10.1093/bioinformatics/btv399. Epub 2015 Jul 2.
