College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China.
PLoS One. 2020 Nov 25;15(11):e0238220. doi: 10.1371/journal.pone.0238220. eCollection 2020.
The development of high-throughput sequencing technology has generated huge amounts of DNA data. Many general-purpose compression algorithms, such as the LZ77 algorithm, are not ideal for compressing DNA data. Building on Nour and Sharawi's method, we propose a new, lossless and reference-free method to improve compression performance. The original sequences are converted into eight intermediate files and six final files. Then, the LZ77 algorithm is used to compress the six final files. The results show that, on average, the compression time is decreased by 83% and the decompression time is decreased by 54%. The compression rate is almost the same as that of Nour and Sharawi's method, which is the fastest method so far. Moreover, our method has a wider range of application than Nour and Sharawi's method. Compared with some very advanced compression tools at present, such as XM and FCM-Mx, the compression time of our method is much shorter, lower by more than 90% on average.
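The general idea of splitting a sequence into separate streams before applying an LZ77-style compressor can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the eight intermediate or six final files, so the four streams used here (`length`, `bases`, `exc_pos`, `exc_sym`) and the function names are hypothetical. The sketch also uses `zlib`, which implements DEFLATE (LZ77 combined with Huffman coding) rather than plain LZ77.

```python
import zlib

CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def split_streams(seq: str) -> dict:
    """Split a sequence into a 2-bit packed base stream plus exception streams.

    Hypothetical layout for illustration only; not the file layout of the paper.
    """
    packed = bytearray()
    exc_pos = []  # positions of symbols other than upper-case A, C, G, T
    exc_sym = []  # the symbols found at those positions
    acc, count = 0, 0
    for pos, ch in enumerate(seq):
        code = CODES.get(ch)
        if code is None:
            exc_pos.append(str(pos))
            exc_sym.append(ch)
            code = 0  # placeholder; restored from the exception streams
        acc = (acc << 2) | code
        count += 1
        if count == 4:  # four bases fill one byte
            packed.append(acc)
            acc, count = 0, 0
    if count:  # left-align the final partial byte
        packed.append(acc << (2 * (4 - count)))
    return {
        "length": str(len(seq)).encode(),
        "bases": bytes(packed),
        "exc_pos": ",".join(exc_pos).encode(),
        "exc_sym": "".join(exc_sym).encode(),
    }


def compress(seq: str) -> dict:
    """Compress each stream independently with zlib (DEFLATE)."""
    return {name: zlib.compress(data, 9) for name, data in split_streams(seq).items()}


def decompress(blobs: dict) -> str:
    """Invert compress(): unpack the bases, then restore the exceptions."""
    streams = {name: zlib.decompress(blob) for name, blob in blobs.items()}
    length = int(streams["length"])
    out = []
    for byte in streams["bases"]:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 3])
    out = out[:length]  # drop padding from the last byte
    if streams["exc_pos"]:
        positions = streams["exc_pos"].decode().split(",")
        symbols = streams["exc_sym"].decode()
        for pos, ch in zip(positions, symbols):
            out[int(pos)] = ch
    return "".join(out)
```

A round trip such as `decompress(compress("ACGTNNACGT"))` returns the input unchanged, which is the lossless property the abstract claims. Keeping the regular base stream separate from rare symbols is one plausible reason a stream split can help a general-purpose compressor, but the sketch makes no claim about the timings or compression rates reported above.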