Sardaraz Muhammad, Tahir Muhammad, Ikram Ataul Aziz
1 Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt 47040, Pakistan.
2 Department of Electrical Engineering, National University, Islamabad 44000, Pakistan.
J Bioinform Comput Biol. 2016 Jun;14(3):1630002. doi: 10.1142/S0219720016300021. Epub 2015 Dec 20.
Advances in high-throughput sequencing technologies and falling sequencing costs have led to exponential growth in high-throughput DNA sequence data. This growth poses challenges for the storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges, and various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of methods for genome and read compression. Algorithms are categorized as referential or reference-free. Experimental results and a comparative analysis of the various compression methods are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.