School of Artificial Intelligence, Xidian University, Xi'an, 710071, China.
The Pengcheng Lab, Shenzhen, 518055, China.
BMC Bioinformatics. 2022 Jun 7;23(1):219. doi: 10.1186/s12859-022-04712-z.
With the rapid development of high-throughput sequencing technology, the cost of whole-genome sequencing has dropped rapidly, leading to exponential growth in genome data. Efficient compression of the DNA data generated by large-scale genome projects has become an important factor limiting the further development of the DNA sequencing industry. Although the compression of DNA bases has improved significantly in recent years, the compression of quality scores remains challenging.
In this paper, by reinvestigating the inherent correlations between quality scores and the sequencing process, we propose a novel lossless quality score compressor based on adaptive coding order (ACO). The main objective of ACO is to traverse the quality scores adaptively along the most correlated trajectory according to the sequencing process. Combined with adaptive arithmetic coding and an improved in-context strategy, ACO achieves state-of-the-art quality score compression performance with moderate complexity for next-generation sequencing (NGS) data.
This performance makes ACO a candidate tool for quality score compression. ACO has been adopted by AVS (Audio Video coding Standard Workgroup of China) and is freely available at https://github.com/Yoniming/ACO.
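To make the role of context modeling concrete, the following minimal Python sketch estimates the ideal cost, in bits, of coding a quality string with an adaptive order-k context model of the kind that typically drives an adaptive arithmetic coder. It is an illustration only and does not reproduce ACO's adaptive traversal order or its improved in-context strategy. The function name `adaptive_cost_bits`, the parameters `order` and `alphabet_size`, the Laplace smoothing, and the example string are all assumptions introduced here, not taken from the paper or its repository.

```python
import math
from collections import defaultdict


def adaptive_cost_bits(quals, order=2, alphabet_size=94):
    """Ideal adaptive coding cost of `quals` in bits (hypothetical helper).

    Each symbol is predicted from the `order` preceding symbols. Counts are
    updated after every symbol, so a decoder could mirror the same model.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, sym in enumerate(quals):
        ctx = quals[max(0, i - order):i]
        # Laplace-smoothed probability (an assumption of this sketch).
        p = (counts[ctx][sym] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)
        # Adapt the model only after the symbol has been "coded".
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


# Made-up quality string in printable-ASCII (Phred+33 style) encoding.
example = "IIIIIIIIHHHHIIIIIIII" * 5
print(f"{adaptive_cost_bits(example) / len(example):.2f} bits per score")
```

The better the context predicts the next score, the fewer bits each symbol costs, which is why the order in which scores are visited and the choice of context both affect the compressed size.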