Informatics and Digital Solutions, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.
Bioinformatics. 2022 Mar 4;38(6):1497-1503. doi: 10.1093/bioinformatics/btac010.
CRAM has established itself as a high-compression alternative to the BAM file format for DNA sequencing data. We describe updates to the format that further improve its compression of data from modern sequencing instruments.
With Illumina data, CRAM 3.1 is 7-15% smaller than the equivalent CRAM 3.0 file and 50-70% smaller than the corresponding BAM file. Data from long-read technologies show more modest compression because of the presence of high-entropy signals.
The CRAM 3.0 specification is freely available at https://samtools.github.io/hts-specs/CRAMv3.pdf. The CRAM 3.1 improvements are available in a separate open-source library, HTScodecs (https://github.com/samtools/htscodecs), and have been incorporated into HTSlib.
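Because the CRAM 3.1 codecs are incorporated into HTSlib, tools built on it can write the newer format. The sketch below is illustrative and not taken from the paper: it assumes a samtools build linked against an HTSlib version with CRAM 3.1 support, and the helper names and file paths are hypothetical. It builds a `samtools view` command that requests CRAM 3.1 output through the `version=3.1` format option.

```python
import shutil
import subprocess


def cram31_command(bam_path, reference_fasta, cram_path):
    """Build a samtools command line that rewrites a BAM file as CRAM 3.1.

    Hypothetical helper; the paths are supplied by the caller.
    """
    return [
        "samtools", "view",
        "-T", reference_fasta,      # reference FASTA used for CRAM encoding
        "-O", "cram,version=3.1",   # output format CRAM, format version 3.1
        "-o", cram_path,
        bam_path,
    ]


def convert_to_cram31(bam_path, reference_fasta, cram_path):
    """Run the conversion; assumes samtools is installed and on PATH."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(cram31_command(bam_path, reference_fasta, cram_path), check=True)
```

Calling `convert_to_cram31("sample.bam", "ref.fa", "sample.cram")` with real files would then produce a CRAM 3.1 file whose size can be compared with the input BAM; the size reductions quoted above are the paper's figures and will vary with the data.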
Supplementary data are available at Bioinformatics online.