Signal Processing Lab, IEETA/DETI, University of Aveiro, 3810-193 Aveiro, Portugal.
Nucleic Acids Res. 2012 Feb;40(4):e27. doi: 10.1093/nar/gkr1124. Epub 2011 Dec 1.
Research in the genomic sciences is confronted with a volume of sequencing and resequencing data that is growing faster than data storage and communication resources, shifting a significant part of research budgets from the sequencing component of a project to the computational one. Hence, storing sequencing and resequencing data efficiently is a problem of paramount importance. In this article, we describe GReEn (Genome Resequencing Encoding), a tool for compressing genome resequencing data using a reference genome sequence. It overcomes some drawbacks of the recently proposed tool GRS: it can compress sequences that GRS cannot handle, it runs faster, and it achieves compression gains of over 100-fold for some sequences. This tool is freely available for non-commercial use at ftp://ftp.ieeta.pt/~ap/codecs/GReEn1.tar.gz.
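To make the idea of compressing against a reference genome concrete, the following minimal sketch stores only the positions at which a resequenced sequence differs from the reference. It is an illustration of the general principle only and is not the method implemented in GReEn, which the abstract does not detail. The function names, the dictionary layout, and the restriction to equal-length sequences (substitutions only, no insertions or deletions) are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def encode_against_reference(reference: str, target: str) -> Dict[str, object]:
    """Record only the positions where target differs from reference.

    Hypothetical helper for illustration: it assumes both sequences have
    the same length, so only substitutions can be represented.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    substitutions: List[Tuple[int, str]] = [
        (position, target_base)
        for position, (ref_base, target_base) in enumerate(zip(reference, target))
        if ref_base != target_base
    ]
    return {"length": len(target), "substitutions": substitutions}


def decode_with_reference(reference: str, encoded: Dict[str, object]) -> str:
    """Rebuild the target by applying the stored substitutions to the reference."""
    if len(reference) != encoded["length"]:
        raise ValueError("reference length does not match the encoded sequence")
    bases = list(reference)
    for position, base in encoded["substitutions"]:
        bases[position] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    target = "ACGTTCGTAA"
    encoded = encode_against_reference(reference, target)
    print(encoded)
    # Decoding with the same reference recovers the target exactly.
    print(decode_with_reference(reference, encoded) == target)
```

When two genomes of the same species are nearly identical, the list of differences is far shorter than the sequence itself, which is why a reference makes large compression gains possible. A practical tool must also cope with insertions, deletions, and sequences of different lengths, which this sketch deliberately leaves out.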