Department of Genome Sciences, University of Washington, PO Box 355065, Seattle, WA 98195-5065, USA.
Bioinformatics. 2010 Jun 1;26(11):1458-9. doi: 10.1093/bioinformatics/btq164. Epub 2010 Apr 29.
We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files.
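The speed gap between a random-access format and wiggle files comes from how a single value is located. The sketch below is a minimal illustration under stated assumptions, not the actual Genomedata on-disk format or API. It assumes a hypothetical layout of one binary file per chromosome holding float32 values in row-major order, with one row per base and one column per track. Under that layout, a value's byte offset follows directly from its position and track index. A wiggle file, by contrast, must be parsed line by line until the requested position is reached. The names `write_matrix`, `read_value`, and `read_wiggle_naive` are hypothetical, and the wiggle reader handles only `fixedStep` records with `step=1`.

```python
import os
import struct
import tempfile

# Hypothetical layout (NOT the real Genomedata format): float32 values in
# row-major order, one row per base position, one column per track.
ITEM = struct.calcsize("<f")  # 4 bytes per little-endian float32


def write_matrix(path, rows):
    """Write per-position rows (each a list of track values) as float32."""
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(struct.pack(f"<{len(row)}f", *row))


def read_value(path, n_tracks, position, track):
    """Random access: compute the byte offset and seek straight to it."""
    offset = (position * n_tracks + track) * ITEM
    with open(path, "rb") as fh:
        fh.seek(offset)
        (value,) = struct.unpack("<f", fh.read(ITEM))
    return value


def read_wiggle_naive(lines, position):
    """Naive baseline: scan fixedStep wiggle lines (step=1 assumed) in order.

    `position` is 0-based; wiggle `start` coordinates are 1-based.
    Returns None if the position is never reached.
    """
    pos = None
    for line in lines:
        if line.startswith("track"):
            continue
        if line.startswith("fixedStep"):
            fields = dict(kv.split("=", 1) for kv in line.split()[1:])
            pos = int(fields["start"]) - 1
            continue
        if pos is None:
            continue  # data line before any fixedStep header; skip it
        if pos == position:
            return float(line)
        pos += 1
    return None


if __name__ == "__main__":
    # Two tracks over 1000 positions: track 0 = p, track 1 = 10 * p.
    rows = [[float(p), float(p) * 10] for p in range(1000)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "chr1.bin")
        write_matrix(path, rows)
        print(read_value(path, 2, 500, 1))  # 5000.0, one seek and one read

    wig = ["track type=wiggle_0", "fixedStep chrom=chr1 start=1 step=1"]
    wig += [str(float(p)) for p in range(1000)]
    print(read_wiggle_naive(wig, 500))  # 500.0, after scanning 500 data lines
```

The design choice illustrated here is that fixed-width binary storage makes lookup cost independent of the requested position. The wiggle scan grows linearly with it. A production format would also need compression, multiple chromosomes, and handling of unassayed regions, none of which this sketch attempts.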
A reference implementation, with Python and C components, is available at http://noble.gs.washington.edu/proj/genomedata/ under the GNU General Public License.