Computational Biology Department, Institut Pasteur, Université Paris Cité, F-75015 Paris, France.
Univ Rennes, Inria, CNRS, IRISA-UMR 6074, Rennes, France.
Bioinformatics. 2022 Sep 15;38(18):4423-4425. doi: 10.1093/bioinformatics/btac528.
Bioinformatics applications increasingly rely on ad hoc disk storage of k-mer sets, e.g. for de Bruijn graphs or alignment indexes. Here, we introduce the K-mer File Format as a general lossless framework for storing and manipulating k-mer sets, realizing space savings of 3-5× compared to other formats, and bringing interoperability across tools.
Format specification, C++/Rust API, tools: https://github.com/Kmer-File-Format/.
Supplementary data are available at Bioinformatics online.