BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs.

Authors

Wang Rongjie, Li Junyi, Bai Yang, Zang Tianyi, Wang Yadong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.

School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China.

Publication information

PeerJ. 2018 Oct 19;6:e5611. doi: 10.7717/peerj.5611. eCollection 2018.

Abstract

Dramatic increases in data produced by next-generation sequencing (NGS) technologies demand data compression tools that save storage space. However, effective and efficient compression of genome sequencing data remains an unresolved challenge in NGS data studies. In this paper, we propose a novel alignment-free and reference-free compression method, BdBG, which is the first to compress genome sequencing data with dynamic de Bruijn graphs built on the data after bucketing. Compared with existing de Bruijn graph methods, BdBG stores only a list of bucket indexes and bifurcations for the raw read sequences, which effectively reduces storage space. Experimental results on several genome sequencing datasets show the effectiveness of BdBG over three state-of-the-art methods. BdBG is written in Python and is open-source software distributed under the MIT license, available for download at https://github.com/rongjiewang/BdBG.
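
The abstract does not spell out how reads are bucketed or how bifurcations are recorded, so the following is only a minimal sketch of the general idea, not BdBG's actual implementation. It assumes that reads are bucketed by their lexicographically smallest k-mer (a minimizer), that each bucket gets its own de Bruijn graph grown read by read, and that a base is stored only where the graph built so far cannot predict it uniquely. The value of K and all function names are hypothetical.

```python
from collections import defaultdict

K = 4  # k-mer length; a toy value chosen for illustration (assumption)


def minimizer(read, k=K):
    """Lexicographically smallest k-mer of a read (assumed bucket key)."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def bucket_reads(reads, k=K):
    """Group reads that share the same minimizer into one bucket."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, k)].append(read)
    return buckets


def add_read(graph, read, k=K):
    """Grow the graph: record the base that follows each k-mer of the read."""
    for i in range(len(read) - k):
        graph[read[i:i + k]].add(read[i + k])


def encode_bucket(reads, k=K):
    """Encode each read as (seed k-mer, length, explicitly stored bases).

    A base is stored only when the graph built from earlier reads does not
    offer exactly one successor that matches it.
    """
    graph = defaultdict(set)
    records = []
    for read in reads:
        branches = []
        for i in range(k, len(read)):
            successors = graph.get(read[i - k:i], ())
            if not (len(successors) == 1 and read[i] in successors):
                branches.append((i, read[i]))
        records.append((read[:k], len(read), branches))
        add_read(graph, read, k)  # the decoder applies the same update
    return records


def decode_bucket(records, k=K):
    """Rebuild the reads by replaying the same graph growth as the encoder."""
    graph = defaultdict(set)
    reads = []
    for seed, length, branches in records:
        explicit = dict(branches)
        read = seed
        for i in range(k, length):
            if i in explicit:
                read += explicit[i]
            else:
                # The encoder guarantees a unique successor at this position.
                (base,) = graph[read[i - k:i]]
                read += base
        reads.append(read)
        add_read(graph, read, k)
    return reads
```

In this toy scheme, a read that repeats a path already present in its bucket's graph is reduced to its seed k-mer and length, with no stored bases, which illustrates why keeping only bucket keys and bifurcations can save space on redundant data.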

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0ff/6197042/7aede5e8cf42/peerj-06-5611-g001.jpg
