MetaSpark: a Spark-based distributed processing tool to recruit metagenomic reads to reference genomes.

Affiliations

School of Software, Yunnan University, Kunming, China.

Computer Network Information Center of Chinese Academy of Sciences, Beijing, China.

Publication information

Bioinformatics. 2017 Apr 1;33(7):1090-1092. doi: 10.1093/bioinformatics/btw750.

Abstract

SUMMARY

With the advent of next-generation sequencing, traditional bioinformatics tools are challenged by massive raw metagenomic datasets. One bottleneck in metagenomic studies is the lack of data analysis tools suited to large-scale and cloud computing. In this paper, we propose a Spark-based tool, called MetaSpark, to recruit metagenomic reads to reference genomes. MetaSpark benefits from Spark's resilient distributed dataset (RDD), which allows it to cache datasets in memory across cluster nodes and to scale well with dataset size. Compared with previous metagenomic recruitment tools, MetaSpark recruited significantly more reads than programs such as SOAP2, BWA and LAST, and ∼4% more reads than FR-HIT on a test case with 1 million reads and 0.75 GB of references. Different test cases demonstrate MetaSpark's scalability and overall high performance.
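To make "recruiting reads to reference genomes" concrete, the following is a minimal toy sketch in plain Python. It is not MetaSpark's algorithm or code. It assumes a simple k-mer seed lookup followed by an ungapped identity check, and it uses ordinary lists to stand in for the partitions that Spark would distribute across cluster nodes. All function names, sequences, and thresholds (`k=4`, `min_identity=0.8`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_kmer_index(references, k):
    """Map each k-mer to the (reference name, offset) positions where it occurs."""
    index = defaultdict(list)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def recruit(read, references, index, k, min_identity):
    """Return the best (reference, start, identity) hit for a read, or None."""
    best = None
    tried = set()  # candidate placements already scored
    for j in range(len(read) - k + 1):
        for name, i in index.get(read[j:j + k], ()):
            start = i - j  # where the read would begin on the reference
            if start < 0 or (name, start) in tried:
                continue
            tried.add((name, start))
            window = references[name][start:start + len(read)]
            if len(window) < len(read):
                continue  # read would run off the end of the reference
            matches = sum(a == b for a, b in zip(read, window))
            score = matches / len(read)  # ungapped identity (toy assumption)
            if score >= min_identity and (best is None or score > best[2]):
                best = (name, start, score)
    return best


def recruit_partition(partition, references, index, k=4, min_identity=0.8):
    """Recruit every read in one partition; unrecruited reads are dropped."""
    hits = {}
    for read_id, seq in partition:
        hit = recruit(seq, references, index, k, min_identity)
        if hit is not None:
            hits[read_id] = hit
    return hits


# Hypothetical toy data, not taken from the paper.
references = {
    "refA": "ACGTACGTTAGCCGATAGGCTTAACG",
    "refB": "TTGACCGGTATCGATCGGAATTCCAG",
}
reads = [
    ("r1", "TAGCCGATAG"),  # exact substring of refA
    ("r2", "TAGCCGTTAG"),  # one mismatch against refA
    ("r3", "GGGGGGGGGG"),  # shares no seed with either reference
]

index = build_kmer_index(references, k=4)

# Two lists stand in for partitions that a cluster would process in parallel.
partitions = [reads[:2], reads[2:]]
recruited = {}
for part in partitions:
    recruited.update(recruit_partition(part, references, index))

print(recruited)
```

In this sketch each partition is processed independently against the same index, which is the property that lets this kind of workload be spread over cluster nodes; in a Spark setting the reads would typically be held in a cached RDD rather than in Python lists.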

AVAILABILITY

https://github.com/zhouweiyg/metaspark.

CONTACT

bniu@sccas.cn, jingluo@ynu.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
