DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.

Affiliation

Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria.

Publication information

PLoS One. 2013 Aug 23;8(8):e72614. doi: 10.1371/journal.pone.0072614. eCollection 2013.

Abstract

With the rapid and steady increase in next-generation sequencing output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. To ameliorate this bottleneck, we present a new tool, DistMap: a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short-read mapping tools, and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in SAM/BAM format. DistMap supports both paired-end and single-end reads, thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
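As an illustration of one practical issue behind distributing FASTQ input, here is a minimal Python sketch. It rests on an assumption that the abstract does not state: that each paired-end record is flattened onto a single tab-delimited line before upload, so that a line-oriented splitter cannot separate a read from its mate or cut a four-line record in half. The function names `fastq_records`, `pair_to_lines` and `line_to_fastq` are hypothetical and are not taken from DistMap.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read name, bases, qualities)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if (len(chunk) != 4
                or not chunk[0].startswith("@")
                or not chunk[2].startswith("+")):
            raise ValueError("malformed FASTQ record")
        name, seq, _, qual = (s.rstrip("\n") for s in chunk)
        yield name[1:], seq, qual


def pair_to_lines(r1: Iterable[str], r2: Iterable[str]) -> Iterator[str]:
    """Flatten each read pair onto one tab-delimited line.

    Hypothetical layout (an assumption, not DistMap's documented format):
    name1, seq1, qual1, name2, seq2, qual2.
    Note that zip() stops at the shorter input, so mate files of unequal
    length are silently truncated in this sketch.
    """
    for (n1, s1, q1), (n2, s2, q2) in zip(fastq_records(r1), fastq_records(r2)):
        yield "\t".join((n1, s1, q1, n2, s2, q2))


def line_to_fastq(line: str) -> Tuple[str, str]:
    """Rebuild the two FASTQ records from one flattened line."""
    n1, s1, q1, n2, s2, q2 = line.rstrip("\n").split("\t")
    return (f"@{n1}\n{s1}\n+\n{q1}\n", f"@{n2}\n{s2}\n+\n{q2}\n")


if __name__ == "__main__":
    mate1 = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]
    mate2 = ["@read1/2\n", "TTGA\n", "+\n", "JJJJ\n"]
    for flat in pair_to_lines(mate1, mate2):
        print(flat)
```

On a worker node, each flattened line could be turned back into ordinary FASTQ with `line_to_fastq` before being handed to whichever mapper is configured. How DistMap itself performs this step is not described in the abstract.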
