Gurtowski James, Schatz Michael C, Langmead Ben
Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
Department of Computer Science, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland.
Curr Protoc Bioinformatics. 2012 Sep;Chapter 15:15.3.1-15.3.15. doi: 10.1002/0471250953.bi1503s39.
Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework, which allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations, respectively. Within Crossbow, they have been shown to analyze approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit demonstrates the use of Crossbow for identifying variations in three operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
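The division of work described in the abstract (alignment as a distributable map step, SNP calling as a reduce step over genomic partitions) can be illustrated with a small in-memory sketch. This is a toy under stated assumptions, not Crossbow's implementation: `map_align` is a hypothetical one-mismatch aligner standing in for Bowtie, `reduce_call_snps` is a hypothetical majority-vote caller standing in for SOAPsnp, and `REFERENCE`, `BIN_SIZE`, and `min_depth` are invented for illustration. No Hadoop is involved; a dictionary plays the role of the shuffle.

```python
from collections import Counter, defaultdict

# Toy reference genome and partition width (both hypothetical).
REFERENCE = {"chr1": "ACGTACGTTAGCCGATAGGCTTAACG"}
BIN_SIZE = 8


def map_align(read, max_mismatches=1):
    """Map step: toy aligner standing in for Bowtie.

    Reports the first reference window within `max_mismatches` of the read
    and emits one ((chrom, bin), (position, base)) record per aligned base.
    Positions are 0-based. Reads with no hit emit nothing.
    """
    for chrom, seq in REFERENCE.items():
        for pos in range(len(seq) - len(read) + 1):
            window = seq[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                for offset, base in enumerate(read):
                    p = pos + offset
                    yield (chrom, p // BIN_SIZE), (p, base)
                return


def reduce_call_snps(chrom, records, min_depth=2):
    """Reduce step: toy majority-vote caller standing in for SOAPsnp."""
    pileup = defaultdict(list)
    for pos, base in records:
        pileup[pos].append(base)
    calls = []
    for pos in sorted(pileup):
        top_base, top_count = Counter(pileup[pos]).most_common(1)[0]
        ref_base = REFERENCE[chrom][pos]
        if top_base != ref_base and top_count >= min_depth:
            calls.append((chrom, pos, ref_base, top_base, len(pileup[pos])))
    return calls


def run_pipeline(reads):
    """Run map, shuffle (group by partition key), then reduce, in memory."""
    partitions = defaultdict(list)
    for read in reads:
        for key, value in map_align(read):
            partitions[key].append(value)
    snps = []
    for key in sorted(partitions):
        snps.extend(reduce_call_snps(key[0], partitions[key]))
    return snps


if __name__ == "__main__":
    # Three reads carry a C-to-T change at position 12; one read is unalignable.
    reads = ["TAGCTGAT", "GCTGATAG", "GTTAGCTG", "GGGGGGGG"]
    for call in run_pipeline(reads):
        print(call)
```

Because each read is aligned independently and each partition is called independently, both steps can in principle be spread across many workers, which is the property a MapReduce framework exploits.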