Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

Affiliation

Systems Pharmacology and Biomarkers, Janssen Research & Development, LLC, 3210 Merryfield Row, San Diego, CA 92121, USA.

Publication information

BMC Genomics. 2013 Jun 27;14:425. doi: 10.1186/1471-2164-14-425.

Abstract

BACKGROUND

Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, small to midsize research groups now produce large amounts of sequence data. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources required for large-scale WGS data analyses exceed what many core facilities and individual laboratories can provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses.

RESULTS

Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. This time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log running metrics during data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies.
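Improvement (2), splitting large sequence files so that downstream work is spread evenly, can be illustrated with a minimal Python sketch. This is not Rainbow's actual implementation: the function name `split_fastq`, the `reads_per_chunk` parameter, and the in-memory demo reads are hypothetical, and the sketch assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines).

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk records.

    Assumes every record occupies exactly four lines
    (header, sequence, '+' separator, quality string).
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    it = iter(lines)
    while True:
        # Take the next block of whole records without loading the full file.
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


# Hypothetical three-read input standing in for a large FASTQ file.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTGA\n+\nIIII\n"
    "@r3\nGGCC\n+\nIIII\n"
)
chunks = list(split_fastq(demo, reads_per_chunk=2))
chunk_sizes = [len(c) // 4 for c in chunks]
print(chunk_sizes)  # [2, 1]
```

Each chunk could then be written to its own file and handed to a separate worker, so that no single alignment task receives a disproportionately large input.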

CONCLUSIONS

Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced on either the Illumina HiSeq 2000 or HiSeq 2500 platform, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c1cd/3698007/a85f932c6a4b/1471-2164-14-425-1.jpg
