Albuquerque Marco A, Grande Bruno M, Ritch Elie J, Pararajalingam Prasath, Jessa Selin, Krzywinski Martin, Grewal Jasleen K, Shah Sohrab P, Boutros Paul C, Morin Ryan D
Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
Canada's Michael Smith Genome Sciences Center, BC Cancer Agency, Vancouver, BC, Canada.
Gigascience. 2017 May 1;6(5):1-13. doi: 10.1093/gigascience/gix015.
The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to identify the genes and specific alterations that drive tumor onset and progression. Although large, comprehensive sequence data sets are increasingly available, data analysis remains a challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that cover many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelizing these tools within Galaxy to reduce runtime, demonstrated their usability, and summarized their runtimes on multiple cloud service providers. Some tools extend or refine existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutations. Workflows that combine these tools for data integration and visualization are demonstrated on a cohort of 96 diffuse large B-cell lymphomas, where they enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms, including Docker.