Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.
Graduate School of Public Health and Health Policy, City University of New York, New York, NY.
JCO Clin Cancer Inform. 2020 May;4:472-479. doi: 10.1200/CCI.19.00111.
Institutional efforts toward the democratization of cloud-scale data and analysis methods for cancer genomics are proceeding rapidly. As part of this effort, we bridge two major bioinformatic initiatives: the Global Alliance for Genomics and Health (GA4GH) and Bioconductor.
We describe in detail a use case in pancancer transcriptomics conducted by blending implementations of the GA4GH Workflow Execution Services and Tool Registry Service concepts with the Bioconductor curatedTCGAData and BiocOncoTK packages.
We carried out the analysis with a formally archived workflow and container at dockstore.org and a workspace and notebook at app.terra.bio. The analysis identified relationships between microsatellite instability and biomarkers of immune dysregulation at a finer level of granularity than previously reported. Our use of standard approaches to containerization and workflow programming allows this analysis to be replicated and extended.
Experimental use of dockstore.org and app.terra.bio in concert with Bioconductor enabled novel statistical analysis of large genomic projects without the need for local supercomputing resources but involved challenges related to container design, script archiving, and unit testing. Best practices and cost/benefit metrics for the management and analysis of globally federated genomic data and annotation are evolving. The creation and execution of use cases like the one reported here will be helpful in the development and comparison of approaches to federated data/analysis systems in cancer genomics.