Beck Rowan F, Worman Zelia F, Kaushik Gaurav, Davis-Dusenbery Brandi N
Velsera, Charlestown, MA, USA.
ScienceIO, New York, NY, USA.
Methods Mol Biol. 2025;2932:47-73. doi: 10.1007/978-1-0716-4566-6_2.
The continued decrease in sequencing costs has led to an abundance of high-throughput data representing an increasing diversity of experimental conditions. These changes have been coupled with the adoption of cloud technologies and interoperability standards to share and analyze large primary and secondary data files. While 10 years ago analysis of hundreds or thousands of genomics samples was only practical at institutions with large local computational resources, these experiments can now be routinely performed by anyone with access to the Internet. In this tutorial, we use the Seven Bridges Cancer Genomics Cloud (CGC) to analyze RNA sequencing data from the NIH Cancer Research Data Commons (CRDC). This tutorial demonstrates how to bring a new computational algorithm to the platform, combine it with an existing workflow, and execute an analysis on the cloud. We highlight best practices for designing command line tools, Docker containers, and CWL descriptions to enable massively parallelized and reproducible biomedical computation with cloud resources. The CGC's support for diverse analysis techniques and its user-friendly interface simplify the complex process of handling large datasets while promoting collaboration across disciplines.