Building Portable and Reproducible Cancer Informatics Workflows for Scalable Data Analysis: An RNA Sequencing Tutorial.

Author information

Beck Rowan F, Worman Zelia F, Kaushik Gaurav, Davis-Dusenbery Brandi N

Affiliations

Velsera, Charlestown, MA, USA.

ScienceIO, New York, NY, USA.

Publication information

Methods Mol Biol. 2025;2932:47-73. doi: 10.1007/978-1-0716-4566-6_2.

DOI: 10.1007/978-1-0716-4566-6_2
PMID: 40779103
Abstract

The continued decrease in sequencing costs has led to an abundance of high-throughput data representing an increasing diversity of experimental conditions. These changes have been coupled with the adoption of cloud technologies and interoperability standards to share and analyze large primary and secondary data files. While 10 years ago analysis of hundreds or thousands of genomics samples was only practical at institutions with large local computational resources, these experiments can now be routinely performed by anyone with access to the Internet. In this tutorial, we use the Seven Bridges Cancer Genomics Cloud (CGC) to analyze RNA sequencing data from the NIH Cancer Research Data Commons (CRDC). This tutorial demonstrates how to bring a new computational algorithm to the platform, combine it with an existing workflow, and execute an analysis on the cloud. We highlight best practices for designing command line tools, Docker containers, and CWL descriptions to enable massively parallelized and reproducible biomedical computation with cloud resources. The CGC's support for diverse analysis techniques and its user-friendly interface simplify the complex process of handling large datasets while promoting collaboration across disciplines.
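The abstract identifies Docker containers and CWL descriptions as what make a command line tool portable. The Python sketch below illustrates how the pieces fit together by building a minimal CWL v1.2 `CommandLineTool` description as a dictionary. The script path, container image, and parameter names (`/opt/normalize_counts.py`, `example.org/normalize-counts:1.0.0`, `--counts`, `--output`) are hypothetical placeholders. They are not taken from the tutorial or from CGC conventions.

```python
import json


def build_tool_description(image: str, script: str) -> dict:
    """Return a minimal CWL v1.2 CommandLineTool description as a dict.

    The script, image, and parameter names used with this function are
    hypothetical placeholders, not the tutorial's actual tool.
    """
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "requirements": [
            # Run the tool inside the given container image so every
            # execution sees the same software environment.
            {"class": "DockerRequirement", "dockerPull": image},
            # Declare minimum resources (ramMin is in mebibytes).
            {"class": "ResourceRequirement", "coresMin": 1, "ramMin": 2048},
        ],
        "baseCommand": ["python3", script],
        "inputs": {
            # The input file is passed explicitly on the command line
            # rather than read from a hard-coded path.
            "counts": {
                "type": "File",
                "inputBinding": {"prefix": "--counts"},
            },
            # Output file name, overridable per run.
            "output_name": {
                "type": "string",
                "default": "normalized.tsv",
                "inputBinding": {"prefix": "--output"},
            },
        },
        "outputs": {
            # Collect the file the tool wrote under the requested name,
            # using a CWL parameter reference (no JavaScript needed).
            "normalized": {
                "type": "File",
                "outputBinding": {"glob": "$(inputs.output_name)"},
            }
        },
    }


if __name__ == "__main__":
    tool = build_tool_description(
        image="example.org/normalize-counts:1.0.0",
        script="/opt/normalize_counts.py",
    )
    # JSON is valid YAML 1.2, so this output can be saved as a .cwl file.
    print(json.dumps(tool, indent=2))
```

Pinning the image to an explicit version tag rather than `latest` helps keep reruns reproducible. To fan the same tool out across many samples, a CWL `Workflow` can wrap it in a step declared with `scatter: counts` over an array of files. This also requires `ScatterFeatureRequirement`. The executor can then run one job per input file in parallel.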

Similar articles

1. Building Portable and Reproducible Cancer Informatics Workflows for Scalable Data Analysis: An RNA Sequencing Tutorial.
   Methods Mol Biol. 2025;2932:47-73. doi: 10.1007/978-1-0716-4566-6_2.
2. Building Portable and Reproducible Cancer Informatics Workflows: An RNA Sequencing Case Study.
   Methods Mol Biol. 2019;1878:39-64. doi: 10.1007/978-1-4939-8868-6_2.
3. Cloud-based introduction to BASH programming for biologists.
   Brief Bioinform. 2024 Jul 23;25(Supplement_1). doi: 10.1093/bib/bbae244.
4. GRAPEVNE - Graphical Analytical Pipeline Development Environment for Infectious Diseases.
   Wellcome Open Res. 2025 May 27;10:279. doi: 10.12688/wellcomeopenres.23824.1. eCollection 2025.
5. VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.
   Front Immunol. 2018 May 8;9:976. doi: 10.3389/fimmu.2018.00976. eCollection 2018.
6. Playbook workflow builder: Interactive construction of bioinformatics workflows.
   PLoS Comput Biol. 2025 Apr 3;21(4):e1012901. doi: 10.1371/journal.pcbi.1012901. eCollection 2025 Apr.
7. Can a Liquid Biopsy Detect Circulating Tumor DNA With Low-passage Whole-genome Sequencing in Patients With a Sarcoma? A Pilot Evaluation.
   Clin Orthop Relat Res. 2025 Jan 1;483(1):39-48. doi: 10.1097/CORR.0000000000003161. Epub 2024 Jun 21.
8. Cloud-based serverless computing enables accelerated monte carlo simulations for nuclear medicine imaging.
   Biomed Phys Eng Express. 2024 Jun 25;10(4). doi: 10.1088/2057-1976/ad5847.
9. SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.
   J Bioinform Comput Biol. 2024 Oct;22(5):2450022. doi: 10.1142/S0219720024500227. Epub 2024 Oct 1.
10. clonevdjseq: A workflow and bioinformatics management system for sequencing, archiving, and analysis of VDJ sequences from clonal libraries.
   BMC Bioinformatics. 2025 Jul 21;26(1):186. doi: 10.1186/s12859-025-06107-2.
