Uniform genomic data analysis in the NCI Genomic Data Commons.

Affiliations

Center for Translational Data Science, University of Chicago, Chicago, IL, USA.

AbbVie Inc., Redwood City, CA, USA.

Publication information

Nat Commun. 2021 Feb 22;12(1):1226. doi: 10.1038/s41467-021-21254-9.

DOI:10.1038/s41467-021-21254-9

PMID:33619257

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7900240/

Abstract

The goal of the National Cancer Institute's (NCI's) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types ranging from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).
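The abstract notes that the harmonized data can be downloaded from the GDC Data Portal. The Python sketch below shows one way to reach the same files programmatically. It builds a query URL for the public GDC REST API `files` endpoint, filtering by program and data type. It assumes the endpoint `https://api.gdc.cancer.gov/files`, the nested `op`/`content` filter syntax, and the field names shown. The helper name `build_files_query` is hypothetical, and the paper itself does not describe this code.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the public GDC REST API "files" endpoint;
# verify against the current GDC API documentation.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(program, data_type, size=10):
    """Return a GET URL listing GDC files for one program and one data type.

    Hypothetical helper. The filter uses the GDC nested-operator format:
    an "and" over two "in" clauses on file fields.
    """
    filters = {
        "op": "and",
        "content": [
            # Restrict to files whose case belongs to the given program (e.g. TCGA, TARGET).
            {"op": "in", "content": {"field": "cases.project.program.name", "value": [program]}},
            # Restrict to one data type (e.g. "Gene Expression Quantification").
            {"op": "in", "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),  # the API expects the filter as a JSON string
        "fields": "file_id,file_name,data_type,cases.project.project_id",
        "format": "JSON",
        "size": str(size),  # maximum number of hits returned per page
    }
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_files_query("TCGA", "Gene Expression Quantification"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return JSON whose `data.hits` array lists the matching files. Check the field names and data type values against the current GDC API documentation before relying on them.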


Similar articles

Harmonizing and integrating the NCI Genomic Data Commons through accessible, interactive, and cloud-enabled workflows.

PLoS One. 2025 Mar 4;20(3):e0318676. doi: 10.1371/journal.pone.0318676. eCollection 2025.

NCI's Proteomic Data Commons: A Cloud-Based Proteomics Repository Empowering Comprehensive Cancer Analysis through Cross-Referencing with Genomic and Imaging Data.

Cancer Res Commun. 2024 Sep 1;4(9):2480-2488. doi: 10.1158/2767-9764.CRC-24-0243.

Developing Cancer Informatics Applications and Tools Using the NCI Genomic Data Commons API.

Cancer Res. 2017 Nov 1;77(21):e15-e18. doi: 10.1158/0008-5472.CAN-17-0598.

Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons' Data.

Cell Syst. 2019 Jul 24;9(1):24-34.e10. doi: 10.1016/j.cels.2019.06.006.

The NCI Genomic Data Commons as an engine for precision medicine.

Blood. 2017 Jul 27;130(4):453-459. doi: 10.1182/blood-2017-03-735654. Epub 2017 Jun 9.

An integrative investigation on significant mutations and their down-stream pathways in lung squamous cell carcinoma reveals CUL3/KEAP1/NRF2 relevant subtypes.

Mol Med. 2020 May 20;26(1):48. doi: 10.1186/s10020-020-00166-2.

SEQprocess: a modularized and customizable pipeline framework for NGS processing in R package.

BMC Bioinformatics. 2019 Feb 20;20(1):90. doi: 10.1186/s12859-019-2676-x.

TCGA Expedition: A Data Acquisition and Management System for TCGA Data.

PLoS One. 2016 Oct 27;11(10):e0165395. doi: 10.1371/journal.pone.0165395. eCollection 2016.

NCI Cancer Research Data Commons: Cloud-Based Analytic Resources.

Cancer Res. 2024 May 2;84(9):1396-1403. doi: 10.1158/0008-5472.CAN-23-2657.

Cited by

Multimodal integration strategies for clinical application in oncology.

Front Pharmacol. 2025 Aug 20;16:1609079. doi: 10.3389/fphar.2025.1609079. eCollection 2025.

STRaM: A genetic framework for improved cell product provenance for research and clinical translations.

Commun Biol. 2025 Aug 15;8(1):1232. doi: 10.1038/s42003-025-08547-1.

Building digital histology models of transcriptional tumor programs with generative deep learning for pathology-based precision medicine.

Genome Med. 2025 Aug 7;17(1):87. doi: 10.1186/s13073-025-01502-z.

Cancer genomics and bioinformatics in Latin American countries: applications, challenges, and perspectives.

Front Oncol. 2025 Jul 9;15:1584178. doi: 10.3389/fonc.2025.1584178. eCollection 2025.

Genotyping from targeted NGS data based on a small set of SNPs correctly matches patient samples.

BMC Res Notes. 2025 Jul 2;18(1):270. doi: 10.1186/s13104-025-07348-3.

Identification of shared neoantigens derived from frameshift mutations in the gene.

Front Immunol. 2025 May 15;16:1574955. doi: 10.3389/fimmu.2025.1574955. eCollection 2025.

Training, Validating, and Testing Machine Learning Prediction Models for Endometrial Cancer Recurrence.

JCO Precis Oncol. 2025 May;9:e2400859. doi: 10.1200/PO-24-00859. Epub 2025 May 5.

Emerging artificial intelligence-driven precision therapies in tumor drug resistance: recent advances, opportunities, and challenges.

Mol Cancer. 2025 Apr 23;24(1):123. doi: 10.1186/s12943-025-02321-x.

Harmonizing and integrating the NCI Genomic Data Commons through accessible, interactive, and cloud-enabled workflows.

PLoS One. 2025 Mar 4;20(3):e0318676. doi: 10.1371/journal.pone.0318676. eCollection 2025.

Robust Cluster Prediction Across Data Types Validates Association of Sex and Therapy Response in GBM.

Cancers (Basel). 2025 Jan 28;17(3):445. doi: 10.3390/cancers17030445.

References

Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons' Data.

Cell Syst. 2019 Jul 24;9(1):24-34.e10. doi: 10.1016/j.cels.2019.06.006.

Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis.

Genomics. 2017 Mar;109(2):83-90. doi: 10.1016/j.ygeno.2017.01.005. Epub 2017 Jan 26.

Integrated genomic characterization of oesophageal carcinoma.

Nature. 2017 Jan 12;541(7636):169-175. doi: 10.1038/nature20805. Epub 2017 Jan 4.

Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes.

Nucleic Acids Res. 2017 Feb 28;45(4):e22. doi: 10.1093/nar/gkw967.

In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data.

Sci Rep. 2016 Nov 22;6:36540. doi: 10.1038/srep36540.

Toward a Shared Vision for Cancer Genomic Data.

N Engl J Med. 2016 Sep 22;375(12):1109-12. doi: 10.1056/NEJMp1607591.

MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data.

Genome Biol. 2016 Aug 24;17(1):178. doi: 10.1186/s13059-016-1029-6.

The Ensembl Variant Effect Predictor.

Genome Biol. 2016 Jun 6;17(1):122. doi: 10.1186/s13059-016-0974-4.

Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics.

Sci Rep. 2016 Apr 25;6:24949. doi: 10.1038/srep24949.

A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing.

Nat Commun. 2015 Dec 9;6:10001. doi: 10.1038/ncomms10001.
