Geng Zhuangzhuang, Wafula Eric, Corbett Ryan J, Zhang Yuanchao, Jin Run, Gaonkar Krutika S, Shukla Sangeeta, Rathi Komal S, Hill Dave, Lahiri Aditya, Miller Daniel P, Sickler Alex, Keith Kelsey, Blackden Christopher, Chroni Antonia, Brown Miguel A, Kraya Adam A, Clark Kaylyn L, Rood Brian R, Resnick Adam C, Van Kuren Nicholas, Maris John M, Farrel Alvin, Koptyra Mateusz P, Trooskin Gerri R, Coleman Noel, Zhu Yuankun, Stefankiewicz Stephanie, Abdullaev Zied, Chinwalla Asif T, Santi Mariarita, Naqvi Ammar S, Mason Jennifer L, Koschmann Carl J, Huang Xiaoyan, Diskin Sharon J, Aldape Kenneth, Farrow Bailey K, Ma Weiping, Zhang Bo, Ennis Brian M, Tasian Sarah, Phul Saksham, Lueder Matthew R, Zhong Chuwei, Dybas Joseph M, Wang Pei, Taylor Deanne, Rokita Jo Lynne
Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf093.
In 2019, the Open Pediatric Brain Tumor Atlas (OpenPBTA) was created as a global, collaborative open-science initiative to genomically characterize 1,074 pediatric brain tumors and 22 patient-derived cell lines. Here, we present an extension of the OpenPBTA called the Open Pediatric Cancer (OpenPedCan) Project, a harmonized open-source multiomic dataset from 6,112 pediatric cancer patients with 7,096 tumor events across more than 100 histologies. Combined with RNA sequencing (RNA-seq) from the Genotype-Tissue Expression and The Cancer Genome Atlas projects, OpenPedCan contains nearly 48,000 total biospecimens (24,002 tumor and 23,893 normal specimens).
We used Gabriella Miller Kids First workflows to harmonize whole-genome sequencing (WGS), whole-exome sequencing (WXS), RNA-seq, and targeted sequencing datasets to include somatic single-nucleotide variants (SNVs), indels, copy number variants, structural variants, RNA expression, fusions, and splice variants. We integrated summarized Clinical Proteomic Tumor Analysis Consortium whole-cell proteomics and phospho-proteomics data and miRNA sequencing data, and we developed a methylation array harmonization workflow to include m-values, beta-values, and copy number calls. OpenPedCan contains reproducible, dockerized workflows in GitHub, CAVATICA, and Amazon Web Services (AWS) to deliver harmonized and processed data from over 60 scalable modules, which can be leveraged both locally and on AWS. The processed data are released in a versioned manner, are accessible through CAVATICA or AWS S3 download (from GitHub), and are queryable through PedcBioPortal and the National Cancer Institute's pediatric Molecular Targets Platform. Notably, we have expanded Pediatric Brain Tumor Atlas molecular subtyping to include methylation information to align with the World Health Organization 2021 Central Nervous System Tumor classifications, allowing us to create research-grade integrated diagnoses for these tumors.
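The abstract does not say how the m-values and beta-values are computed in the methylation workflow. As a minimal sketch of how the two quantities are conventionally related (an M-value is the base-2 logit of a beta-value), the following may help; the function names and the clamping constant `eps` are illustrative assumptions, not part of OpenPedCan.

```python
import math


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Convert a methylation beta-value (between 0 and 1) to an M-value.

    Uses the conventional relationship M = log2(beta / (1 - beta)).
    `eps` is an illustrative assumption: it clamps beta away from 0 and 1
    so that the logarithm stays finite.
    """
    clamped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clamped / (1.0 - clamped))


def m_to_beta(m: float) -> float:
    """Invert the transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

A beta-value of 0.5 (half methylated) maps to an M-value of 0, and converting to an M-value and back recovers the original beta-value for inputs inside the clamped range.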
OpenPedCan data and its reproducible analysis module framework are openly available and can be utilized and/or adapted by researchers to accelerate discovery, validation, and clinical translation.