Huang Yizhou Peter, Harmon Lauren, Deering-Gardner Eve, Ma Xiaotu, Harsh Josiah, Xue Zhaoyu, Wen Hong, Ramos Marcel, Davis Sean, Triche Timothy J
Michigan State University, East Lansing, MI, US.
Van Andel Institute, Grand Rapids, MI, US.
bioRxiv. 2024 Nov 27:2023.09.15.558026. doi: 10.1101/2023.09.15.558026.
The NCI Genomic Data Commons (GDC) provides controlled access to sequencing data from thousands of subjects, enabling large-scale study of impactful genetic alterations such as simple and complex germline and structural variants. However, efficient analysis requires significant computational resources and expertise, especially when recalling variants from raw sequence reads. We thus developed bamSliceR, an R/Bioconductor package that builds upon the GenomicDataCommons package to extract aligned sequence reads from cross-GDC meta-cohorts, followed by targeted analysis of variants and their effects (including transcript-aware variant annotation from transcriptome-aligned GDC RNA data). Here we demonstrate population-scale genomic and transcriptomic analyses with minimal compute burden via bamSliceR, identifying recurrent, clinically relevant sequence and structural variants in the TARGET AML and BEAT-AML cohorts. We then validate the results in the (non-GDC) Leucegene cohort, demonstrating how the bamSliceR pipeline can be applied seamlessly to replicate findings in non-GDC cohorts. These variants directly yield clinically impactful and biologically testable hypotheses for mechanistic investigation. bamSliceR has been submitted to the Bioconductor project, where it is presently under review, and is available on GitHub at https://github.com/trichelab/bamSliceR.