Huang Yizhou Peter, Harmon Lauren, Deering-Gardner Eve, Ma Xiaotu, Harsh Josiah, Xue Zhaoyu, Wen Hong, Ramos Marcel, Davis Sean, Triche Timothy J
Michigan State University, East Lansing, MI 48824, United States.
Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, United States.
Bioinform Adv. 2025 Apr 28;5(1):vbaf098. doi: 10.1093/bioadv/vbaf098. eCollection 2025.
MOTIVATION: The National Cancer Institute Genomic Data Commons (GDC) provides controlled access to sequencing data from thousands of subjects, enabling large-scale study of impactful genetic alterations such as simple and complex germline and structural variants. However, efficient analysis requires significant computational resources and expertise, especially when calling variants from raw sequence reads. To solve these problems, we developed bamSliceR, an R/Bioconductor package that builds upon the GenomicAlignments package to extract aligned sequence reads from cross-GDC meta-cohorts, followed by targeted analysis of variants and their effects (including transcript-aware variant annotation from transcriptome-aligned GDC RNA data).
RESULTS: Here, we demonstrate population-scale genomic and transcriptomic analyses with minimal compute burden using bamSliceR, identifying recurrent, clinically relevant sequence and structural variants in the TARGET acute myeloid leukemia (AML) and BEAT-AML cohorts. We then validate the results in the (non-GDC) Leucegene cohort, demonstrating how the bamSliceR pipeline can be seamlessly applied to replicate findings in non-GDC cohorts. These variants directly yield clinically impactful and biologically testable hypotheses for mechanistic investigation.
AVAILABILITY AND IMPLEMENTATION: bamSliceR has been submitted to the Bioconductor project, where it is presently under review, and is available on GitHub at https://github.com/trichelab/bamSliceR.