Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.
Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston, Texas, USA.
Genet Epidemiol. 2023 Dec;47(8):617-636. doi: 10.1002/gepi.22537. Epub 2023 Oct 11.
Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited to simple tests with low statistical power, such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests, built on the score statistic, for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, the adaptive sum of powered score (aSPU) test and the data-adaptive pathway-based (aSPUpath) test, increase power in genome-wide association studies (GWASs) with a single disease trait in a case-control design. We extend aSPU and aSPUpath to multiple traits, that is, somatic mutations of multiple genes in a cohort study, allowing extensive information aggregation at both the SNP and gene levels. P-values obtained under different parameter values, each assuming a different genetic architecture, are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, in comparison with some commonly used methods, our data-adaptive somatic mutation/germline variation tests can be applied to multiple germline SNPs, genes, or pathways, and generally have much higher statistical power while maintaining the appropriate type I error rate. The proposed tests are applied to a large-scale real-world International Cancer Genome Consortium whole genome sequencing data set of 2583 subjects, detecting more significant and biologically relevant associations than other existing methods at both the gene and pathway levels.
Our study has systematically identified associations between various germline variations and somatic mutations across different cancer types, which may prove useful for cancer risk prediction, prognosis, and therapeutics.
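To make the idea of combining P-values across parameters concrete, the following is a minimal sketch of the single-trait aSPU test as it is commonly described in the earlier literature, not the multi-trait extension proposed in this paper. It assumes one binary somatic-mutation indicator `y`, a germline genotype matrix `X` for one SNP set, no covariates, and permutation-based P-values. The function names `spu_statistics` and `aspu_test`, the default set of powers, and the number of permutations are all illustrative choices rather than anything specified in the abstract.

```python
import numpy as np

def spu_statistics(U, gammas):
    """Sum of powered scores: SPU(g) = sum_j U_j**g; g = inf means max_j |U_j|."""
    return np.array([
        np.max(np.abs(U)) if np.isinf(g) else np.sum(U ** g)
        for g in gammas
    ])

def aspu_test(X, y, gammas=(1, 2, 3, 4, np.inf), n_perm=999, seed=0):
    """Permutation-based aSPU sketch for one trait and one SNP set (no covariates).

    Returns (per-gamma P-values, aSPU P-value).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def score(trait):
        # Score vector under the null of no association: U_j = sum_i x_ij (y_i - mean(y))
        return X.T @ (trait - trait.mean())

    # Observed statistics and their permutation null distribution (absolute values,
    # so odd powers give two-sided tests).
    t_obs = np.abs(spu_statistics(score(y), gammas))
    t_null = np.array([
        np.abs(spu_statistics(score(rng.permutation(y)), gammas))
        for _ in range(n_perm)
    ])  # shape (n_perm, n_gammas)

    # P-value for each power gamma.
    p_gamma = (1 + (t_null >= t_obs).sum(axis=0)) / (n_perm + 1)

    # P-value of each permuted statistic relative to the other permutations:
    # counts[b, g] includes the permutation itself, so counts / n_perm equals
    # (number of other permutations at least as large + 1) / n_perm.
    counts = (t_null[None, :, :] >= t_null[:, None, :]).sum(axis=1)
    p_null = counts / n_perm

    # aSPU statistic: the minimum P-value across powers, calibrated with the
    # same permutations so that no second layer of resampling is needed.
    min_p_obs = p_gamma.min()
    min_p_null = p_null.min(axis=1)
    p_aspu = (1 + (min_p_null <= min_p_obs).sum()) / (n_perm + 1)
    return p_gamma, p_aspu
```

In this sketch, small powers such as 1 favor settings where many SNPs have effects in the same direction, while large powers and the maximum favor sparse signals; taking the minimum P-value lets the data choose among them. Calling `aspu_test(X, y)` with an `n × p` genotype matrix and a length-`n` mutation indicator returns the per-power P-values and the combined aSPU P-value.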