Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, United States of America.
Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America.
PLoS Genet. 2018 Nov 2;14(11):e1007746. doi: 10.1371/journal.pgen.1007746. eCollection 2018 Nov.
Somatic mutations drive the growth of tumor cells and are pivotal biomarkers for many cancer treatments. Genetic association analysis using somatic mutations is an effective approach to studying their functional impact. However, standard regression methods are not appropriate for somatic mutation association studies, because somatic mutation calls often have non-ignorable false positive and/or false negative rates. Large-scale association analysis using somatic mutations has recently become feasible thanks to improved sequencing techniques and reduced sequencing costs, and there is an urgent need for a new statistical method designed specifically for somatic mutation association analysis. We propose such a method, with a computationally efficient software implementation: the Somatic mutation Association test with Measurement Errors (SAME). SAME accounts for somatic mutation calling uncertainty using a likelihood-based approach. It can be used to assess associations between continuous or dichotomous outcomes and either individual mutations or gene-level mutations. Through simulation studies across a wide range of realistic scenarios, we show that SAME can significantly improve statistical power compared with the naive generalized linear model that ignores mutation calling uncertainty. Finally, using data collected by The Cancer Genome Atlas (TCGA) project, we apply SAME to study the associations between somatic mutations and gene expression in 12 cancer types, as well as the associations between somatic mutations and colon cancer subtypes defined by DNA methylation data. SAME recovered some interesting findings that were missed by the generalized linear model. In addition, we demonstrate that mutation-level and gene-level analyses are often more appropriate for oncogenes and tumor-suppressor genes, respectively.
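The abstract does not spell out the SAME likelihood, so the following Python sketch only illustrates the general idea of a likelihood that accounts for mutation calling error; it is an assumption, not the SAME implementation. It treats the true mutation status Z as latent, assumes the false positive rate `fpr = P(M=1 | Z=0)` and false negative rate `fnr = P(M=0 | Z=1)` of the observed call M are known, assumes a Gaussian continuous outcome Y | Z ~ N(b0 + b1·Z, sigma²), fits the model by EM, and tests b1 = 0 with a 1-df likelihood-ratio test. The function names `fit_em` and `misclassification_lrt` are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_em(y, m, fpr, fnr, null=False, n_iter=1000, tol=1e-9):
    """Hypothetical EM fit of Y | Z ~ N(b0 + b1*Z, sigma^2), Z ~ Bernoulli(pi),
    where the observed call M misclassifies Z at ASSUMED-KNOWN rates
    fpr = P(M=1 | Z=0) and fnr = P(M=0 | Z=1). If null=True, b1 is fixed at 0.
    Returns the maximised observed-data log-likelihood."""
    n, n1 = len(y), sum(m)
    # Start from the naive fit that treats the mutation calls as the truth.
    pi = min(max(n1 / n, 0.01), 0.99)
    b0, b1 = sum(y) / n, 0.0
    if not null and 0 < n1 < n:
        b0 = sum(yi for yi, mi in zip(y, m) if mi == 0) / (n - n1)
        b1 = sum(yi for yi, mi in zip(y, m) if mi == 1) / n1 - b0
    sigma = math.sqrt(sum((yi - b0 - b1 * mi) ** 2 for yi, mi in zip(y, m)) / n)

    def e_step():
        # Posterior P(Z=1 | y, m) per sample, plus the observed-data log-likelihood.
        ll, w = 0.0, []
        for yi, mi in zip(y, m):
            a1 = pi * ((1 - fnr) if mi else fnr) * normal_pdf(yi, b0 + b1, sigma)
            a0 = (1 - pi) * (fpr if mi else (1 - fpr)) * normal_pdf(yi, b0, sigma)
            ll += math.log(a0 + a1)
            w.append(a1 / (a0 + a1))
        return ll, w

    prev_ll = -math.inf
    for _ in range(n_iter):
        ll, w = e_step()
        if ll - prev_ll < tol:  # EM never decreases the likelihood
            break
        prev_ll = ll
        # M-step: weighted updates of pi, the regression coefficients and sigma.
        w1 = min(max(sum(w), 1e-9), n - 1e-9)
        pi = min(max(w1 / n, 1e-6), 1 - 1e-6)
        if null:
            b0, b1 = sum(y) / n, 0.0
        else:
            b0 = sum((1 - wi) * yi for wi, yi in zip(w, y)) / (n - w1)
            b1 = sum(wi * yi for wi, yi in zip(w, y)) / w1 - b0
        sigma = math.sqrt(sum((1 - wi) * (yi - b0) ** 2 + wi * (yi - b0 - b1) ** 2
                              for wi, yi in zip(w, y)) / n)
    return e_step()[0]


def misclassification_lrt(y, m, fpr, fnr):
    """Likelihood-ratio test of H0: b1 = 0 against a chi-square(1) reference."""
    stat = max(0.0, 2.0 * (fit_em(y, m, fpr, fnr) - fit_em(y, m, fpr, fnr, null=True)))
    # For chi-square(1), P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    rng = random.Random(1)
    fpr, fnr = 0.05, 0.2
    z = [1 if rng.random() < 0.3 else 0 for _ in range(400)]  # latent true status
    m = [(1 if rng.random() < 1 - fnr else 0) if zi else (1 if rng.random() < fpr else 0)
         for zi in z]  # noisy mutation calls
    y = [0.5 * zi + rng.gauss(0.0, 1.0) for zi in z]  # continuous outcome
    print(misclassification_lrt(y, m, fpr, fnr))
```

Setting `fpr = fnr = 0` makes every posterior weight equal the observed call, so the sketch then reduces to the naive two-group fit that ignores calling error; the gap between the two fits is what a measurement-error likelihood is meant to capture. A dichotomous outcome or gene-level mutation burden would need a different outcome model and is not covered here.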