Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China.
Center for Precision Medicine, Sun Yat-sen University, Guangzhou 510080, China.
Nucleic Acids Res. 2019 Sep 19;47(16):e96. doi: 10.1093/nar/gkz566.
Genomic identification of driver mutations and genes in cancer cells is critical for precision medicine. Because the distribution of background mutation counts is difficult to model, existing statistical methods are often underpowered to discriminate cancer-driver genes from passenger genes. Here we propose a novel statistical approach, weighted iterative zero-truncated negative-binomial regression (WITER, http://grass.cgs.hku.hk/limx/witer or KGGSeq, http://grass.cgs.hku.hk/limx/kggseq/), to detect cancer-driver genes showing an excess of somatic mutations. By fitting the distribution of background mutation counts properly, this approach works well even in small or moderate samples. Compared to alternative methods, it detected more significant and cancer-consensus genes in most tested cancers. Applying this approach, we estimated 229 driver genes in 26 different types of cancers. In silico validation confirmed 78% of predicted genes as likely known drivers and many other genes as very likely new drivers for corresponding cancers. The technical advances of WITER enable the detection of driver genes in TCGA datasets as small as 30 subjects and the rescue of more genes missed by alternative tools in moderate or small samples.
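As an illustration of the zero-truncated negative-binomial idea named above, the following is a minimal Python sketch, not the WITER implementation. It assumes a background mean `mu` and dispersion `alpha` for a gene are already known (in WITER these would come from a weighted, iteratively refitted regression, which is not reproduced here), and it shows how an excess of mutations could be scored against a count distribution conditioned on at least one mutation. All function and parameter names are hypothetical.

```python
import math


def nb_log_pmf(y, mu, alpha):
    """Log-pmf of a negative binomial with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha                 # size parameter (assumed parametrization)
    log_p = math.log(r / (r + mu))  # log "success" probability
    log_q = math.log(mu / (r + mu))
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * log_p + y * log_q)


def ztnb_pmf(y, mu, alpha):
    """Zero-truncated pmf: P(Y = y | Y >= 1)."""
    if y < 1:
        return 0.0
    p_zero = math.exp(nb_log_pmf(0, mu, alpha))
    return math.exp(nb_log_pmf(y, mu, alpha)) / (1.0 - p_zero)


def ztnb_upper_tail(y_obs, mu, alpha):
    """P(Y >= y_obs | Y >= 1), a simple excess-of-mutations score.

    Computed as 1 minus the lower tail, so very small values lose precision;
    adequate for a sketch only.
    """
    if y_obs <= 1:
        return 1.0
    below = sum(ztnb_pmf(k, mu, alpha) for k in range(1, y_obs))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Hypothetical gene: 2 background mutations expected, 12 observed.
    print(ztnb_upper_tail(12, mu=2.0, alpha=0.5))
```

Truncating at zero matters when genes with no observed mutations are absent from the input, since an untruncated fit would then misjudge the background rate. A full analysis would estimate `mu` per gene from covariates and correct the resulting p-values for multiple testing.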