Rojas-Rodriguez Felipe, Schmidt Marjanka K, Canisius Sander
Division of Molecular Pathology, The Netherlands Cancer Institute-Antoni van Leeuwenhoek Hospital, 1066 CX Amsterdam, The Netherlands.
Department of Clinical Genetics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands.
Bioinform Adv. 2024 May 23;4(1):vbae073. doi: 10.1093/bioadv/vbae073. eCollection 2024.
Most cancer driver gene identification tools have been developed for whole-exome sequencing data. Targeted sequencing is a popular alternative to whole-exome sequencing for large cancer studies due to its greater depth at a lower cost per tumor. Unlike whole-exome sequencing, targeted sequencing only enables mutation calling for a selected subset of genes. Whether existing driver gene identification tools remain valid in that context has not previously been studied.
We evaluated the validity of seven popular driver gene identification tools when applied to targeted sequencing data. Based on whole-exome data of 14 different cancer types from TCGA, we constructed matching targeted datasets by keeping only the mutations overlapping with the pan-cancer MSK-IMPACT panel and, in the case of breast cancer, also the breast-cancer-specific B-CAST panel. We then compared the driver gene predictions obtained on whole-exome and targeted mutation data for each of the seven tools. Differences in how the tools model background mutation rates were the most important determinant of their validity on targeted sequencing data. Based on our results, we recommend OncodriveFML, OncodriveCLUSTL, 20/20+, dNdScv, and ActiveDriver for driver gene identification in targeted sequencing data, whereas MutSigCV and DriverML are best avoided in that context.
Code for the analyses is available at https://github.com/SchmidtGroupNKI/TGSdrivergene_validity.
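The panel-restriction and comparison steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' code (that is in the repository above). It assumes a panel can be represented as a set of gene symbols and that mutations are records with a MAF-style `Hugo_Symbol` field, whereas the actual analysis may have matched on genomic coordinates. The function names, the toy panel, and all example values are hypothetical and are not results from the study.

```python
from typing import Dict, Iterable, List


def restrict_to_panel(mutations: Iterable[dict], panel_genes: Iterable[str]) -> List[dict]:
    """Keep only mutations whose gene symbol is on the panel.

    Assumption: gene-level matching on a MAF-style "Hugo_Symbol" field.
    """
    panel = set(panel_genes)
    return [m for m in mutations if m["Hugo_Symbol"] in panel]


def compare_driver_calls(
    wes_drivers: Iterable[str],
    targeted_drivers: Iterable[str],
    panel_genes: Iterable[str],
) -> Dict[str, List[str]]:
    """Compare driver calls from whole-exome and targeted runs of one tool.

    Only panel genes are considered, because off-panel genes cannot be
    called from targeted data at all.
    """
    panel = set(panel_genes)
    wes_on_panel = set(wes_drivers) & panel
    targeted = set(targeted_drivers) & panel
    return {
        "shared": sorted(wes_on_panel & targeted),
        "lost_on_targeted": sorted(wes_on_panel - targeted),
        "gained_on_targeted": sorted(targeted - wes_on_panel),
    }


# Toy inputs, invented purely for illustration.
toy_panel = {"TP53", "PIK3CA", "GATA3"}
toy_mutations = [
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S1"},
    {"Hugo_Symbol": "TTN", "Tumor_Sample_Barcode": "S1"},
    {"Hugo_Symbol": "PIK3CA", "Tumor_Sample_Barcode": "S2"},
    {"Hugo_Symbol": "MUC16", "Tumor_Sample_Barcode": "S2"},
]

targeted_mutations = restrict_to_panel(toy_mutations, toy_panel)
comparison = compare_driver_calls(
    wes_drivers={"TP53", "PIK3CA", "TTN"},
    targeted_drivers={"TP53", "GATA3"},
    panel_genes=toy_panel,
)
```

In this sketch, a gene in `lost_on_targeted` or `gained_on_targeted` marks a prediction that changed once off-panel mutations were removed, which is the kind of discrepancy the comparison between whole-exome and targeted data is meant to expose.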