Molecular Health GmbH, Kurfürsten-Anlage 21, 69115 Heidelberg, Germany.
Bioinformatics. 2017 Sep 15;33(18):2791-2798. doi: 10.1093/bioinformatics/btx284.
Whole exome and gene panel sequencing are increasingly used for oncological diagnostics. To investigate the accuracy of somatic copy number alteration (SCNA) detection algorithms on simulated and clinical tumor samples, the precision and sensitivity of four SCNA callers were measured on 50 simulated whole exome datasets, 50 simulated targeted gene panel datasets, and 119 TCGA tumor samples for which SNP array data were available.
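As a minimal sketch of the two metrics used here: precision is the fraction of calls that are confirmed by the reference, and sensitivity is the fraction of reference events that are called. The function name, the gene-level `(gene, direction)` representation, and the toy data below are illustrative assumptions and are not taken from the study's scripts, which may instead compare calls at the segment or base level.

```python
def precision_sensitivity(called, truth):
    """Return (precision, sensitivity) for two collections of SCNA events.

    Events are compared by exact equality, so each event must be hashable,
    e.g. a (gene, direction) tuple. This matching rule is an assumption.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the reference
    fp = len(called - truth)   # called but absent from the reference
    fn = len(truth - called)   # in the reference but not called
    precision = tp / (tp + fp) if called else float("nan")
    sensitivity = tp / (tp + fn) if truth else float("nan")
    return precision, sensitivity


# Hypothetical toy data, for illustration only.
calls = {("ERBB2", "gain"), ("CDKN2A", "loss"), ("MYC", "gain"), ("TP53", "loss")}
reference = {("ERBB2", "gain"), ("CDKN2A", "loss"), ("PTEN", "loss")}

precision, sensitivity = precision_sensitivity(calls, reference)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

With these toy sets, two of the four calls are confirmed and two of the three reference events are recovered, which illustrates how a caller can be sensitive and imprecise at the same time.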
On synthetic exome and panel data, VarScan2 mostly called false positives, whereas Control-FREEC was precise (>90% correct calls) at the cost of low sensitivity (<40% detected). ONCOCNV was slightly less precise on gene panel data, with similarly low sensitivity. This could be explained by low sensitivity for amplifications and high precision for deletions. Surprisingly, these results were not strongly affected by moderate tumor impurity; only contamination with more than 60% non-cancerous cells caused precision and sensitivity to decline strongly. On the 119 clinical samples, Control-FREEC and CNVkit called 71.8% and 94%, respectively, of the SCNAs found by the SNP arrays, but with a considerable number of false positives (precision 29% and 4.9%, respectively).
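To make the role of tumor purity concrete, the sketch below computes the expected log2 copy ratio of a region when tumor cells are mixed with normal cells. It uses the standard linear mixture model and assumes a diploid normal genome and no ploidy renormalization; the model, function name, and parameters are illustrative assumptions and do not describe how the study simulated contamination.

```python
import math


def expected_log2_ratio(tumor_copies, purity, normal_copies=2):
    """Expected log2 copy ratio of a region in an impure tumor sample.

    purity is the fraction of tumor cells (0..1). Normal cells are assumed
    to carry normal_copies copies of the region (diploid by default).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Average copy number across the tumor/normal cell mixture.
    mixed = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed <= 0:
        return float("-inf")  # homozygous deletion in a pure tumor
    return math.log2(mixed / normal_copies)


# Single-copy gain (3 copies) at decreasing tumor purity.
for purity in (1.0, 0.8, 0.6, 0.4):
    ratio = expected_log2_ratio(3, purity)
    print(f"purity={purity:.1f} log2 ratio={ratio:.3f}")
```

Under this model the signal of a single-copy gain shrinks steadily as purity falls, so a caller with a fixed log2 threshold would be expected to lose such events at high contamination levels.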
Whole exome and targeted gene panel methods by design limit the precision of SCNA callers, making them prone to false positives. SCNA calls cannot easily be integrated in clinical pipelines that use data from targeted capture-based sequencing. If used at all, they need to be cross-validated using orthogonal methods.
Scripts are provided as supplementary information.
gunther.jansen@molecularhealth.com.
Supplementary data are available at Bioinformatics online.