Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
Genome Med. 2013 Oct 11;5(10):91. doi: 10.1186/gm495. eCollection 2013.
Driven by high-throughput next-generation sequencing technologies and the pressing need to decipher cancer genomes, computational approaches for detecting somatic single nucleotide variants (sSNVs) have improved dramatically over the past 2 years. Recently developed tools typically compare a tumor sample directly with a matched normal sample at each variant locus to increase the accuracy of sSNV calling. These programs also address the detection of sSNVs at low allele frequencies, enabling the study of tumor heterogeneity, cancer subclones, and mutation evolution during cancer development.
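To make the per-locus tumor-normal comparison concrete, the sketch below asks whether the alternate-allele read count at one locus is higher in the tumor than in the matched normal, using a one-sided Fisher's exact test on a 2×2 table of read counts. This is a minimal illustration under stated assumptions, not the method of any of the six tools evaluated here: the helper name `somatic_p_value`, the read counts, and the choice of a one-sided test are all hypothetical.

```python
from math import comb


def somatic_p_value(tumor_ref: int, tumor_alt: int,
                    normal_ref: int, normal_alt: int) -> float:
    """Hypothetical helper: one-sided Fisher's exact test on read counts.

    Rows are samples (tumor, normal); columns are alleles (ref, alt).
    Returns the probability, under the hypergeometric null with all margins
    fixed, that the tumor carries at least the observed number of alt reads.
    """
    tumor_depth = tumor_ref + tumor_alt
    normal_depth = normal_ref + normal_alt
    total_alt = tumor_alt + normal_alt
    denom = comb(tumor_depth + normal_depth, total_alt)
    # Sum over every table at least as extreme as the observed one, i.e.
    # with tumor_alt or more of the alt reads assigned to the tumor.
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    return sum(
        comb(tumor_depth, k) * comb(normal_depth, total_alt - k)
        for k in range(tumor_alt, min(tumor_depth, total_alt) + 1)
    ) / denom


# Hypothetical read counts at a single locus.
clean_normal = somatic_p_value(tumor_ref=30, tumor_alt=10, normal_ref=40, normal_alt=0)
alt_in_normal = somatic_p_value(tumor_ref=30, tumor_alt=10, normal_ref=36, normal_alt=4)
print(f"no alt reads in normal: p = {clean_normal:.2e}")
print(f"4 alt reads in normal:  p = {alt_in_normal:.2e}")
```

In this toy test, adding four alternate reads to the normal sample raises the p-value substantially, which illustrates why sSNVs with alternate alleles in the normal sample are harder to call with confidence.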
We used whole genome sequencing (Illumina Genome Analyzer IIx platform) of a melanoma sample and its matched blood, together with whole exome sequencing (Illumina HiSeq 2000 platform) of 18 lung tumor-normal pairs and seven lung cancer cell lines, to evaluate six sSNV detection tools (EBCall, JointSNVMix, MuTect, SomaticSniper, Strelka, and VarScan 2), focusing on MuTect and VarScan 2, two widely used, publicly available tools. Each tool was run with its default or suggested parameters. The missense sSNVs detected in these samples were validated by PCR and direct sequencing of genomic DNA from the samples. We also simulated 10 tumor-normal pairs to explore the ability of these programs to detect low-allelic-frequency sSNVs.
Of the 237 sSNVs successfully validated in our cancer samples, VarScan 2 and MuTect detected the most of any tool (204 and 192, respectively). MuTect identified 11 more low-coverage validated sSNVs than VarScan 2, but missed 11 more sSNVs with alternate alleles in the normal samples than VarScan 2 did. When we examined each tool's false calls using the 169 sSNVs that failed validation, we observed that >63% of the false calls detected in the lung cancer cell lines had alternate alleles in the normal samples. Additionally, in our simulation data, VarScan 2 identified more sSNVs than the other tools, while MuTect detected the most low-allelic-fraction sSNVs.
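The detection counts above translate directly into per-tool sensitivity against the 237 validated sSNVs. The short calculation below uses only numbers reported in this abstract; the variable names are illustrative.

```python
# Validated sSNVs and per-tool detections, as reported in the abstract.
VALIDATED = 237
detected = {"VarScan 2": 204, "MuTect": 192}

# Sensitivity = validated sSNVs detected by a tool / all validated sSNVs.
sensitivity = {tool: n / VALIDATED for tool, n in detected.items()}

for tool, s in sensitivity.items():
    print(f"{tool}: {s:.1%}")
```

This gives roughly 86.1% for VarScan 2 and 81.0% for MuTect. Precision cannot be computed from the abstract alone, because the per-tool false-call counts among the 169 sSNVs that failed validation are not given here.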
Our study explored the typical false-positive and false-negative detections that arise from the use of sSNV-calling tools. Our results suggest that, despite recent progress, these tools have significant room for improvement, especially in discriminating low-coverage or low-allelic-frequency sSNVs and sSNVs with alternate alleles in normal samples.
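One way to see why low coverage and low allelic fraction remain difficult is a simple binomial model of read sampling: at depth d and variant allele fraction f, the number of alternate reads is approximately Binomial(d, f). The sketch below computes the probability of observing at least a minimum number of alternate reads. It is an idealized assumption that ignores sequencing error, mapping bias, and tumor purity, and the threshold `min_alt` is a hypothetical parameter, not a setting of any tool evaluated here.

```python
from math import comb


def prob_at_least(depth: int, vaf: float, min_alt: int) -> float:
    """P(X >= min_alt) for X ~ Binomial(depth, vaf).

    Idealized model: ignores sequencing error, mapping bias, and tumor purity.
    """
    return sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt, depth + 1)
    )


# Hypothetical threshold of 4 supporting reads (not any tool's default).
for depth in (20, 100):
    for vaf in (0.05, 0.20):
        p = prob_at_least(depth, vaf, min_alt=4)
        print(f"depth={depth:3d}x  VAF={vaf:.2f}  P(>=4 alt reads)={p:.3f}")
```

Under this model, a variant at 5% allelic fraction and 20x coverage rarely yields four supporting reads, whereas the same variant at 100x usually does.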