Cai Lei, Yuan Wei, Zhang Zhou, He Lin, Chou Kuo-Chen
Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China.
Gordon Life Science Institute, Boston, Massachusetts, 02478, USA.
Sci Rep. 2016 Nov 22;6:36540. doi: 10.1038/srep36540.
Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on real whole-exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools showed poor consensus on candidates: only 20% of calls were made by more than one caller. For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rates of dbSNP presence and of calls with high alternative-allele counts in the control sample, demonstrating their superior sensitivity and accuracy. Combining different callers does increase the reliability of candidates, but narrows the list down to a very limited range of tumor read depth and variant allele frequency. Calling SNVs on UDT-Seq data, which had much higher read depth, discovered additional true-positive variants, although false-positive predictions grew even more sharply. Our findings not only provide a valuable benchmark for state-of-the-art SNV calling methods, but also shed light on how to achieve more accurate SNV identification in the future.
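The consensus figure mentioned above can be illustrated with a minimal sketch. It assumes each caller's output has already been reduced to a set of (chromosome, position, ref, alt) tuples. The function name `consensus_fraction`, the `min_callers` parameter, and the toy call sets are hypothetical and not taken from the study, and the study may have defined its 20% figure differently.

```python
from collections import Counter


def consensus_fraction(calls_by_caller, min_callers=2):
    """Fraction of distinct variants reported by at least `min_callers` callers."""
    support = Counter()
    for variants in calls_by_caller.values():
        # Use a set so each caller contributes at most one vote per variant.
        support.update(set(variants))
    if not support:
        return 0.0
    shared = sum(1 for n in support.values() if n >= min_callers)
    return shared / len(support)


# Hypothetical toy call sets keyed by (chrom, pos, ref, alt); not data from the study.
toy_calls = {
    "Varscan": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "SomaticSniper": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "T")},
    "Strelka": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "MuTect2": {("chr5", 500, "G", "A")},
}

print(consensus_fraction(toy_calls))  # 2 of 4 distinct variants are shared: 0.5
```

Raising `min_callers` mimics the stricter intersection strategy described in the abstract: the retained list becomes more reliable but shorter.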