Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers.

Authors

Hofmann Ariane L, Behr Jonas, Singer Jochen, Kuipers Jack, Beisel Christian, Schraml Peter, Moch Holger, Beerenwinkel Niko

Affiliations

Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.

Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland.

Publication information

BMC Bioinformatics. 2017 Jan 3;18(1):8. doi: 10.1186/s12859-016-1417-7.

Abstract

BACKGROUND

Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, allowing the sequencing of the whole exome of tumors for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations in a cancer tissue sample with very low frequencies, which may be clinically relevant.

RESULTS

Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2. The comparison was done as a function of variant allele frequencies and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, the rank-combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% when fixing the precision at 90%, and outperformed all individual tools, where the maximum sensitivity was 71% with the same precision.
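The rank-combination idea and the "sensitivity at fixed precision" metric can be illustrated with a minimal sketch. The abstract does not state how the ranks were combined, so the mean-rank rule below is an assumption, as are the function names, the treatment of variants a caller did not report (worst rank plus one), and the toy caller names and scores. It is not the authors' implementation.

```python
from typing import Dict, Hashable, List, Set


def combine_ranks(caller_scores: Dict[str, Dict[Hashable, float]]) -> List[Hashable]:
    """Order variants by mean rank across callers (assumed combination rule).

    caller_scores maps a caller name to {variant: confidence score}, where a
    higher score means a more confident call. A variant that a caller did not
    report receives that caller's worst rank plus one (an assumption).
    """
    variants = set()
    for scores in caller_scores.values():
        variants.update(scores)

    mean_rank = {v: 0.0 for v in variants}
    for scores in caller_scores.values():
        # Rank 1 is the most confident call; ties are broken by variant name.
        ordered = sorted(scores, key=lambda v: (-scores[v], str(v)))
        rank = {v: i + 1 for i, v in enumerate(ordered)}
        missing = len(ordered) + 1
        for v in variants:
            mean_rank[v] += rank.get(v, missing) / len(caller_scores)

    return sorted(variants, key=lambda v: (mean_rank[v], str(v)))


def sensitivity_at_precision(ranked: List[Hashable], truth: Set[Hashable],
                             min_precision: float = 0.9) -> float:
    """Best sensitivity over all list cutoffs whose precision >= min_precision."""
    best = 0.0
    true_positives = 0
    for k, variant in enumerate(ranked, start=1):
        if variant in truth:
            true_positives += 1
        if true_positives / k >= min_precision:
            best = max(best, true_positives / len(truth))
    return best


# Toy data: two hypothetical callers scoring four candidate variants.
toy_scores = {
    "callerA": {"v1": 0.9, "v2": 0.8, "v3": 0.1},
    "callerB": {"v1": 0.7, "v3": 0.6, "v4": 0.5},
}
toy_truth = {"v1", "v3", "v4"}  # simulated ground-truth variants

ranked = combine_ranks(toy_scores)
print(ranked)
print(sensitivity_at_precision(ranked, toy_truth, min_precision=0.9))
```

In a simulation study the ground truth is known, so the same cutoff search can be applied to each individual caller's ranked list and to the combined list, which is the kind of comparison the reported 78% versus 71% figures summarize.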

CONCLUSIONS

The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. We were able to relate observed substantial differences in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and further the field of precision cancer treatment.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f35/5209852/f74b616e82c5/12859_2016_1417_Fig1_HTML.jpg
