Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers.

Author information

Hofmann Ariane L, Behr Jonas, Singer Jochen, Kuipers Jack, Beisel Christian, Schraml Peter, Moch Holger, Beerenwinkel Niko

Affiliations

Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland.

Swiss Institute of Bioinformatics, Mattenstr. 26, 4058 Basel, Switzerland.

Publication information

BMC Bioinformatics. 2017 Jan 3;18(1):8. doi: 10.1186/s12859-016-1417-7.

DOI: 10.1186/s12859-016-1417-7

PMID: 28049408

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5209852/

Abstract

BACKGROUND

Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, allowing the sequencing of the whole exome of tumors for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations in a cancer tissue sample with very low frequencies, which may be clinically relevant.

RESULTS

Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2. The comparison was done as a function of variant allele frequencies and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, the rank-combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% when fixing the precision at 90%, and outperformed all individual tools, where the maximum sensitivity was 71% with the same precision.
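
The rank-combination idea and the "sensitivity at a fixed precision" metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: averaging the per-caller ranks, assigning unreported variants a rank one past the end of a caller's list, and all names (`combine_ranks`, `sensitivity_at_precision`, `callerA`, `callerB`, the variant keys and scores) are assumptions made for illustration only.

```python
from collections import defaultdict


def combine_ranks(caller_scores):
    """Order variants by their mean per-caller rank (1 = most confident).

    caller_scores maps caller name -> {variant: score}, where a higher
    score means higher confidence. A variant that a caller did not report
    gets rank len(calls) + 1 for that caller (an assumption of this sketch).
    """
    variants = set()
    for calls in caller_scores.values():
        variants.update(calls)

    totals = defaultdict(float)
    for calls in caller_scores.values():
        # Sort by descending score; break ties by variant name for determinism.
        ordered = sorted(calls, key=lambda v: (-calls[v], v))
        ranks = {v: i + 1 for i, v in enumerate(ordered)}
        missing_rank = len(ordered) + 1
        for v in variants:
            totals[v] += ranks.get(v, missing_rank)

    n_callers = len(caller_scores)
    return sorted(variants, key=lambda v: (totals[v] / n_callers, v))


def sensitivity_at_precision(ranked, truth, min_precision=0.90):
    """Best sensitivity over all prefixes of `ranked` with precision >= min_precision."""
    best = 0.0
    true_positives = 0
    for k, variant in enumerate(ranked, start=1):
        if variant in truth:
            true_positives += 1
        if true_positives / k >= min_precision:
            best = max(best, true_positives / len(truth))
    return best


# Hypothetical toy input: two callers, made-up scores and variant keys.
caller_scores = {
    "callerA": {"chr1:100": 0.9, "chr1:200": 0.8, "chr2:50": 0.1},
    "callerB": {"chr1:200": 0.95, "chr3:7": 0.6, "chr1:100": 0.5},
}
truth = {"chr1:100", "chr1:200", "chr9:1"}  # simulated ground truth

ranked = combine_ranks(caller_scores)
sensitivity = sensitivity_at_precision(ranked, truth, min_precision=0.90)
print(ranked)
print(round(sensitivity, 3))
```

In a simulation study the ground truth is known, so the ranked list can be cut at the deepest point where precision is still at least 90% and the sensitivity read off there. This is how a figure such as "78% sensitivity at 90% precision" can be interpreted.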

CONCLUSIONS

The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. We were able to relate observed substantial differences in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and further the field of precision cancer treatment.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f35/5209852/f74b616e82c5/12859_2016_1417_Fig1_HTML.jpg
