In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data.

Authors

Cai Lei, Yuan Wei, Zhang Zhou, He Lin, Chou Kuo-Chen

Affiliations

Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China.

Gordon Life Science Institute, Boston, Massachusetts, 02478, USA.

Publication information

Sci Rep. 2016 Nov 22;6:36540. doi: 10.1038/srep36540.

DOI:10.1038/srep36540

PMID:27874022

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5118795/

Abstract

Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools showed poor consensus on candidates: only 20% of calls were reported by more than one caller. For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rates of dbSNP presence and of calls with high alternative-allele support in the control sample, demonstrating their superior sensitivity and accuracy. Combining different callers does increase the reliability of candidates, but narrows the list down to a very limited range of tumor read depth and variant allele frequency. Calling SNVs on UDT-Seq data, which had much higher read depth, discovered additional true-positive variants, but at the cost of an even larger increase in false-positive predictions. Our findings not only provide a valuable benchmark for state-of-the-art SNV calling methods, but also point toward more accurate SNV identification in the future.
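
The consensus figure quoted in the abstract can be illustrated with a small sketch. The abstract does not say exactly how the authors computed it, so the definition used here (the share of distinct candidate sites reported by at least two callers) is an assumption, and the function name `consensus_fraction` and the toy call sets are hypothetical, not data from the paper.

```python
from collections import Counter


def consensus_fraction(calls_by_caller, min_callers=2):
    """Fraction of distinct SNV candidates reported by at least `min_callers` callers.

    `calls_by_caller` maps a caller name to an iterable of candidate keys,
    here (chrom, pos, ref, alt) tuples. This definition is an assumption.
    """
    # Count, for every candidate, how many callers reported it
    # (set() ensures each caller is counted at most once per candidate).
    support = Counter()
    for calls in calls_by_caller.values():
        support.update(set(calls))
    if not support:
        return 0.0
    shared = sum(1 for n in support.values() if n >= min_callers)
    return shared / len(support)


# Hypothetical toy call sets, chosen only to exercise the function.
toy_calls = {
    "Varscan": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "SomaticSniper": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "T")},
    "Strelka": {("chr1", 100, "A", "T"), ("chr4", 400, "T", "G")},
    "MuTect2": {("chr1", 100, "A", "T"), ("chr5", 500, "G", "A")},
}
fraction = consensus_fraction(toy_calls)
print(f"{fraction:.0%} of candidates were reported by more than one caller")
```

Raising `min_callers` mimics the stricter "combine callers" strategy the abstract describes: the surviving list becomes more reliable but shorter.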

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cac/5118795/e4b5149eb125/srep36540-f1.jpg

Similar articles

Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

PLoS One. 2016 Mar 22;11(3):e0151664. doi: 10.1371/journal.pone.0151664. eCollection 2016.

Accuracy and reproducibility of somatic point mutation calling in clinical-type targeted sequencing data.

BMC Med Genomics. 2020 Oct 15;13(1):156. doi: 10.1186/s12920-020-00803-z.

Comparison of somatic variant detection algorithms using Ion Torrent targeted deep sequencing data.

BMC Med Genomics. 2019 Dec 24;12(Suppl 9):181. doi: 10.1186/s12920-019-0636-y.

Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers.

BMC Bioinformatics. 2017 Jan 3;18(1):8. doi: 10.1186/s12859-016-1417-7.

Comprehensive benchmarking of SNV callers for highly admixed tumor data.

PLoS One. 2017 Oct 11;12(10):e0186175. doi: 10.1371/journal.pone.0186175. eCollection 2017.

SNVSniffer: an integrated caller for germline and somatic single-nucleotide and indel mutations.

BMC Syst Biol. 2016 Aug 1;10 Suppl 2(Suppl 2):47. doi: 10.1186/s12918-016-0300-5.

Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers.

Genome Med. 2013 Oct 11;5(10):91. doi: 10.1186/gm495. eCollection 2013.

Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data.

Sci Rep. 2023 Nov 22;13(1):20444. doi: 10.1038/s41598-023-47135-3.

Comparison of somatic mutation calling methods in amplicon and whole exome sequence data.

BMC Genomics. 2014 Mar 28;15:244. doi: 10.1186/1471-2164-15-244.

Cited by

Clinical and analytical validation of a combined RNA and DNA exome assay across a large tumor cohort.

Commun Med (Lond). 2025 Jun 16;5(1):236. doi: 10.1038/s43856-025-00934-3.

UNISOM: Unified Somatic Calling and Machine Learning-based Classification Enhance the Discovery of CHIP.

Genomics Proteomics Bioinformatics. 2025 May 30;23(2). doi: 10.1093/gpbjnl/qzaf040.

Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection.

BMC Genomics. 2024 Sep 3;25(1):827. doi: 10.1186/s12864-024-10737-w.

Genomic Insights into Idiopathic Granulomatous Mastitis through Whole-Exome Sequencing: A Case Report of Eight Patients.

Int J Mol Sci. 2024 Aug 21;25(16):9058. doi: 10.3390/ijms25169058.

DEEPOMICS FFPE, a deep neural network model, identifies DNA sequencing artifacts from formalin fixed paraffin embedded tissue with high accuracy.

Sci Rep. 2024 Jan 31;14(1):2559. doi: 10.1038/s41598-024-53167-0.

Performance analysis of conventional and AI-based variant callers using short and long reads.

BMC Bioinformatics. 2023 Dec 14;24(1):472. doi: 10.1186/s12859-023-05596-3.

Quantification of rare somatic single nucleotide variants by droplet digital PCR using SuperSelective primers.

Sci Rep. 2023 Nov 3;13(1):18997. doi: 10.1038/s41598-023-39874-0.

Multicentric pilot study to standardize clinical whole exome sequencing (WES) for cancer patients.

NPJ Precis Oncol. 2023 Oct 20;7(1):106. doi: 10.1038/s41698-023-00457-x.

Comprehensive and realistic simulation of tumour genomic sequencing data.

NAR Cancer. 2023 Sep 22;5(3):zcad051. doi: 10.1093/narcan/zcad051. eCollection 2023 Sep.

Mutational signature assignment heterogeneity is widespread and can be addressed by ensemble approaches.

Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad331.
