cascAGS: Comparative Analysis of SNP Calling Methods for Human Genome Data in the Absence of Gold Standard.

Author information

Song Qianqian, Hu Taobo, Liang Baosheng, Li Shihai, Li Yang, Wu Jinbo, Wang Shu, Zhou Xiaohua

Affiliations

Department of Biostatistics, School of Public Health, Peking University, Beijing, 100083, China.

Department of Breast Surgery, Peking University People's Hospital, Beijing, 100044, China.

Publication information

Interdiscip Sci. 2025 Mar;17(1):1-11. doi: 10.1007/s12539-024-00653-8. Epub 2024 Oct 23.

DOI: 10.1007/s12539-024-00653-8
PMID: 39443427

Abstract

The development of third-generation sequencing has accelerated the growth of single nucleotide polymorphism (SNP) calling methods, but evaluating their accuracy remains challenging because no SNP gold standard exists. Definitions of the without-gold-standard setting and of performance metrics, together with methods for estimating them, are urgently needed. The possible correlations between different SNP loci should also be explored further. To address these challenges, we first introduced the concepts of a gold standard and an imperfect gold standard under the consistency framework and gave the corresponding definitions of sensitivity and specificity. A latent class model (LCM) was established to estimate the sensitivity and specificity of callers. Furthermore, we incorporated different dependency structures into the LCM to investigate their impact on sensitivity and specificity. The performance of the LCM was illustrated by comparing the accuracy of BCFtools, DeepVariant, FreeBayes, and GATK on various datasets. Estimates across multiple datasets indicate that the LCM is well suited to evaluating callers without an SNP gold standard, and that accurately modeling the dependency between variations is crucial for a better performance ranking. DeepVariant has a higher sum of sensitivity and specificity than the other callers, followed by GATK and BCFtools. FreeBayes has low sensitivity but high specificity. Notably, appropriate sequencing coverage is another important factor for precise evaluation of callers. Most importantly, a web interface for assessing and comparing different callers was developed to simplify the evaluation process.
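
To make the latent class idea concrete, the following is a minimal sketch of a two-class LCM fitted by expectation-maximization, assuming the callers are conditionally independent given the unobserved true SNP status. It is an illustration only, not the authors' implementation or the cascAGS web interface: the function names `simulate_calls` and `fit_lcm`, the simulated data, and the independence assumption are all hypothetical choices made here, and the dependency structures the abstract mentions are not modeled.

```python
import numpy as np


def simulate_calls(n_sites, prev, sens, spec, seed=0):
    """Simulate 0/1 calls from several callers (hypothetical helper).

    Each site has a hidden true status (SNP with probability `prev`).
    Caller j reports 1 with probability sens[j] at true SNP sites and
    1 - spec[j] at non-SNP sites, independently of the other callers.
    """
    rng = np.random.default_rng(seed)
    truth = rng.random(n_sites) < prev
    u = rng.random((n_sites, len(sens)))
    pos_prob = np.where(truth[:, None], np.asarray(sens), 1 - np.asarray(spec))
    return (u < pos_prob).astype(int)


def fit_lcm(calls, n_iter=500, tol=1e-6):
    """EM for a two-class latent class model without a gold standard.

    calls: (n_sites, n_callers) array of 0/1 calls.
    Returns (prevalence, sensitivity per caller, specificity per caller).
    Assumes conditional independence of callers given the latent class.
    """
    calls = np.asarray(calls, dtype=float)
    eps = 1e-6
    # Start from a soft majority vote so that class 1 means "SNP present".
    post = np.clip(calls.mean(axis=1), 0.05, 0.95)
    last_ll = -np.inf
    for _ in range(n_iter):
        # M-step: weighted proportions using the current posteriors.
        prev = np.clip(post.mean(), eps, 1 - eps)
        sens = np.clip((post @ calls) / post.sum(), eps, 1 - eps)
        spec = np.clip(((1 - post) @ (1 - calls)) / (1 - post).sum(), eps, 1 - eps)
        # E-step: posterior probability that each site is a true SNP.
        log_p1 = np.log(prev) + calls @ np.log(sens) + (1 - calls) @ np.log(1 - sens)
        log_p0 = np.log(1 - prev) + calls @ np.log(1 - spec) + (1 - calls) @ np.log(spec)
        m = np.maximum(log_p1, log_p0)
        log_norm = m + np.log(np.exp(log_p1 - m) + np.exp(log_p0 - m))
        post = np.exp(log_p1 - log_norm)
        # Stop once the observed-data log-likelihood stops improving.
        ll = log_norm.sum()
        if abs(ll - last_ll) < tol:
            break
        last_ll = ll
    return prev, sens, spec


if __name__ == "__main__":
    # Made-up accuracies for four anonymous callers; not results from the paper.
    demo = simulate_calls(20000, 0.3, [0.95, 0.90, 0.70, 0.85], [0.98, 0.95, 0.99, 0.90])
    prev, sens, spec = fit_lcm(demo)
    print("prevalence:", round(float(prev), 3))
    print("sensitivity:", np.round(sens, 3))
    print("specificity:", np.round(spec, 3))
```

With four callers the two-class model has nine free parameters against fifteen degrees of freedom in the table of call patterns, which is why it can be estimated without a gold standard. If callers make correlated errors, this independence version can give biased estimates, which is the concern the dependency structures in the abstract are meant to address.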
