Song Qianqian, Hu Taobo, Liang Baosheng, Li Shihai, Li Yang, Wu Jinbo, Wang Shu, Zhou Xiaohua
Department of Biostatistics, School of Public Health, Peking University, Beijing, 100083, China.
Department of Breast Surgery, Peking University People's Hospital, Beijing, 100044, China.
Interdiscip Sci. 2025 Mar;17(1):1-11. doi: 10.1007/s12539-024-00653-8. Epub 2024 Oct 23.
The development of third-generation sequencing has driven a rapid increase in single nucleotide polymorphism (SNP) calling methods, but evaluating their accuracy remains challenging because no SNP gold standard exists. Definitions of performance metrics in the absence of a gold standard, together with methods for estimating them, are urgently needed. The possible correlations between different SNP loci should also be explored further. To address these challenges, we first introduced the concepts of a gold standard and an imperfect gold standard under the consistency framework and gave the corresponding definitions of sensitivity and specificity. A latent class model (LCM) was then established to estimate the sensitivity and specificity of callers. Furthermore, we incorporated different dependency structures into the LCM to investigate their impact on the sensitivity and specificity estimates. The performance of the LCM was illustrated by comparing the accuracy of BCFtools, DeepVariant, FreeBayes, and GATK on various datasets. Estimates across multiple datasets indicate that the LCM is well suited to evaluating callers without an SNP gold standard, and that accurately modeling the dependency between variations is crucial for a reliable performance ranking. DeepVariant has a higher sum of sensitivity and specificity than the other callers, followed by GATK and BCFtools. FreeBayes has low sensitivity but high specificity. Notably, appropriate sequencing coverage is another important factor for the precise evaluation of callers. Most importantly, a web interface for assessing and comparing different callers was developed to simplify the evaluation process.
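As an illustration of the latent class idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the simplest case: each site is either a true SNP or not (the latent class), and the callers are conditionally independent given that class. The dependency structures the abstract mentions are not modeled here. The function names `simulate_calls` and `fit_lcm`, the starting values, and the simulated accuracies are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Pattern = Tuple[int, ...]  # one 0/1 call per caller at a single site


def simulate_calls(n_sites: int, prevalence: float, sens: Sequence[float],
                   spec: Sequence[float], seed: int = 0) -> List[Pattern]:
    """Simulate binary calls from conditionally independent callers (toy data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_sites):
        is_snp = rng.random() < prevalence
        # A caller reports 1 with prob. sens at a true SNP, 1 - spec otherwise.
        rows.append(tuple(int(rng.random() < (se if is_snp else 1.0 - sp))
                          for se, sp in zip(sens, spec)))
    return rows


def fit_lcm(calls: Sequence[Pattern], n_iter: int = 2000,
            tol: float = 1e-9) -> Tuple[float, List[float], List[float]]:
    """EM for a two-class latent class model under conditional independence.

    Returns (prevalence, sensitivities, specificities). No gold standard is
    used: the true SNP status is treated as an unobserved class.
    """
    counts = Counter(tuple(row) for row in calls)  # aggregate call patterns
    n = sum(counts.values())
    k = len(next(iter(counts)))
    eps = 1e-9

    def clamp(p: float) -> float:
        return min(max(p, eps), 1.0 - eps)

    # Assumed starting point: callers better than chance, which fixes the
    # labeling of the "SNP" class.
    prev, sens, spec = 0.5, [0.8] * k, [0.8] * k
    last_ll = -math.inf
    for _ in range(n_iter):
        w1 = 0.0             # expected number of true SNP sites
        pos1 = [0.0] * k     # expected true SNPs called 1 by caller j
        neg0 = [0.0] * k     # expected non-SNPs called 0 by caller j
        ll = 0.0
        for pattern, c in counts.items():
            # E-step: posterior probability that this pattern is a true SNP.
            p1, p0 = prev, 1.0 - prev
            for j, y in enumerate(pattern):
                p1 *= sens[j] if y else (1.0 - sens[j])
                p0 *= (1.0 - spec[j]) if y else spec[j]
            total = p1 + p0
            post = p1 / total
            ll += c * math.log(total)
            w1 += c * post
            for j, y in enumerate(pattern):
                if y:
                    pos1[j] += c * post
                else:
                    neg0[j] += c * (1.0 - post)
        # M-step: re-estimate parameters from the expected counts.
        w0 = n - w1
        prev = clamp(w1 / n)
        sens = [clamp(pos1[j] / w1) for j in range(k)]
        spec = [clamp(neg0[j] / w0) for j in range(k)]
        if abs(ll - last_ll) < tol:
            break
        last_ll = ll
    return prev, sens, spec


# Usage on simulated data with four hypothetical callers.
true_sens = [0.95, 0.90, 0.70, 0.85]
true_spec = [0.98, 0.95, 0.99, 0.90]
sim_calls = simulate_calls(20000, 0.3, true_sens, true_spec, seed=1)
est_prev, est_sens, est_spec = fit_lcm(sim_calls)
print("prevalence:", round(est_prev, 3))
print("sensitivity:", [round(s, 3) for s in est_sens])
print("specificity:", [round(s, 3) for s in est_spec])
```

With at least three conditionally independent callers, a two-class model of this kind is generally identifiable, which is why four callers suffice here. If calls at neighboring loci or from related callers are correlated, the independence assumption can bias the estimates. That is the motivation for the dependency structures that the abstract reports incorporating.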