Set-theory based benchmarking of three different variant callers for targeted sequencing.

Author information

Centro de Investigación en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica (UCR), San José, Costa Rica.

Centro de Investigaciones en Hematología y Transtornos Afines (CIHATA), Universidad de Costa Rica (UCR), San José, Costa Rica.

Publication information

BMC Bioinformatics. 2021 Jan 7;22(1):20. doi: 10.1186/s12859-020-03926-3.

Abstract

BACKGROUND

Next generation sequencing (NGS) technologies have improved the study of hereditary diseases. Because evaluating bioinformatics pipelines is not straightforward, NGS demands effective strategies for analyzing data that are of paramount relevance for decision making in a clinical scenario. Following the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new, simple, and user-friendly set-theory based method to assess variant callers using a gold standard variant set and high confidence regions. As a model, we used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality.
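The abstract does not spell out the set operations, so the following is only a minimal sketch of how a set-theory comparison against a gold standard restricted to high confidence regions might look. The `Variant` tuple, the `in_regions` and `classify` helpers, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: a call matches the gold standard only if all four fields agree."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def in_regions(variant, regions):
    """Return True if the variant lies inside any (chrom, start, end) interval (1-based, inclusive)."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in regions
    )


def classify(calls, gold, regions):
    """Split variants into true positives, false positives, and false negatives.

    Both sets are first restricted to the high confidence regions, then compared
    with plain set operations.
    """
    calls_hc = {v for v in calls if in_regions(v, regions)}
    gold_hc = {v for v in gold if in_regions(v, regions)}
    true_pos = calls_hc & gold_hc   # called and expected
    false_pos = calls_hc - gold_hc  # called but not expected
    false_neg = gold_hc - calls_hc  # expected but not called
    return true_pos, false_pos, false_neg


# Toy data (illustrative only, not from the study).
regions = [("chr1", 100, 200)]
gold = {
    Variant("chr1", 120, "A", "G"),
    Variant("chr1", 150, "C", "T"),
    Variant("chr1", 500, "G", "A"),  # outside the high confidence regions, so ignored
}
calls = {
    Variant("chr1", 120, "A", "G"),
    Variant("chr1", 150, "C", "T"),
    Variant("chr1", 180, "T", "C"),  # not in the gold standard
}

tp, fp, fn = classify(calls, gold, regions)
```

In this sketch, variants are matched by exact equality of position and alleles. A real benchmark would also need to normalize variant representation before comparing, which is omitted here.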

RESULTS

We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics computed with our set-theory approach showed high resolution for the pipelines and revealed: (1) a perfect recall of 1.000 for all three pipelines, (2) very high precision values when compared with the reference material, i.e. 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac, and (3) a ROC curve analysis with AUC > 0.94 in all cases. Moreover, significant differences were observed among the three pipelines. Overall, the results indicate that all three pipelines were able to recognize the expected variants in the gold standard data set.
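For reference, precision and recall can be derived from the sizes of the true positive, false positive, and false negative sets. The sketch below assumes the standard definitions, since the abstract does not state the formulas. The function names and the counts are hypothetical and are not the study's data.

```python
def precision(tp, fp):
    """Fraction of called variants that are in the gold standard: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def recall(tp, fn):
    """Fraction of gold standard variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Hypothetical counts for one pipeline (illustrative only).
tp_count, fp_count, fn_count = 9, 1, 0

p = precision(tp_count, fp_count)
r = recall(tp_count, fn_count)
print(f"precision={p:.3f} recall={r:.3f}")
```

Under these definitions, a recall of 1.000 means the false negative set is empty: every gold standard variant in the evaluated regions was called. Precision below 1.000 reflects extra calls that are absent from the gold standard.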

CONCLUSIONS

Using our set-theory approach to calculate metrics, all three selected pipelines identified the expected ICC-related variants, but the results depended entirely on the algorithm used. We emphasize the importance of assessing pipelines with gold standard materials to achieve the most reliable results for clinical application.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c034/7791862/19414821b92c/12859_2020_3926_Fig1_HTML.jpg
