Set-theory based benchmarking of three different variant callers for targeted sequencing.

Affiliations

Centro de Investigación en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica (UCR), San José, Costa Rica.

Centro de Investigaciones en Hematología y Transtornos Afines (CIHATA), Universidad de Costa Rica (UCR), San José, Costa Rica.

Publication information

BMC Bioinformatics. 2021 Jan 7;22(1):20. doi: 10.1186/s12859-020-03926-3.

DOI:10.1186/s12859-020-03926-3
PMID:33413082
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7791862/
Abstract

BACKGROUND

Next-generation sequencing (NGS) technologies have improved the study of hereditary diseases. Because evaluating bioinformatics pipelines is not straightforward, NGS demands effective strategies for analyzing data, which is of paramount relevance for clinical decision making. Following the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new, simple, and user-friendly set-theory based method to assess variant callers using a gold standard variant set and high-confidence regions. As a model, we used TruSight Cardio kit sequencing data from the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality.
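A minimal sketch of a set-theory comparison of a call set against a gold standard, in Python. It assumes each variant is keyed by an exact (chromosome, position, REF, ALT) match and that high-confidence regions are given as BED-style intervals (0-based start, exclusive end). The names `Variant`, `in_regions`, and `compare` are hypothetical, and exact-key matching is a simplification: GA4GH-style tools such as hap.py also reconcile variants that are represented differently but are equivalent.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: two variants match only if all four fields match."""
    chrom: str
    pos: int  # 1-based position, as in a VCF record
    ref: str
    alt: str


def in_regions(v: Variant, regions: dict[str, list[tuple[int, int]]]) -> bool:
    """Return True if the variant lies inside a high-confidence BED interval."""
    p = v.pos - 1  # convert the 1-based VCF position to 0-based BED coordinates
    return any(start <= p < end for start, end in regions.get(v.chrom, []))


def compare(calls: set[Variant], truth: set[Variant],
            regions: dict[str, list[tuple[int, int]]]) -> dict[str, float]:
    """Hypothetical set-based comparison of a call set against a gold standard."""
    # Restrict both sets to the high-confidence regions before comparing.
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}
    tp = calls_hc & truth_hc  # intersection: called and present in the gold standard
    fp = calls_hc - truth_hc  # difference: called but absent from the gold standard
    fn = truth_hc - calls_hc  # difference: in the gold standard but not called
    # |calls_hc| = TP + FP and |truth_hc| = TP + FN.
    precision = len(tp) / len(calls_hc) if calls_hc else 0.0
    recall = len(tp) / len(truth_hc) if truth_hc else 0.0
    return {"TP": len(tp), "FP": len(fp), "FN": len(fn),
            "precision": precision, "recall": recall}


# Toy example (not data from the study).
truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T")}
calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr1", 300, "G", "A"),  # not in the truth set, so a false positive
    Variant("chr2", 50, "T", "C"),   # outside the high-confidence regions, so ignored
}
regions = {"chr1": [(0, 1000)]}
result = compare(calls, truth, regions)
print(result)  # TP=2, FP=1, FN=0, precision 2/3, recall 1.0
```

The `chr2` call is dropped before the set operations, so sites that the gold standard cannot vouch for do not count as false positives. A recall of 1.000, as reported for all three pipelines, corresponds to an empty false-negative set in this formulation.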

RESULTS

We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics computed with our set-theory approach showed high-resolution pipelines and revealed: (1) perfect recall of 1.000 for all three pipelines; (2) very high precision when compared with the reference material, namely 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac; and (3) ROC curve analysis with AUC > 0.94 in all cases. Moreover, significant differences were found among the three pipelines. Overall, the results indicate that all three pipelines recognized the expected variants in the gold standard data set.
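The abstract does not state how the ROC curves were built. One common approach, assumed here, labels each call as matching (1) or not matching (0) the gold standard, ranks calls by a quality score such as the VCF QUAL field, and computes the AUC as the probability that a randomly chosen true call outranks a randomly chosen false one (the Mann-Whitney formulation, equal to the trapezoidal area under the ROC curve). The function `roc_auc` and the toy scores below are hypothetical and are not the authors' procedure or data.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Hypothetical helper: ROC AUC as P(random positive scores above random negative).

    labels: 1 = call matches the gold standard, 0 = call does not.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one matching and one non-matching call")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: QUAL scores for five calls, three true and two false.
qual = [60.0, 55.0, 40.0, 30.0, 12.0]
is_true = [1, 1, 0, 1, 0]
print(roc_auc(qual, is_true))  # 5 of 6 pairs ranked correctly, about 0.833
```

The pairwise loop is quadratic in the number of calls, which is acceptable for a targeted panel; for genome-scale call sets a sort-based implementation would be preferable.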

CONCLUSIONS

Our set-theory approach to calculating metrics showed that all three selected pipelines identified the expected ICC-related variants, but the results were completely dependent on the algorithms. We emphasize the importance of assessing pipelines against gold standard materials to achieve the most reliable results for clinical application.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c034/7791862/19414821b92c/12859_2020_3926_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c034/7791862/216348a6aa35/12859_2020_3926_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c034/7791862/fe1977f68f72/12859_2020_3926_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c034/7791862/0b13be0d8451/12859_2020_3926_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c034/7791862/0f8217f14358/12859_2020_3926_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c034/7791862/6af0dee5148f/12859_2020_3926_Fig6_HTML.jpg
