VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering.

Authors

Gézsi András, Bolgár Bence, Marx Péter, Sarkozy Peter, Szalai Csaba, Antal Péter

Affiliations

Department of Genetics, Cell- and Immunobiology, Semmelweis University, Nagyvárad tér 4, Budapest, H-1089, Hungary.

Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2, Budapest, H-1117, Hungary.

Publication information

BMC Genomics. 2015 Oct 28;16:875. doi: 10.1186/s12864-015-2050-y.

DOI: 10.1186/s12864-015-2050-y
PMID: 26510841
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4625715/

Abstract

BACKGROUND

The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used to filter call sets and thereby improve the precision of the variant calls, but choosing appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative to hard filtering, but it requires large-scale genomic data.

RESULTS

We evaluated germline variant calling pipelines based on the BWA and Bowtie 2 aligners in combination with the GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant but extract complementary, useful information. We introduce VariantMetaCaller to test the hypothesis that the automated fusion of measurement-related information performs better than the recommended hard-filtering settings, than recalibration, and than the fusion of the individual call sets without using annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates the probabilities of variants. This novel method had significantly higher sensitivity and precision than the individual variant callers across all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports quantitative, precision-based filtering of variants under wider conditions. Specifically, the computed probabilities can be used to rank the variants, and for a given threshold they can be used to estimate precision. Precision can then be translated directly into the number of true variant calls or, equivalently, the number of false calls, which makes it possible to find a problem-specific balance between sensitivity and precision.
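The precision-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each called variant already carries a well-calibrated probability of being a true variant, and the function names `precision_curve` and `calls_for_target_precision` are hypothetical. Under that assumption, the expected number of true calls among the top-ranked k variants is the sum of their probabilities, and the expected precision is that sum divided by k.

```python
from typing import List, Tuple


def precision_curve(probs: List[float]) -> List[Tuple[float, int, float, float]]:
    """Rank variants by probability (highest first) and, for each cutoff,
    return (threshold, n_called, expected_true_calls, expected_precision).

    Assumes each value in `probs` is a calibrated probability that the
    corresponding call is a true variant (an assumption of this sketch).
    """
    ranked = sorted(probs, reverse=True)
    rows = []
    expected_true = 0.0
    for n_called, p in enumerate(ranked, start=1):
        expected_true += p  # expected number of true calls so far
        rows.append((p, n_called, expected_true, expected_true / n_called))
    return rows


def calls_for_target_precision(probs: List[float], target: float) -> int:
    """Return the largest number of top-ranked calls whose expected
    precision is still at least `target` (0 if no cutoff qualifies)."""
    best = 0
    for _, n_called, _, precision in precision_curve(probs):
        if precision >= target:
            best = n_called
    return best


if __name__ == "__main__":
    # Hypothetical per-variant probabilities, for illustration only.
    example = [0.2, 0.99, 0.6, 0.95, 0.9]
    for threshold, n, true_calls, precision in precision_curve(example):
        false_calls = n - true_calls
        print(f"p>={threshold:.2f}  called={n}  "
              f"expected true={true_calls:.2f}  "
              f"expected false={false_calls:.2f}  precision={precision:.3f}")
    print("calls kept at 90% precision:", calls_for_target_precision(example, 0.90))
```

Because the list is sorted in descending order, the running mean can only decrease as more calls are admitted, so the largest qualifying cutoff is well defined. How closely these estimates track real precision depends entirely on how well the probabilities are calibrated.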

CONCLUSIONS

VariantMetaCaller can be applied to small target regions as well as whole exomes, and it can be used for organisms for which highly accurate variant call sets are not yet available. It can therefore be a viable alternative to hard filtering in cases where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller .

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ed7/4625715/f1e581aac2d0/12864_2015_2050_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ed7/4625715/07c6af63293b/12864_2015_2050_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ed7/4625715/7157eb09a59b/12864_2015_2050_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ed7/4625715/0ea666ff8e40/12864_2015_2050_Fig4_HTML.jpg

