
The evaluation of Bcftools mpileup and GATK HaplotypeCaller for variant calling in non-human species.

Affiliation

DGIMI, Univ Montpellier, INRAE, Montpellier, France.

Publication information

Sci Rep. 2022 Jul 5;12(1):11331. doi: 10.1038/s41598-022-15563-2.

DOI: 10.1038/s41598-022-15563-2
PMID: 35790846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9256665/

Abstract

Identification of genetic variations is a central part of population and quantitative genomics studies based on high-throughput sequencing data. Even though popular variant callers such as Bcftools mpileup and GATK HaplotypeCaller were developed nearly 10 years ago, their performance is still largely unknown for non-human species. Here, we showed by benchmark analyses with a simulated insect population that Bcftools mpileup performs better than GATK HaplotypeCaller in terms of recovery rate and accuracy regardless of mapping software. The vast majority of false positives were observed from repeats, especially for GATK HaplotypeCaller. Variant scores calculated by GATK did not clearly distinguish true positives from false positives in the vast majority of cases, implying that hard-filtering with GATK could be challenging. These results suggest that Bcftools mpileup may be the first choice for non-human studies and that variants within repeats might have to be excluded for downstream analyses.
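
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that "recovery rate" means the fraction of simulated (true) variants that a caller recovers, and it uses precision (the fraction of calls that are true) as a stand-in for "accuracy"; the paper's exact definitions may differ. The function names, the tuple representation of a variant, and the BED-style repeat intervals are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representations, chosen for this sketch only.
Variant = Tuple[str, int, str, str]   # (chrom, 1-based position, ref, alt)
Interval = Tuple[int, int]            # BED-style: 0-based start, exclusive end


def in_repeat(variant: Variant, repeats: Dict[str, List[Interval]]) -> bool:
    """Return True if the variant position falls inside any repeat interval."""
    chrom, pos, _ref, _alt = variant
    # A 1-based position p lies in a 0-based half-open interval [start, end)
    # exactly when start < p <= end.
    return any(start < pos <= end for start, end in repeats.get(chrom, []))


def drop_repeat_variants(
    variants: Set[Variant], repeats: Dict[str, List[Interval]]
) -> Set[Variant]:
    """Keep only the variants located outside annotated repeats."""
    return {v for v in variants if not in_repeat(v, repeats)}


def benchmark(truth: Set[Variant], calls: Set[Variant]) -> Dict[str, float]:
    """Compare a call set against the simulated truth set."""
    tp = len(truth & calls)   # called and true
    fp = len(calls - truth)   # called but not true
    fn = len(truth - calls)   # true but missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        # Assumed definition: share of true variants that were recovered.
        "recovery_rate": tp / (tp + fn) if truth else float("nan"),
        # Assumed stand-in for "accuracy": share of calls that are true.
        "precision": tp / (tp + fp) if calls else float("nan"),
    }


# Toy data (invented for illustration, not taken from the paper).
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr1", 900, "G", "A")}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr1", 520, "T", "C"),  # false positive inside the repeat
    ("chr1", 560, "A", "T"),  # false positive inside the repeat
}
repeats = {"chr1": [(500, 600)]}

print("all sites:        ", benchmark(truth, calls))
print("repeats excluded: ", benchmark(
    drop_repeat_variants(truth, repeats),
    drop_repeat_variants(calls, repeats),
))
```

In this toy data both false positives sit inside the repeat interval, so masking repeats removes them without changing the number of recovered true variants. That mirrors the abstract's suggestion that variants within repeats might have to be excluded from downstream analyses.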

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd07/9256665/d453f06def72/41598_2022_15563_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd07/9256665/69b6342132da/41598_2022_15563_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd07/9256665/f93181a6b289/41598_2022_15563_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd07/9256665/6d5ff359c569/41598_2022_15563_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd07/9256665/5b455e0f1855/41598_2022_15563_Fig5_HTML.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd07/9256665/fee1bec92b57/41598_2022_15563_Fig6_HTML.jpg
