

Comparing genomic variant identification protocols for .

Author information

Li Xiao, Muñoz José F, Gade Lalitha, Argimon Silvia, Bougnoux Marie-Elisabeth, Bowers Jolene R, Chow Nancy A, Cuesta Isabel, Farrer Rhys A, Maufrais Corinne, Monroy-Nieto Juan, Pradhan Dibyabhaba, Uehling Jessie, Vu Duong, Yeats Corin A, Aanensen David M, d'Enfert Christophe, Engelthaler David M, Eyre David W, Fisher Matthew C, Hagen Ferry, Meyer Wieland, Singh Gagandeep, Alastruey-Izquierdo Ana, Litvintseva Anastasia P, Cuomo Christina A

Affiliations

Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.

Mycotic Diseases Branch, Centers for Disease Control and Prevention, US Department of Health and Human Services, Atlanta, GA, 30329, USA.

Publication information

Microb Genom. 2023 Apr;9(4). doi: 10.1099/mgen.0.000979.

Abstract

Genomic analyses are widely applied to epidemiological, population genetic and experimental studies of pathogenic fungi. A wide range of methods are employed to carry out these analyses, typically without including controls that gauge the accuracy of variant prediction. The importance of tracking outbreaks at a global scale has raised the urgency of establishing high-accuracy pipelines that generate consistent results between research groups. To evaluate currently employed methods for whole-genome variant detection and elaborate best practices for fungal pathogens, we compared how 14 independent variant calling pipelines performed across 35 isolates from 4 distinct clades and evaluated the performance of variant calling, single-nucleotide polymorphism (SNP) counts and phylogenetic inference results. Although these pipelines used different variant callers and filtering criteria, we found high overall agreement among the SNPs called by each pipeline. This concordance correlated with site quality, as SNPs discovered by only a few pipelines tended to show lower mapping quality scores and depth of coverage than those recovered by all pipelines. We observed that the major differences between pipelines were due to variation in read trimming strategies, SNP calling methods and parameters, and downstream filtration criteria. We calculated specificity and sensitivity for each pipeline by aligning three isolates with chromosome-level assemblies and found that the GATK-based pipelines were well balanced between these metrics. Selection of trimming methods had a greater impact on SAMtools-based pipelines than on those using GATK. Phylogenetic trees inferred by each pipeline showed high consistency at the clade level, but there was more variability between isolates from a single outbreak, with pipelines that used more stringent cutoffs having lower resolution.
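
The per-pipeline sensitivity and specificity comparison described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes SNPs are represented as `(chrom, pos, alt)` tuples, that the truth set comes from whole-genome alignment, that there is at most one SNP per site, and that `n_callable_sites` (a hypothetical parameter) counts the genome positions both the alignment and the pipeline can assess. Sensitivity is TP/(TP+FN) and specificity is TN/(TN+FP).

```python
"""Score one pipeline's SNP calls against a truth set (illustrative sketch).

Assumptions, not taken from the study: SNPs are (chrom, pos, alt) tuples,
there is at most one truth SNP per site, and n_callable_sites counts the
genome positions that both the truth alignment and the pipeline can assess.
"""
from typing import NamedTuple, Set, Tuple

Snp = Tuple[str, int, str]  # (chromosome, position, alternate allele)


class Scores(NamedTuple):
    tp: int
    fp: int
    fn: int
    tn: int
    sensitivity: float
    specificity: float


def score_pipeline(calls: Set[Snp], truth: Set[Snp], n_callable_sites: int) -> Scores:
    """Compare a pipeline's SNP calls with truth SNPs from whole-genome alignment."""
    tp = len(calls & truth)  # called and confirmed by the truth set
    fp = len(calls - truth)  # called, but absent from the truth set (or wrong allele)
    fn = len(truth - calls)  # truth SNPs the pipeline missed
    # True negatives: callable sites where neither set reports a variant.
    variant_sites = {(chrom, pos) for chrom, pos, _ in calls | truth}
    tn = n_callable_sites - len(variant_sites)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return Scores(tp, fp, fn, tn, sensitivity, specificity)


if __name__ == "__main__":
    truth = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G")}
    calls = {("chr1", 100, "A"), ("chr2", 40, "G"), ("chr2", 90, "C")}
    print(score_pipeline(calls, truth, n_callable_sites=1000))
```

In this toy example, one missed SNP and one extra call give a sensitivity of 2/3 and a specificity of 996/997. Counting true negatives per site, rather than per allele, keeps a wrong-allele call from being subtracted from the callable sites twice.
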
This project generated two truth datasets useful for routine benchmarking of variant calling: a consensus VCF of genotypes discovered by 10 or more pipelines across these 35 diverse isolates, and variants for 2 samples identified from whole-genome alignments. This study provides a foundation for evaluating SNP calling pipelines and developing best practices for future fungal genomic studies.
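
A consensus set of SNPs discovered by 10 or more pipelines can be sketched by counting, for each isolate, how many pipelines report each SNP. This minimal Python sketch is not the authors' implementation: it assumes each pipeline's VCF has already been parsed into per-isolate sets of `(chrom, pos, alt)` tuples, the helper names `support_counts` and `consensus` are hypothetical, and it ignores variant normalization, which real VCF merging must handle.

```python
"""Build a consensus SNP set supported by many pipelines (illustrative sketch).

Assumptions, not the authors' code: each pipeline's output is already parsed
into {isolate: {(chrom, pos, alt), ...}}, and a SNP enters the consensus for an
isolate when at least min_support pipelines report it there.
"""
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Snp = Tuple[str, int, str]     # (chromosome, position, alternate allele)
CallSet = Dict[str, Set[Snp]]  # isolate name -> SNPs called in that isolate


def support_counts(pipelines: Iterable[CallSet]) -> Dict[str, Counter]:
    """Count, per isolate, how many pipelines reported each SNP."""
    counts: Dict[str, Counter] = {}
    for calls in pipelines:
        for isolate, snps in calls.items():
            # Each pipeline contributes at most one vote per SNP because snps is a set.
            counts.setdefault(isolate, Counter()).update(snps)
    return counts


def consensus(pipelines: Iterable[CallSet], min_support: int = 10) -> CallSet:
    """Keep SNPs reported by at least min_support pipelines for the same isolate."""
    return {
        isolate: {snp for snp, n in counter.items() if n >= min_support}
        for isolate, counter in support_counts(pipelines).items()
    }


if __name__ == "__main__":
    snp_a, snp_b = ("chr1", 100, "A"), ("chr1", 200, "T")
    # 14 toy pipelines: all report snp_a, only 9 report snp_b.
    pipelines = [{"iso1": {snp_a, snp_b}}] * 9 + [{"iso1": {snp_a}}] * 5
    print(consensus(pipelines))  # {'iso1': {('chr1', 100, 'A')}}
```

The per-SNP counts from `support_counts` are also the quantity behind the concordance analysis: SNPs with low pipeline support are the ones the study found to have lower mapping quality and depth of coverage.
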

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b77b/10210944/ee4e75888fe8/mgen-9-979-g001.jpg