Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines.

Author information

Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK.

National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK.

Publication information

Gigascience. 2020 Feb 1;9(2). doi: 10.1093/gigascience/giaa007.

Abstract

BACKGROUND

Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella.

RESULTS

We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, both the sensitivity and the precision of a pipeline showed a strong inverse relationship with the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis.
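The quantities named in the results can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the function names, the representation of a SNP as a `(position, alternate_base)` tuple, and the toy data are all assumptions made for illustration. Sensitivity and precision are computed by comparing a call set with a truth set, and the Mash distance is derived from a k-mer Jaccard index using the published Mash formula D = -(1/k) ln(2j / (1 + j)), with Mash's default k = 21.

```python
import math


def snp_call_metrics(called, truth):
    """Return (sensitivity, precision) of a SNP call set against a truth set.

    Each SNP is any hashable value; here a (position, alternate_base) tuple
    is assumed purely for illustration.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)    # SNPs called and present in the truth set
    false_pos = len(called - truth)   # SNPs called but absent from the truth set
    false_neg = len(truth - called)   # true SNPs that the pipeline missed
    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    precision = true_pos / (true_pos + false_pos) if called else float("nan")
    return sensitivity, precision


def mash_distance(jaccard, k=21):
    """Mash distance from the Jaccard index of two k-mer sets.

    Uses D = -(1/k) * ln(2j / (1 + j)); a Jaccard index of 0 maps to the
    maximum distance of 1.
    """
    if jaccard <= 0:
        return 1.0
    return -math.log(2 * jaccard / (1 + jaccard)) / k


# Toy data (hypothetical): four true SNPs, three calls, two of them correct.
truth_snps = {(10, "A"), (25, "T"), (40, "G"), (77, "C")}
called_snps = {(10, "A"), (25, "T"), (90, "G")}
sensitivity, precision = snp_call_metrics(called_snps, truth_snps)
```

In this toy case the pipeline recovers two of the four true SNPs and makes one false call, so sensitivity is 0.5 and precision is 2/3. A Jaccard index of 1 (identical k-mer sets) gives a Mash distance of 0, and the distance grows as the reads and the reference share fewer k-mers.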

CONCLUSIONS

The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, Novoalign/GATK was among the highest-performing pipelines. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35a0/7002876/b46f9caefe30/giaa007fig1.jpg
