Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada.
Front Cell Infect Microbiol. 2012 May 1;2:57. doi: 10.3389/fcimb.2012.00057. eCollection 2012.
Tracking the sources of sporadic cases of campylobacteriosis remains challenging, as commonly used molecular typing methods have limited ability to unambiguously link genetically related strains. Genomics has become increasingly prominent in the public health response to enteric pathogens, as genomic methods enable characterization of pathogens at an unprecedented level of resolution. However, the cost of sequencing and the expertise required for bioinformatic analyses remain prohibitive, so these comprehensive analyses are limited to a few priority strains. Although several molecular typing methods are currently widely used for epidemiological analysis of campylobacters, it is not clear how accurately these methods reflect true strain relationships. To address this, we have developed a framework and associated computational tools to rapidly analyze draft genome sequence data for the assessment of molecular typing methods against a "gold standard" based on the phylogenetic analysis of highly conserved core (HCC) genes with high sequence quality. We analyzed 104 publicly available whole genome sequences (WGS) of Campylobacter jejuni and C. coli. In addition to in silico determination of multi-locus sequence typing (MLST), flaA, and porA type, as well as comparative genomic fingerprinting (CGF) type, we inferred a "reference" phylogeny based on 389 HCC genes. Molecular typing data were compared to the reference phylogeny for concordance using the adjusted Wallace coefficient (AWC) with confidence intervals. Although MLST targets the sequence variability in core genes and CGF targets insertions/deletions of accessory genes, both methods are based on multi-locus analysis and provided better estimates of true phylogeny than methods based on single loci (porA, flaA).
A larger WGS dataset that includes additional genetically related strains, both epidemiologically linked and unlinked, will be necessary to assess more comprehensively the performance of subtyping methods for outbreak investigations and surveillance activities. Analyses of the strengths and weaknesses of widely used typing methodologies in inferring true strain relationships will provide guidance in the interpretation of these data for epidemiological purposes.
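The concordance measure named in the abstract, the adjusted Wallace coefficient, can be illustrated with a short sketch. The code below is not the authors' tool. It assumes the commonly used definition, in which the Wallace coefficient W(A→B) is the fraction of strain pairs sharing a type under method A that also share a type under method B, and the adjusted value is (W − Wi) / (1 − Wi), where Wi is the proportion of all strain pairs that share a type under B (the agreement expected if the two methods were independent). The function name and the type labels in the usage note are hypothetical, and confidence intervals are not computed.

```python
from collections import Counter


def _pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def adjusted_wallace(a, b):
    """Adjusted Wallace coefficient A -> B for two partitions of the same strains.

    a, b: equal-length sequences of type labels, one entry per strain
    (for example, a typing method's types and reference-phylogeny clusters).
    Hypothetical helper for illustration; no confidence interval is computed.
    """
    if len(a) != len(b):
        raise ValueError("both partitions must cover the same strains")

    total_pairs = _pairs(len(a))
    # Pairs of strains placed in the same type by A, by B, and by both.
    same_a = sum(_pairs(c) for c in Counter(a).values())
    same_b = sum(_pairs(c) for c in Counter(b).values())
    same_both = sum(_pairs(c) for c in Counter(zip(a, b)).values())

    if same_a == 0:
        raise ValueError("A assigns every strain a unique type; W(A->B) is undefined")

    wallace = same_both / same_a      # W(A->B)
    expected = same_b / total_pairs   # Wi: agreement expected under independence
    if expected == 1:
        raise ValueError("B places all strains in one type; adjustment is undefined")

    return (wallace - expected) / (1 - expected)
```

As a usage example with made-up labels, `adjusted_wallace(["ST21", "ST21", "ST21", "ST45", "ST45", "ST48"], ["c1", "c1", "c1", "c2", "c2", "c2"])` evaluates to 1.0 because every pair that shares a sequence type also shares a cluster, while swapping the arguments gives 6/11 (about 0.545) because the second cluster is split across two sequence types. The coefficient is directional, so both directions are usually worth reporting.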