Pightling Arthur W, Petronella Nicholas, Pagotto Franco
Office of Analytics and Outreach, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD, 20740, USA.
Biostatistics and Modelling Division, Bureau of Food Surveillance and Science Integration, Food Directorate, Health Products and Food Branch, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, K1A 0K9, ON, Canada.
BMC Microbiol. 2015 Oct 22;15:224. doi: 10.1186/s12866-015-0526-1.
Next-generation sequencing provides a powerful means of molecular characterization. However, methods such as single-nucleotide polymorphism detection or whole-chromosome sequence analysis are computationally expensive, error-prone, and still less accessible than traditional typing methods. Here, we present the Listeria monocytogenes core-genome sequence typing (CGST) method for molecular characterization. This method uses a high-confidence core (HCC) genome, calculated to ensure accurate identification of orthologs. We also developed an evolutionarily relevant nomenclature based on phylogenetic analysis of HCC genomes. Finally, we created a pipeline (LmCGST; https://sourceforge.net/projects/lmcgst/files/) that takes in raw next-generation sequencing reads, calculates a subject HCC profile, compares it to an expandable database, assigns a sequence type, and performs a phylogenetic analysis.
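The comparison and type-assignment steps of such a pipeline can be illustrated with a minimal sketch. It does not reproduce the LmCGST code: the profile representation (a mapping from locus name to allele sequence), the function names, and the rule of reporting the closest database entry by number of differing loci are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: locus name -> allele sequence (or allele identifier).
Profile = Dict[str, str]


def profile_distance(a: Profile, b: Profile) -> int:
    """Count the shared loci at which two profiles carry different alleles."""
    shared = a.keys() & b.keys()
    return sum(1 for locus in shared if a[locus] != b[locus])


def assign_type(subject: Profile,
                database: Dict[str, Profile]) -> Tuple[Optional[str], int]:
    """Return (closest type name, distance); distance 0 means an exact match.

    An empty database yields (None, -1). Ties are broken by type name so the
    result is deterministic.
    """
    best_name: Optional[str] = None
    best_dist = -1
    for name, reference in sorted(database.items()):
        dist = profile_distance(subject, reference)
        if best_name is None or dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist
```

In this sketch, a subject whose closest entry lies at a non-zero distance would be a candidate for a new type, which is one way an expandable database could grow; the actual rule used by the pipeline is not stated here.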
We analyzed 29 high-quality, closed Listeria monocytogenes chromosome sequences and identified loci that are reliable targets for automated molecular characterization methods. We identified 1013 open-reading frames that comprise our high-confidence core (HCC) genome. We then populated a database with HCC profiles from 114 taxa. We sequenced 84 isolates selected at random from the collection of the Listeriosis Reference Service for Canada and analyzed them with the LmCGST pipeline. In addition, we generated pulsed-field gel electrophoresis, ribotyping, and in silico multi-locus sequence typing (MLST) data for the 84 isolates and compared the results to those obtained using the CGST method. We found that all of the methods yielded generally congruent results. However, because it resolves the isolates into a larger number of categories, the CGST method provides much greater discriminatory power than the other methods tested here.
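The idea of a core genome can be sketched as a set intersection across genomes. The example below is an illustration only: it assumes each genome has already been reduced to a mapping from ortholog-group identifier to copy number, and it treats "present exactly once in every genome" as the inclusion rule. The additional criteria that make the core "high-confidence" are not described in this abstract and are not modeled.

```python
from typing import Dict, List, Set


def core_loci(genomes: List[Dict[str, int]]) -> Set[str]:
    """Return ortholog groups found as a single copy in every genome.

    Each genome is a hypothetical mapping: ortholog-group id -> copy number.
    """
    if not genomes:
        return set()
    # Start from the single-copy groups of the first genome, then keep only
    # those that are also single-copy in each remaining genome.
    core = {group for group, copies in genomes[0].items() if copies == 1}
    for genome in genomes[1:]:
        core &= {group for group, copies in genome.items() if copies == 1}
    return core
```

Excluding multi-copy groups is a common way to avoid confusing paralogs with orthologs, which is the kind of misidentification a conservative core definition is meant to prevent.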
We show that the CGST method provides greater discriminatory power than typing methods such as pulsed-field gel electrophoresis, ribotyping, and multi-locus sequence typing, while addressing several shortcomings of other methods of molecular characterization with next-generation sequence data. It uses discrete, well-defined groupings (types) of organisms that are phylogenetically relevant and easily interpreted. In addition, the CGST scheme can be expanded to include additional loci and HCC profiles in the future. Overall, the CGST method provides an approach to the molecular characterization of Listeria monocytogenes with next-generation sequence data that is highly reproducible, easily standardized, portable, and accessible.
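One common way to quantify the discriminatory power of a typing method is Simpson's index of diversity, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j the number assigned to type j. The abstract does not state which measure was used, so the sketch below is offered only as an illustration of how more categories translate into higher discrimination; the function name and input format are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def simpson_diversity(type_labels: Sequence[Hashable]) -> float:
    """Probability that two isolates drawn without replacement differ in type."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels)
    # Number of ordered pairs of isolates that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))
```

Applied to the same set of isolates, a method that splits them into more, smaller types gives a value closer to 1, while a method that places every isolate in one type gives 0.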