Grinevich Dmitry, Harden Lyndy, Thakur Siddhartha, Callahan Benjamin J
Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA.
Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
bioRxiv. 2023 Jun 28:2023.06.28.546915. doi: 10.1101/2023.06.28.546915.
The resolution of variation within species is critical for interpreting and acting on many microbial measurements. In two key foodborne pathogens, the primary sub-species classification scheme is serotyping: differentiating variants within these species by their surface antigen profiles. Where whole-genome sequencing (WGS) is available, serotype prediction from WGS of isolates is now considered comparable or preferable to traditional laboratory methods. However, both laboratory and WGS methods depend on an isolation step that is time-consuming and incompletely represents the sample when multiple strains are present. Community sequencing approaches that skip the isolation step are therefore of interest for pathogen surveillance. Here we evaluated the viability of amplicon sequencing of the full-length 16S rRNA gene for serotyping these two pathogens. We developed a novel algorithm for serotype prediction, implemented as an R package (Seroplacer), which takes full-length 16S rRNA gene sequences as input, places them into a reference phylogeny, and outputs serovar predictions. We achieved over 89% accuracy in predicting serotypes on test data, and identified key pathogenic serovars of both species in isolate and environmental test samples. Although serotype prediction from 16S sequences is not as accurate as serotype prediction from WGS of isolates, the potential to identify dangerous serovars directly from amplicon sequencing of environmental samples is intriguing for pathogen surveillance. The capabilities developed here are also broadly relevant to other applications where resolving intra-species variation by sequencing environmental samples directly could be valuable.
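The abstract describes prediction as phylogenetic placement of a 16S sequence into a reference phylogeny, followed by a serovar call. The Python sketch below shows one simple way such a call could be made once a placement is known. It assumes a majority vote over the reference leaves beneath the placement edge. This is not Seroplacer's implementation, which is an R package whose internals are not described here. The toy tree, the labels, and the function names `leaves_under` and `predict_serovar` are all hypothetical. The placement step itself is assumed to have been done separately.

```python
from collections import Counter

# Hypothetical reference phylogeny: each internal node maps to its children;
# any node absent from the dict is a leaf (a reference genome).
Tree = dict[str, list[str]]


def leaves_under(tree: Tree, node: str) -> list[str]:
    """Return all leaf names in the subtree rooted at `node` (iterative DFS)."""
    stack, leaves = [node], []
    while stack:
        current = stack.pop()
        children = tree.get(current, [])
        if children:
            stack.extend(children)
        else:
            leaves.append(current)
    return leaves


def predict_serovar(tree: Tree, serovar_of: dict[str, str],
                    placement_node: str) -> tuple[str, float]:
    """Majority-vote serovar among reference leaves below the placement edge.

    `placement_node` is the child end of the edge onto which the query 16S
    sequence was placed (assumed to come from a separate placement step).
    Returns the winning serovar and the fraction of leaves carrying it;
    ties are broken by Counter's first-seen order.
    """
    labels = [serovar_of[leaf] for leaf in leaves_under(tree, placement_node)]
    serovar, count = Counter(labels).most_common(1)[0]
    return serovar, count / len(labels)


# Toy example with made-up node names and serovar labels.
tree = {
    "root": ["A", "B"],
    "A": ["g1", "g2", "g3"],
    "B": ["g4", "g5"],
}
serovar_of = {
    "g1": "serovar_X", "g2": "serovar_X", "g3": "serovar_Y",
    "g4": "serovar_Z", "g5": "serovar_Z",
}

print(predict_serovar(tree, serovar_of, "B"))
print(predict_serovar(tree, serovar_of, "A"))
print(predict_serovar(tree, serovar_of, "root"))
```

In this toy example, a placement on edge `B` yields full support for `serovar_Z`. A placement near the root spans a mix of serovars and returns only 0.4 support. A support value like this is one way a placement-based predictor could flag an uncertain call.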