Microbiological Diagnostic Unit, Public Health Laboratory, University of Melbourne, Melbourne, VIC, Australia; Department of Microbiology & Immunology, Peter Doherty Institute for Infection & Immunity, University of Melbourne, Melbourne, VIC, Australia.
Microbiological Diagnostic Unit, Public Health Laboratory, University of Melbourne, Melbourne, VIC, Australia.
Lancet Microbe. 2021 Nov;2(11):e575-e583. doi: 10.1016/S2666-5247(21)00149-X. Epub 2021 Aug 6.
Pairwise single nucleotide polymorphisms (SNPs) are a cornerstone of genomic approaches to the inference of transmission of multidrug-resistant (MDR) organisms in hospitals. However, the impact of many key analytical approaches on these inferences has not yet been systematically assessed. This study aims to make such a systematic assessment.
We conducted a 15-month prospective study (2-month pilot phase, 13-month implementation phase) across eight hospitals in four hospital networks in Melbourne, VIC, Australia. Patient clinical and screening samples containing one or more isolates of meticillin-resistant Staphylococcus aureus, vancomycin-resistant Enterococcus faecium, and extended-spectrum β-lactamase-producing Escherichia coli and Klebsiella pneumoniae were collected and underwent whole genome sequencing. Using the genome data from the four most numerous sequence types of each species (16 in total), we systematically assessed: (1) the impact of sample and reference genome diversity, through multiple core genome alignments using different data subsets and reference genomes; (2) the effect of masking prophage and recombination regions in the core genome alignments, by comparing SNP distances before and after masking; (3) the differences between a cumulative and a 3-month sliding-window approach to sample genome inclusion in the dataset over time; and (4) the comparative effects of each of these approaches when applying a previously defined SNP threshold for inferring likely transmission.
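The pairwise comparison described in the methods can be illustrated with a minimal sketch. It is not the study's pipeline: the function names, the toy alignment, the masked interval, and the threshold value are all hypothetical, and the abstract does not state the threshold that was actually used. The sketch assumes sequences that are already aligned to a common reference, skips ambiguous or gap positions, and optionally ignores masked regions (for example, predicted prophage or recombination regions).

```python
from itertools import combinations

# Characters treated as "no call"; such positions are skipped in a pair.
AMBIGUOUS = set("N-?")


def snp_distance(seq_a, seq_b, masked=()):
    """Count differing positions between two aligned sequences.

    `masked` is an iterable of half-open (start, end) intervals, 0-based,
    whose positions are ignored (hypothetical masking of prophage or
    recombination regions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    masked_positions = set()
    for start, end in masked:
        masked_positions.update(range(start, end))
    distance = 0
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if i in masked_positions or a in AMBIGUOUS or b in AMBIGUOUS:
            continue
        if a != b:
            distance += 1
    return distance


def pairwise_distances(alignment, masked=()):
    """Return {(name_a, name_b): distance} for every pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y], masked)
        for x, y in combinations(sorted(alignment), 2)
    }


def putative_links(distances, threshold):
    """Pairs at or below the SNP threshold (threshold value is an assumption)."""
    return [pair for pair, d in distances.items() if d <= threshold]


# Toy alignment, invented for illustration only.
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "TCGAACGNAT",
}

unmasked = pairwise_distances(alignment)
masked = pairwise_distances(alignment, masked=[(0, 4)])
links = putative_links(unmasked, threshold=1)
```

In this toy case, masking the first four positions lowers two of the three distances, which mirrors the kind of before-and-after comparison the study made; real alignments are far longer, and the direction and size of the change will differ by dataset.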
2275 samples were collected (397 during the pilot phase from April 4 to June 18, 2017; 1878 during the implementation phase from Oct 30, 2017, to Nov 30, 2018) from 1870 patients. Of these 2275 samples, 1537 were identified as arising from the four most numerous sequence types from each of the four target species of MDR organisms in this dataset (16 sequence types in total: S aureus ST5, ST22, ST45, and ST93; E faecium ST80, ST203, ST1421, and ST1424; K pneumoniae ST15, ST17, ST307, and ST323; and E coli ST38, ST131, ST648, and ST1193). Across the species, using a reference genome of the same sequence type provided a greater degree of pairwise SNP resolution, compared with species and outgroup-reference alignments that mostly resulted in inflated SNP distances and the possibility of missed transmission events. Omitting prophage regions had minimal effect; however, omitting recombination regions had a highly variable effect, often inflating the number of closely related pairs. Estimated SNP distances between isolate pairs over time were more consistent using a sliding-window than a cumulative approach.
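The two inclusion strategies compared here differ only in which genomes enter the alignment at a given analysis date. The sketch below shows that selection step under stated assumptions: the isolate names and collection dates are invented, the function names are hypothetical, and the 3-month window is approximated as 90 days, which the abstract does not specify.

```python
from datetime import date, timedelta


def cumulative_set(samples, as_of):
    """All isolates collected on or before the analysis date."""
    return sorted(name for name, collected in samples.items() if collected <= as_of)


def sliding_window_set(samples, as_of, window_days=90):
    """Only isolates collected within the window ending at the analysis date.

    window_days=90 is an assumed stand-in for a 3-month window.
    """
    start = as_of - timedelta(days=window_days)
    return sorted(
        name for name, collected in samples.items() if start < collected <= as_of
    )


# Invented collection dates, for illustration only.
samples = {
    "iso1": date(2017, 11, 5),
    "iso2": date(2018, 1, 20),
    "iso3": date(2018, 3, 1),
    "iso4": date(2018, 3, 28),
}

analysis_date = date(2018, 3, 31)
cumulative = cumulative_set(samples, analysis_date)
windowed = sliding_window_set(samples, analysis_date)
```

With these dates the cumulative set keeps all four isolates, whereas the window drops `iso1`. Because a core genome alignment is recomputed from whichever genomes are included, a set that keeps growing can change the distance estimated for the same pair over time, which is consistent with the more stable estimates reported for the sliding-window approach.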
We propose that the use of a closely related reference genome, without masking of prophage or recombination regions, and of a sliding-window approach for isolate inclusion is best for accurate and consistent MDR organism transmission inference, when using core genome alignments and SNP thresholds. These approaches provide increased stability and resolution, so SNP thresholds can be more reliably applied for putative transmission inference among diverse MDR organisms, reducing the chance of incorrectly inferring the presence or absence of close genetic relatedness and, therefore, transmission. The establishment of a broadly applicable and standardised approach, as proposed here, is necessary to implement widespread prospective genomic surveillance for MDR organism transmission.
This study was funded by the Melbourne Genomics Health Alliance and the National Health and Medical Research Council of Australia.