Radomski Nicolas, Cadel-Six Sabrina, Cherchame Emeline, Felten Arnaud, Barbet Pauline, Palma Federica, Mallet Ludovic, Le Hello Simon, Weill François-Xavier, Guillier Laurent, Mistou Michel-Yves
ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France.
Unité des Bactéries Pathogènes Entériques, Institut Pasteur, Centre National de Référence des Salmonella, Paris, France.
Front Microbiol. 2019 Oct 24;10:2413. doi: 10.3389/fmicb.2019.02413. eCollection 2019.
The investigation of foodborne outbreaks (FBOs) from genomic data typically relies on inspecting the relatedness of samples through a phylogenomic tree computed on SNPs, genes, k-mers, or alleles (i.e., cgMLST and wgMLST). Phylogenomic reconstruction is often time-consuming and computation-intensive, and it depends on hidden assumptions, pipeline implementation, and parameterization. In the context of FBO investigations, robust links between isolates are required in a timely manner to trigger appropriate management actions. Here, we propose a non-parametric statistical method to assess whether samples are related (i.e., outbreak cases) or should be rejected as unrelated (i.e., non-outbreak cases). With typical computations running within minutes on a desktop computer, we benchmarked the ability of three non-parametric statistical tests (i.e., Wilcoxon rank-sum, Kolmogorov-Smirnov, and Kruskal-Wallis) on six different genomic features (i.e., SNPs, SNPs excluding recombination events, genes, k-mers, cgMLST alleles, and wgMLST alleles) to discriminate outbreak cases (i.e., positive control: C+) from non-outbreak cases (i.e., negative control: C-). We leveraged four well-characterized and retrospectively investigated FBOs of Salmonella Typhimurium and its monophasic variant S. 1,4,[5],12:i:- from France, setting positive and negative controls in all the assays. We show that the approaches relying on pairwise SNP differences distinguished all four considered outbreaks, in contrast to the other tested genomic features (i.e., genes, k-mers, cgMLST alleles, and wgMLST alleles). The freely available non-parametric method, written in R, has been designed to be independent of both the phylogenomic reconstruction and the detection methods of genomic features (i.e., SNPs, genes, k-mers, or alleles), making it widely and easily usable by anybody working on genomic data from suspected samples.
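The idea of testing pairwise SNP differences with a rank-based test can be illustrated with a minimal sketch. This is not the authors' R implementation: it is written in Python with the standard library only, the toy SNP profiles and the names `pairwise_differences`, `midranks`, and `rank_sum_test` are hypothetical, and the choice to compare pairwise differences within a suspected cluster against pairwise differences within a set of background isolates is an assumption made for illustration. The p-value uses the tie-corrected normal approximation without continuity correction, and pairwise distances are not independent observations, so it should be read as indicative only.

```python
from itertools import combinations
from math import erfc, sqrt


def pairwise_differences(profiles):
    """Count differing positions for every pair of equal-length profiles."""
    return [
        sum(a != b for a, b in zip(p, q))
        for p, q in combinations(profiles, 2)
    ]


def midranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic of x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: each group of t tied values reduces the variance.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every value identical: no evidence of a difference
        return u, 1.0

    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical toy data: one character per variable SNP position.
suspected = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"]
background = ["TCGAACTTGC", "AGGTTCGTCA", "ACCTAGGAAT", "GCGTTCCTAG"]

within_suspected = pairwise_differences(suspected)
within_background = pairwise_differences(background)
u_stat, p_value = rank_sum_test(within_suspected, within_background)
print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

A small p-value here indicates that pairwise differences among the suspected isolates are systematically smaller than among the background isolates, which is the kind of signal the tests above are meant to detect. Because the function only needs two lists of pairwise differences, the same call would work on differences counted from genes, k-mers, or alleles.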