Chung Taejung, Yan Runan, Weller Daniel L, Kovac Jasna
Department of Food Science, The Pennsylvania State University, University Park, Pennsylvania, USA.
Microbiome Center, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA.
Microbiol Spectr. 2023 Mar 22;11(2):e0038123. doi: 10.1128/spectrum.00381-23.
The use of water contaminated with Salmonella for produce production contributes to foodborne disease burden. To reduce human health risks, there is a need for novel, targeted approaches for assessing the pathogen status of agricultural water. We investigated the utility of water microbiome data for predicting Salmonella contamination of streams used to source water for produce production. Grab samples were collected from 60 New York streams in 2018 and tested for Salmonella. Separately, DNA was extracted from the samples and used for Illumina shotgun metagenomic sequencing. Reads were trimmed and used to assign taxonomy with Kraken2. Conditional forest (CF), regularized random forest (RRF), and support vector machine (SVM) models were implemented to predict Salmonella contamination. Model performance was assessed using 10-fold cross-validation repeated 10 times to quantify area under the curve (AUC) and Kappa score. CF models outperformed the other two algorithms based on AUC (0.86, CF; 0.81, RRF; 0.65, SVM) and Kappa score (0.53, CF; 0.41, RRF; 0.12, SVM). The taxa that were most informative for accurately predicting Salmonella contamination based on CF were compared to taxa identified by ALDEx2 as being differentially abundant between Salmonella-positive and -negative samples. CF and differential abundance tests both identified Aeromonas salmonicida (variable importance [VI] = 0.012) and sp. strain CA23 (VI = 0.025) as the two most informative taxa for predicting Salmonella contamination. Our findings suggest that microbiome-based models may provide an alternative to or complement existing water monitoring strategies. Similarly, the informative taxa identified in this study warrant further investigation as potential indicators of Salmonella contamination of agricultural water. 
Understanding the associations between surface water microbiome composition and the presence of foodborne pathogens, such as Salmonella, can facilitate the identification of novel indicators of Salmonella contamination. This study assessed the utility of microbiome data and three machine learning algorithms for predicting Salmonella contamination of Northeastern streams. The research reported here both expanded the knowledge on the microbiome composition of surface waters and identified putative novel indicators (i.e., species) for Salmonella in Northeastern streams. These putative indicators warrant further research to assess whether they are consistent indicators of Salmonella contamination across regions, waterways, and years not represented in the data set used in this study. Validated indicators identified using microbiome data may be used as targets in the development of rapid (e.g., PCR-based) detection assays for the assessment of microbial safety of agricultural surface waters.
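The evaluation scheme named in the abstract (10-fold cross-validation repeated 10 times, summarized by AUC and Cohen's Kappa) can be illustrated with a minimal, standard-library sketch. Everything here is an assumption made for illustration: the data are synthetic (60 samples, five made-up "taxon" features), the nearest-centroid scorer is only a stand-in for the CF, RRF, and SVM models used in the study, and pooling out-of-fold predictions within each repeat is one of several reasonable ways to aggregate the metrics. It is not the authors' pipeline.

```python
import random
from statistics import mean

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in (0, 1))
    return 0.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

def roc_auc(y_true, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k, rng):
    """Split sample indices into k test folds, dealing each class out evenly."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y) if t == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_centroids(X, y):
    """Stand-in model (hypothetical): per-class mean of each feature."""
    return {
        cls: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == cls])]
        for cls in (0, 1)
    }

def score(centroids, x):
    """Positive score means x is closer to the positive-class centroid."""
    d = {c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])) for c in (0, 1)}
    return d[0] - d[1]

def evaluate(X, y, k=10, repeats=10, seed=0):
    """Repeated k-fold CV; metrics use pooled out-of-fold predictions per repeat."""
    rng = random.Random(seed)
    aucs, kappas = [], []
    for _ in range(repeats):
        scores = [0.0] * len(y)
        for test in stratified_folds(y, k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            for i in test:
                scores[i] = score(model, X[i])
        preds = [int(s > 0) for s in scores]
        aucs.append(roc_auc(y, scores))
        kappas.append(cohen_kappa(y, preds))
    return mean(aucs), mean(kappas)

# Synthetic data (assumption): 30 positive and 30 negative samples; only the
# first feature carries signal, the other four are noise.
data_rng = random.Random(42)
y = [1] * 30 + [0] * 30
X = [[data_rng.gauss(1.5 * t, 1.0)] + [data_rng.gauss(0, 1) for _ in range(4)] for t in y]

mean_auc, mean_kappa = evaluate(X, y)
print(f"mean AUC = {mean_auc:.2f}, mean Kappa = {mean_kappa:.2f}")
```

Kappa corrects raw accuracy for the agreement expected by chance, which is why a model can post a moderate AUC yet a Kappa near zero, as with the SVM results reported above (AUC 0.65, Kappa 0.12).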