Tanui Collins K, Benefo Edmund O, Karanth Shraddha, Pradhan Abani K
Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA.
Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA.
Pathogens. 2022 Jun 16;11(6):691. doi: 10.3390/pathogens11060691.
Despite its low morbidity, listeriosis has a high mortality rate due to the severity of its clinical manifestations. The source of human listeriosis is often unclear. In this study, we investigate the ability of machine learning to predict the food source from which clinical isolates originated. Four machine learning classification algorithms were trained on core genome multilocus sequence typing data of 1212 isolates from various food sources. The average accuracies of random forest, support vector machine with a radial kernel, stochastic gradient boosting, and logit boost were 0.72, 0.61, 0.7, and 0.73, respectively. Logit boost showed the best performance and was used in model testing on 154 clinical isolates. The model attributed 17.5% of human clinical cases to dairy, 32.5% to fruits, 14.3% to leafy greens, 9.7% to meat, 4.6% to poultry, and 18.8% to vegetables. The final model also identified genetic features that were predictive of specific sources. Thus, this combination of genomic data and machine learning-based models can greatly enhance our ability to track Listeria monocytogenes from different food sources.
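The attribution step described in the abstract (train on food isolates, predict a source for each clinical isolate, then report the share of cases per source) can be illustrated with a minimal sketch. This is not the authors' pipeline: the allele profiles below are invented toy data, and a 1-nearest-neighbour rule on Hamming distance stands in for the logit boost classifier so that the example needs only the Python standard library. The names `FOOD_ISOLATES`, `CLINICAL_ISOLATES`, `predict_source`, and `attribute` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each profile lists allele numbers at four cgMLST loci.
# Real core genome MLST schemes cover well over a thousand loci.
FOOD_ISOLATES = [
    ((1, 4, 2, 7), "dairy"),
    ((1, 4, 3, 7), "dairy"),
    ((5, 2, 2, 9), "fruits"),
    ((5, 2, 6, 9), "fruits"),
    ((8, 8, 1, 3), "meat"),
]

# Hypothetical clinical isolates whose food source is unknown.
CLINICAL_ISOLATES = [
    (1, 4, 2, 9),
    (5, 2, 2, 7),
    (5, 1, 6, 9),
    (8, 8, 1, 7),
]


def hamming(a, b):
    """Count the loci at which two allele profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_source(profile, training):
    """Return the label of the closest training profile.

    Assumption: a 1-nearest-neighbour stand-in, not the logit boost
    model reported in the abstract.
    """
    _, label = min(training, key=lambda item: hamming(profile, item[0]))
    return label


def attribute(clinical, training):
    """Percentage of clinical isolates assigned to each food source."""
    counts = Counter(predict_source(p, training) for p in clinical)
    total = sum(counts.values())
    return {source: 100.0 * n / total for source, n in counts.items()}


if __name__ == "__main__":
    for source, share in sorted(attribute(CLINICAL_ISOLATES, FOOD_ISOLATES).items()):
        print(f"{source}: {share:.1f}%")
```

In a real analysis the classifier would be replaced by a model that has been properly trained and cross-validated, and only the final tallying step would resemble the `attribute` function above.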