High-Efficiency Machine Learning Method for Identifying Foodborne Disease Outbreaks and Confounding Factors.

Affiliations

Computer Network Information Center, Chinese Academy of Sciences, Beijing, China.

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China.

Publication information

Foodborne Pathog Dis. 2021 Aug;18(8):590-598. doi: 10.1089/fpd.2020.2913. Epub 2021 Apr 26.

DOI: 10.1089/fpd.2020.2913

PMID: 33902323

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8390778/

Abstract

The China National Center for Food Safety Risk Assessment (CFSA) uses the Foodborne Disease Monitoring and Reporting System (FDMRS) to monitor outbreaks of foodborne diseases across the country. However, there are problems of underreporting or erroneous reporting in FDMRS, which significantly increase the cost of related epidemic investigations. To solve this problem, we designed a model to identify suspected outbreaks from the data generated by the FDMRS of CFSA. In this study, machine learning models were used to fit the data. The recall rate and F1-score were used as evaluation metrics to compare the classification performance of each model. Feature importance and pathogenic factors were identified and analyzed using tree-based and gradient boosting models. Three real foodborne disease outbreaks were then used to evaluate the best performing model. Furthermore, the SHapley Additive exPlanation value was used to identify the effect of features. Among all machine learning classification models, the eXtreme Gradient Boosting (XGBoost) model achieved the best performance, with the highest recall rate and F1-score of 0.9699 and 0.9582, respectively. In terms of model validation, the model provides a correct judgment of real outbreaks. In the feature importance analysis with the XGBoost model, the health status of the other people with the same exposure has the highest weight, reaching 0.65. The machine learning model built in this study exhibits high accuracy in recognizing foodborne disease outbreaks, thus reducing the manual burden for medical staff. The model helped us identify the confounding factors of foodborne disease outbreaks. Attention should be paid not only to the health status of those with the same exposure but also to the similarity of the cases in time and space.
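The abstract compares models by recall rate and F1-score. As a minimal sketch of how these two metrics are computed for a binary outbreak classifier, the snippet below uses only the standard library. The function name `recall_and_f1` and the toy labels are assumptions made for illustration; they are not the study's code or data.

```python
from typing import Sequence, Tuple


def recall_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, F1) for binary labels, where 1 marks a suspected outbreak."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Recall: share of true outbreaks that the model flags.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of flagged cases that are true outbreaks.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return recall, f1


# Hypothetical labels for illustration only (not FDMRS data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
recall, f1 = recall_and_f1(y_true, y_pred)
print(f"recall={recall:.4f}, f1={f1:.4f}")
```

Recall is the natural headline metric when a missed outbreak is the costly error, while F1 also penalizes false alarms that would trigger unnecessary investigations.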
