Wang Hanxue, Cui Wenjuan, Guo Yunchang, Du Yi, Zhou Yuanchun
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China.
University of Chinese Academy of Sciences, Beijing, China.
JMIR Med Inform. 2021 Jan 26;9(1):e24924. doi: 10.2196/24924.
Foodborne diseases have a high global incidence and place a heavy burden on public health and the economy. Pathogens are the main cause of foodborne diseases, so identifying them plays an important role in treatment and prevention. However, foodborne diseases caused by different pathogens lack specific clinical features, and in practice only a small proportion of cases actually undergo pathogen testing.
We aimed to analyze foodborne disease case data, select appropriate features based on the analysis results, and use machine learning methods to classify foodborne disease pathogens, so that the pathogens of untested cases can be predicted.
We extracted spatial, temporal, and exposed-food features from foodborne disease case data, analyzed the relationship between these features and the pathogens, and classified the pathogens using several machine learning methods. We compared the results of 4 models to identify the pathogen prediction model with the highest accuracy.
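The feature extraction step can be illustrated with a short sketch. It is not the authors' pipeline: the record fields (`onset`, `longitude`, `latitude`, `diarrhea_per_day`, `food`), the food categories, and the use of onset month as the time feature are hypothetical assumptions. The abstract names only spatial, temporal, exposed-food, and diarrhea-frequency features.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical exposed-food categories; the paper's actual categories are not
# given in the abstract.
FOOD_CATEGORIES = ["meat", "aquatic", "vegetable", "grain", "other"]
# The 4 pathogen classes named in the abstract, used as label indices.
PATHOGENS = ["Salmonella", "Norovirus", "Escherichia coli", "Vibrio parahaemolyticus"]


@dataclass
class CaseRecord:
    """Hypothetical shape of one foodborne disease case record."""
    onset: date            # date of illness onset (time feature)
    longitude: float       # location of the case (space features)
    latitude: float
    diarrhea_per_day: int  # clinical symptom feature
    food: str              # exposed food category


def to_features(case: CaseRecord) -> list[float]:
    """Convert one case into a numeric feature vector."""
    # Onset month is one simple way to encode seasonality (an assumption).
    vec = [float(case.onset.month), case.longitude, case.latitude,
           float(case.diarrhea_per_day)]
    # One-hot encode the exposed food; unknown foods fall into "other".
    food = case.food if case.food in FOOD_CATEGORIES else "other"
    vec.extend(1.0 if food == c else 0.0 for c in FOOD_CATEGORIES)
    return vec


def encode_label(pathogen: str) -> int:
    """Map a pathogen name to its integer class index."""
    return PATHOGENS.index(pathogen)
```

For example, `to_features(CaseRecord(date(2019, 7, 14), 121.47, 31.23, 5, "aquatic"))` yields `[7.0, 121.47, 31.23, 5.0, 0.0, 1.0, 0.0, 0.0, 0.0]`. One-hot encoding keeps the food categories unordered, which suits tree models and linear models alike.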
The gradient boosting decision tree (GBDT) model achieved the highest accuracy, approaching 69%, in identifying 4 pathogens: Salmonella, Norovirus, Escherichia coli, and Vibrio parahaemolyticus. Evaluating feature importance showed that features such as time of illness onset, geographic longitude and latitude, and diarrhea frequency play important roles in classifying foodborne disease pathogens.
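The following minimal pure-Python sketch shows how a gradient boosting decision tree classifies several pathogens and produces feature importances. It is not the authors' model and makes several simplifying assumptions. It uses depth-1 trees (stumps). Each leaf value is the mean residual, a simplification of the Newton-step leaf values in Friedman's multiclass algorithm. Importance is measured as the total squared-error reduction each feature contributes across all splits. The demo data are synthetic, and the class depends only on feature 0. In practice, a library implementation such as scikit-learn's `GradientBoostingClassifier`, which exposes `feature_importances_`, would be the more usual choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_stump(X, residuals):
    """Depth-1 regression tree: choose the (feature, threshold) split that
    minimizes squared error on the residuals.
    Returns (feature, threshold, left_value, right_value, gain)."""
    n = len(residuals)
    mean_all = sum(residuals) / n
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    # Fallback "no split": every sample goes left and gets the overall mean.
    best = (0, math.inf, mean_all, mean_all, 0.0)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if base_sse - sse > best[4]:
                best = (j, t, lm, rm, base_sse - sse)
    return best


def predict_stump(stump, x):
    j, t, left_value, right_value, _ = stump
    return left_value if x[j] <= t else right_value


def fit_gbdt(X, y, n_classes, n_rounds=30, learning_rate=0.3):
    """Multiclass gradient boosting: each round fits one stump per class to the
    negative gradient of the softmax cross-entropy loss."""
    scores = [[0.0] * n_classes for _ in X]
    importance = [0.0] * len(X[0])
    rounds = []
    for _ in range(n_rounds):
        probs = [softmax(s) for s in scores]  # fixed for the whole round
        stumps = []
        for k in range(n_classes):
            # Negative gradient for class k: indicator(y == k) - p_k
            residuals = [(1.0 if yi == k else 0.0) - p[k] for yi, p in zip(y, probs)]
            stump = fit_stump(X, residuals)
            importance[stump[0]] += stump[4]  # accumulate split gain per feature
            for i, x in enumerate(X):
                scores[i][k] += learning_rate * predict_stump(stump, x)
            stumps.append(stump)
        rounds.append(stumps)
    total = sum(importance) or 1.0
    return {"rounds": rounds, "lr": learning_rate,
            "importance": [v / total for v in importance]}


def predict(model, x):
    """Return the class with the highest accumulated boosted score."""
    n_classes = len(model["rounds"][0])
    scores = [0.0] * n_classes
    for stumps in model["rounds"]:
        for k, stump in enumerate(stumps):
            scores[k] += model["lr"] * predict_stump(stump, x)
    return max(range(n_classes), key=lambda k: scores[k])


# Synthetic demo: the label depends only on feature 0; features 1 and 2 are noise.
X = [[i % 12, (i * 7) % 5, i % 2] for i in range(36)]
y = [row[0] // 4 for row in X]
model = fit_gbdt(X, y, n_classes=3)
accuracy = sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)
```

On this synthetic data, splits on feature 0 separate the classes, while features 1 and 2 are unrelated to the label. As a result, nearly all accumulated gain lands on feature 0, and `model["importance"]` ranks it first. A gain-based ranking of this kind is one common way to obtain feature importances like those reported above.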
Data analysis can reveal the distribution of some features of foodborne diseases and the relationships among them. Classifying pathogens with machine learning methods based on these analysis results can provide useful support for assisted clinical diagnosis and treatment of foodborne diseases.