Non-Fatal Drowning Risk Prediction Based on Stacking Ensemble Algorithm

Author information

Xie Xinshan, Li Zhixing, Xu Haofeng, Peng Dandan, Yin Lihua, Meng Ruilin, Wu Wei, Ma Wenjun, Chen Qingsong

Affiliations

School of Public Health, Guangdong Pharmaceutical University, Guangzhou 510200, China.

Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou 511430, China.

Publication information

Children (Basel). 2022 Sep 14;9(9):1383. doi: 10.3390/children9091383.

DOI:10.3390/children9091383

PMID:36138692

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9498184/

Abstract

Drowning is a major public health problem and a leading cause of death among children in developing countries. We sought machine learning (ML) algorithms that could offer new insight into risk assessment for non-fatal drowning. Data on non-fatal drowning were collected in Qingyuan city, Guangdong Province, China. We developed four ML models to predict non-fatal drowning risk: a logistic regression (LR) model, a random forest (RF) model, a support vector machine (SVM) model, and a stacking model built on these three base learners (LR, RF, SVM). The area under the curve (AUC), F1 score, accuracy, sensitivity, and specificity were calculated to evaluate the predictive ability of each model. The study included 8390 children, of whom 1013 (12.07%) had experienced non-fatal drowning. The following factors were closely associated with the risk of non-fatal drowning: frequency of swimming in open water, distance between the school and surrounding open water, swimming skills, personality (introvert), and relationship with family members. Compared with the three base models, the stacking model achieved superior performance on the non-fatal drowning dataset (AUC = 0.741, sensitivity = 0.625, F1 score = 0.359, accuracy = 0.739, specificity = 0.754). These results indicate that stacking ensemble algorithms may outperform other ML models on the non-fatal drowning dataset.
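The reported metrics are linked through the class balance, which a short calculation can illustrate. The sketch below is not the authors' code. It derives accuracy, precision, and F1 from the reported sensitivity and specificity, assuming the evaluation set has the same prevalence as the full sample (1013 of 8390). That prevalence is an assumption, since the abstract does not describe the held-out split, and the function name `metrics_from_rates` is hypothetical.

```python
def metrics_from_rates(sensitivity, specificity, prevalence):
    """Derive accuracy, precision and F1 from sensitivity, specificity and prevalence.

    All confusion-matrix cells are expressed as fractions of the evaluation set.
    """
    tp = prevalence * sensitivity              # true positives
    fn = prevalence * (1 - sensitivity)        # false negatives
    tn = (1 - prevalence) * specificity        # true negatives
    fp = (1 - prevalence) * (1 - specificity)  # false positives

    accuracy = tp + tn
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# Sensitivity and specificity are the values reported for the stacking model.
# Assumption: prevalence in the evaluation set equals the whole-sample rate.
result = metrics_from_rates(sensitivity=0.625, specificity=0.754,
                            prevalence=1013 / 8390)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption the derived accuracy (about 0.74) and F1 (about 0.37) land close to the reported 0.739 and 0.359, and the small gap is consistent with a slightly different prevalence in the actual evaluation set. The derived precision of roughly 0.26 shows why F1 stays low despite reasonable sensitivity: with about 12% positives, false positives outnumber true positives.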
