Explore the factors related to the death of offspring under age five and appraise the hazard of child mortality using machine learning techniques in Bangladesh.

Author information

Rahman Ashikur, Rahman Md Habibur

Affiliation

Department of Statistics and Data Science, Jahangirnagar University, Dhaka, 1342, Bangladesh.

Publication information

BMC Public Health. 2025 Jan 29;25(1):360. doi: 10.1186/s12889-025-21460-w.

Abstract

BACKGROUND

Child mortality is a reliable and significant indicator of a nation's health. Although the child mortality rate in Bangladesh has declined over time, it must fall further to meet the Sustainable Development Goals (SDGs). Machine learning models can produce accurate, efficient forecasts and offer in-depth insight into the factors behind an outcome, and such understanding is crucial for substantially reducing child mortality. Accurate predictions from these models can help authorities implement timely interventions and raise awareness. This study therefore aimed to explore the factors related to child mortality and to assess how well various machine learning models predict child mortality in Bangladesh.

METHODS AND MATERIALS

About forty-two thousand observations, excluding those with missing values, were extracted for this study from the 2017-18 Bangladesh Demographic and Health Survey (BDHS). The survey used a two-stage stratified sampling design, selecting 675 enumeration areas (250 urban and 425 rural); data were successfully collected from 672 clusters and 20,160 households. The chi-square test and recursive feature elimination (RFE) were used to identify the relevant risk factors for child mortality among the candidate variables. Six machine learning algorithms were implemented to predict child mortality: Naïve Bayes, Classification and Regression Trees (CART), Random Forest, C5.0 classification, Gradient Boosting Machine, and Logistic Regression. Model performance was evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, and area under the curve (AUC), together with k-fold cross-validation.
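The chi-square screening step can be illustrated with a minimal Python sketch of Pearson's test of independence between one categorical factor and the alive/died outcome. It assumes the factor has already been cross-tabulated into counts and applies no continuity correction; the function names and the counts are hypothetical, and the abstract does not say which test variant or software the authors used. The p-value helper evaluates the chi-square upper tail through the regularized incomplete gamma function (series for small arguments, continued fraction otherwise) so the example needs only the standard library.

```python
import math

_EPS = 1e-15
_TINY = 1e-300


def _log_prefactor(a, z):
    # log of z**a * exp(-z) / Gamma(a), shared by both evaluation branches
    return a * math.log(z) - z - math.lgamma(a)


def chi2_sf(x, df):
    """P(X >= x) for X ~ chi-square(df) (hypothetical helper).

    Uses the regularized incomplete gamma function with a = df/2, z = x/2:
    a power series for P(a, z) when z < a + 1, otherwise a continued
    fraction (modified Lentz) for Q(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z < a + 1:
        term = total = 1.0 / a
        n = 1
        while term > _EPS * total:
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - math.exp(_log_prefactor(a, z)) * total
    b = z + 1.0 - a
    c = 1.0 / _TINY
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > _TINY else _TINY
        c = b + an / c
        c = c if abs(c) > _TINY else _TINY
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return math.exp(_log_prefactor(a, z)) * h


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    table[i][j]: number of children in level i of a factor with outcome j.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_total in zip(table, row_totals):
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Made-up counts (NOT BDHS data): rows = three levels of a hypothetical
# maternal-education factor; columns = (alive, died).
print(chi_square_independence([[900, 110], [1500, 120], [800, 40]]))
```

A factor whose p-value falls below the chosen significance level (commonly 0.05) would be kept as associated with child mortality; in the study this screening was paired with RFE, which the sketch does not cover.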

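The threshold-based metrics named in the methods (accuracy, sensitivity, specificity, PPV, NPV and F1 score) all derive from the four cells of a binary confusion matrix. The following Python sketch computes them with death as the positive class; the class and function names are hypothetical and the counts are illustrative, not results from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier where the positive class is 'died'."""
    tp: int  # deaths predicted as deaths
    fn: int  # deaths predicted as alive
    fp: int  # survivors predicted as deaths
    tn: int  # survivors predicted as alive


def _ratio(num, den):
    # Return NaN instead of raising when a denominator is zero, e.g. PPV
    # for a model that never predicts the death class.
    return num / den if den else math.nan


def classification_metrics(cm):
    """Threshold-based metrics; AUC needs predicted scores instead."""
    total = cm.tp + cm.fn + cm.fp + cm.tn
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),  # recall for deaths
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "ppv": _ratio(cm.tp, cm.tp + cm.fp),          # precision
        "npv": _ratio(cm.tn, cm.tn + cm.fn),
        # F1 = harmonic mean of PPV and sensitivity = 2TP / (2TP + FP + FN)
        "f1": _ratio(2 * cm.tp, 2 * cm.tp + cm.fp + cm.fn),
    }


# Hypothetical "always predict alive" model on 1,000 children, 8.2% deaths.
print(classification_metrics(ConfusionMatrix(tp=0, fn=82, fp=0, tn=918)))
```

This example reports an accuracy of 0.918 alongside a sensitivity of 0.0 and an undefined (NaN) PPV: high accuracy can coexist with a model that never identifies a death, a pattern consistent with the abstract's report that, before resampling, none of the models correctly classified deaths.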
RESULTS AND DISCUSSION

The child mortality rate in the data was 8.2%. Bivariate analysis showed that child mortality was higher among children whose mothers were uneducated, poor, underweight, or aged 35-49, or had given birth before age 20. Household water source and religious affiliation had no statistically significant association with child mortality. Predicting child mortality with machine learning models was the main objective of this study. On the original data, in which only 8.2% of children had died, none of the models correctly classified deaths, so the study applied over-sampling and under-sampling; the resulting datasets contained approximately 76,727 and 6,910 observations, respectively. On the over-sampled data, Random Forest outperformed all other models in overall performance on the training and testing sets, with an accuracy of 70%. Under k-fold cross-validation, Random Forest again performed best and achieved the highest AUC (0.701). In the under-sampling analysis, by contrast, the Gradient Boosting Machine performed best at predicting child mortality, and k-fold cross-validation likewise showed its better performance.
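The over-sampling and under-sampling step can be sketched in Python as plain random resampling to a 1:1 class ratio. This is an assumption: the abstract does not say which resampling method or target ratio the authors used, and the function names and toy records are hypothetical. Over-sampling draws extra minority (death) records with replacement; under-sampling keeps a random subset of majority (survivor) records without replacement.

```python
import random


def random_oversample(majority, minority, rng):
    """Random over-sampling: add minority records drawn WITH replacement
    until both classes are the same size (1:1 is an assumed target)."""
    if len(minority) > len(majority):
        raise ValueError("minority class must be the smaller one")
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


def random_undersample(majority, minority, rng):
    """Random under-sampling: keep a subset of majority records, drawn
    WITHOUT replacement, equal in size to the minority class."""
    if len(minority) > len(majority):
        raise ValueError("minority class must be the smaller one")
    return rng.sample(majority, k=len(minority)) + minority


# Toy data with the abstract's 8.2% death rate: 918 alive, 82 dead.
rng = random.Random(2025)  # fixed seed so the draw is reproducible
alive = [("alive", i) for i in range(918)]
dead = [("dead", i) for i in range(82)]
print(len(random_oversample(alive, dead, rng)))   # 1836
print(len(random_undersample(alive, dead, rng)))  # 164
```

If both schemes did balance the classes 1:1 (again an assumption), the reported sizes imply roughly 76,727 / 2 ≈ 38,364 surviving children and 6,910 / 2 = 3,455 deaths, about 41,800 records with roughly 8.3% deaths, which is in line with the ~42,000 observations and 8.2% rate stated above. In general practice, resampling is applied to the training data only, so that test-set metrics reflect the real class balance.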

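The AUC values reported above (for example 0.701 for Random Forest) summarize how well a model's predicted scores rank deaths above survivors, where 0.5 corresponds to chance and 1.0 to perfect ranking. A minimal Python sketch of that computation via the Mann-Whitney rank formulation follows; the function name and example scores are hypothetical, and the study's software may compute AUC by trapezoidal integration of the empirical ROC curve instead, which yields the same value.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = died, 0 = alive).

    Mann-Whitney formulation: the probability that a randomly chosen
    positive gets a higher score than a randomly chosen negative, with ties
    counted as 1/2. Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for positives
    return u / (n_pos * n_neg)


# Hypothetical predicted death probabilities from some classifier.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # 0.75
```

Because AUC depends only on the ranking of scores, it is unaffected by the choice of classification threshold, which is why it complements the threshold-based metrics listed in the methods.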
CONCLUSION

The Gradient Boosting Machine and Random Forest showed the best predictive power for classifying child mortality and may help improve policy decision-making in this area.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2d08/11776272/862b9870bd05/12889_2025_21460_Fig1_HTML.jpg
