Iqbal Farrukh, Satti Muhammad Islam, Irshad Azeem, Shah Mohd Asif
Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology (SZABIST), Karachi, Pakistan.
Department of Computer Science, Millennium Institute of Technology & Entrepreneurship (MiTE), Karachi, Pakistan.
Open Life Sci. 2023 Jul 11;18(1):20220609. doi: 10.1515/biol-2022-0609. eCollection 2023.
In developing countries, child health and the reduction of under-five child mortality are among the fundamental concerns. UNICEF adopted sustainable development goal 3 (SDG3) to reduce the under-five child mortality rate globally to 25 deaths per 1,000 live births. The under-five mortality rate in Pakistan is 69 deaths per 1,000 live births, as reported by the Demographic and Health Survey (2018). Predictive analytics has the power to transform the healthcare industry, personalizing care for every individual. The Pakistan Demographic and Health Survey (2017-2018), a publicly available dataset, is used in this study, and multiple imputation methods are adopted for the treatment of missing values. Information gain, a feature selection method, was used to rank the information-rich features and examine their impact on child mortality prediction. The synthetic minority over-sampling technique (SMOTE) balanced the training dataset, and four supervised machine learning classifiers were used, namely the decision tree, random forest, naive Bayes, and extreme gradient boosting classifiers. For comparative analysis, accuracy, precision, recall, and F1-score were used. Finally, a predictive analytics framework was built that predicts whether the child is alive or dead. The number of under-five children in a household, preceding birth interval, family members, mother's age, age of mother at first birth, antenatal care visits, breastfeeding, child size at birth, and place of delivery were found to be critical risk factors for child mortality. The random forest classifier performed efficiently and predicted under-five child mortality with accuracy (93.8%), precision (0.964), recall (0.971), and F1-score (0.967). The findings could greatly assist child health intervention programs in decision-making.
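The four comparison metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary alive/dead label; the abstract does not say which class is treated as positive, so the function names and the confusion-matrix counts below are hypothetical and are not taken from the study. The last lines only check that the reported random forest precision and recall are consistent with the reported F1-score.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (F1-score)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts for the class treated as positive; which class
    that is (alive or dead) is an assumption left to the caller.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts, for illustration only (not data from the study).
example = classification_metrics(tp=90, fp=5, fn=10, tn=95)
print(example)

# Consistency check on the reported random forest precision and recall.
reported_f1 = f1_from(0.964, 0.971)
print(round(reported_f1, 3))
```

Because F1 is the harmonic mean of precision and recall, it can be recomputed from the two reported values without access to the underlying predictions, which makes it a quick sanity check on published results.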