Department of Statistics and Data Science, Jahangirnagar University, Dhaka, Bangladesh.
J Health Popul Nutr. 2024 Oct 27;43(1):170. doi: 10.1186/s41043-024-00646-9.
Although machine learning (ML) models are popular for their superior predictive performance, they are often avoided because their predictions lack intuition and explanation. Interpretable ML is, therefore, an emerging research field that combines the performance and interpretability of ML models to create comprehensive solutions for complex decision-making analysis. Meanwhile, infant mortality is a global public health concern affecting health, social well-being, socio-economic development, and healthcare services. This study employs advanced interpretable ML techniques to predict and understand the factors affecting infant mortality in Bangladesh, overcoming the shortcomings of the conventional logistic regression (LR) model.
By utilizing the global surrogate model and the local individual conditional expectation (ICE) interpretability technique, an interpretable support vector machine (SVM) was used in this study to reveal significant characteristics of infant mortality, drawing on data from the Bangladesh Demographic and Health Survey (BDHS) 2017-18. To investigate the intricate decision-making analysis of infant mortality, we fitted SVM and LR models with hyperparameter tuning. The models' performances were first assessed using the receiver operating characteristic (ROC) curve, run-time, and confusion matrix metrics over 100 permutations. Afterward, the SVM model's model-agnostic explanation was compared with the LR model's interpretation to draw further insights.
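The evaluation protocol described above can be sketched as follows. This is a minimal illustration in Python with scikit-learn on synthetic stand-in data (not the BDHS 2017-18 dataset); the hyperparameter grids, split scheme, and number of permutations are assumptions for brevity, not the paper's exact settings.

```python
# Hedged sketch: repeated train/test evaluation of tuned LR and SVM models,
# mirroring the paper's permutation protocol on synthetic stand-in data.
# The data, grids, and 5 permutations (the paper uses 100) are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Imbalanced binary outcome (~10% positives), loosely like infant deaths.
X, y = make_classification(n_samples=600, n_features=8, weights=[0.9],
                           random_state=0)

def evaluate(model, grid, n_permutations=5):
    """Tune hyperparameters, then average metrics over random splits."""
    aucs, sens = [], []
    for seed in range(n_permutations):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed)
        search = GridSearchCV(model, grid, cv=3, scoring="roc_auc")
        search.fit(X_tr, y_tr)
        prob = search.predict_proba(X_te)[:, 1]
        pred = (prob >= 0.5).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
        aucs.append(roc_auc_score(y_te, prob))
        sens.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return np.mean(aucs), np.mean(sens)

lr_auc, lr_sens = evaluate(LogisticRegression(max_iter=1000),
                           {"C": [0.1, 1.0, 10.0]})
svm_auc, svm_sens = evaluate(SVC(probability=True),
                             {"C": [0.1, 1.0], "gamma": ["scale"]})
print(f"LR  mean AUC={lr_auc:.3f} sensitivity={lr_sens:.3f}")
print(f"SVM mean AUC={svm_auc:.3f} sensitivity={svm_sens:.3f}")
```

On a heavily imbalanced outcome such as infant mortality, accuracy alone is misleading (a model predicting no deaths still scores ~0.91), which is why the paper also reports sensitivity, specificity, F1-score, and AUC.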
The results of the 100 permutations demonstrated that the LR model (average: accuracy = 0.9105, precision = NaN, sensitivity = 0, specificity = 1, F1-score = 0, area under the ROC curve (AUC) = 0.6780, run-time = 0.0832) outperformed the SVM model (average: accuracy = 0.8470, precision = 0.1062, sensitivity = 0.0949, specificity = 0.9209, F1-score = 0.1000, AUC = 0.5632, run-time = 0.0254) in predicting infant mortality, although the LR model had a slower run-time and failed to predict any positive cases. The interpretation of the LR analysis revealed that infant mortality rates were lower when mothers gave birth more than two years after the preceding birth, and among mothers with higher educational attainment, overweight or obese mothers, working mothers, and families using polluted cooking fuel. The local ICE interpretability technique, which depicts each individual's influence on the average likelihood of dying before the first birthday, showed through the interpretable SVM model that mothers with a normal BMI, mothers giving birth within two years of the preceding birth, mothers using less polluted cooking fuel, working mothers, and mothers with a male infant were more likely to experience infant death. The interpretable SVM model based on the global surrogate model further revealed higher infant death rates among working mothers who used polluted cooking fuel at home, and among working mothers who used less polluted cooking fuel but had a birth interval longer than two years. Even among non-working mothers who used polluted cooking fuel and gave birth within two years of the preceding birth, infant death rates were higher.
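The two interpretability techniques named above, local ICE curves and a global surrogate, can be sketched as follows. This is a minimal illustration in Python with scikit-learn on synthetic stand-in data (not the BDHS 2017-18 dataset); the feature names and the depth-3 surrogate tree are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the two interpretability techniques applied to a
# fitted SVM: local ICE curves and a global surrogate decision tree.
# Data and feature names are illustrative, not the BDHS 2017-18 variables.
from sklearn.datasets import make_classification
from sklearn.inspection import partial_dependence
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
svm = SVC(probability=True, random_state=0).fit(X, y)

# Local ICE: one curve per instance, showing how that instance's
# predicted risk changes as a single feature is varied over a grid.
ice = partial_dependence(svm, X[:20], features=[0], kind="individual")
curves = ice["individual"][0]  # shape: (20 instances, grid points)

# Global surrogate: a shallow decision tree trained to mimic the SVM's
# predictions, yielding human-readable rules over the whole input space.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, svm.predict(X))
fidelity = surrogate.score(X, svm.predict(X))  # agreement with the SVM
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(5)]))
print(f"surrogate fidelity: {fidelity:.2f}")
```

The surrogate's fidelity (how often it agrees with the black-box model) indicates how far its rules can be trusted as a global explanation; the ICE curves complement it by exposing instance-level heterogeneity that a global average would hide.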
The interpretable SVM model shows that global interpretations help clinicians understand the entire conditional distribution, while local interpretations focus on specific instances, providing complementary insights into model behavior. Interpretable ML models can aid policymakers, stakeholders, and families in understanding and preventing infant deaths by improving policy-making strategies and establishing effective family counseling services.