Department of Statistics and Data Science, Jahangirnagar University, Dhaka, Bangladesh.
J Health Popul Nutr. 2024 Oct 27;43(1):170. doi: 10.1186/s41043-024-00646-9.
Although machine learning (ML) models are popular for their superior predictive performance, they are often avoided because their predictions lack intuition and explanation. Interpretable ML is, therefore, an emerging research field that combines the performance and interpretability of ML models to create comprehensive solutions for complex decision-making analysis. Meanwhile, infant mortality is a global public health concern affecting health, social well-being, socio-economic development, and healthcare services. This study employs advanced interpretable ML techniques to predict and understand the factors affecting infant mortality in Bangladesh, overcoming the shortcomings of the conventional logistic regression (LR) model.
By utilizing the global surrogate model and the local individual conditional expectation (ICE) interpretability technique, an interpretable support vector machine (SVM) was used in this study to reveal significant characteristics of infant mortality, drawing on data from the Bangladesh Demographic and Health Survey (BDHS) 2017-18. To investigate the intricate decision-making analysis of infant mortality, we fitted SVM and LR models with hyperparameter tuning. The models' performances were first assessed using the receiver operating characteristic (ROC) curve, run-time, and confusion matrix metrics over 100 permutations. Afterward, the SVM model's model-agnostic explanation was compared with the LR model's interpretation to draw further insights.
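The evaluation protocol described above can be sketched as follows. This is a minimal illustration in Python with scikit-learn on synthetic stand-in data (not the BDHS 2017-18 dataset); the hyperparameter grids, split scheme, and number of permutations are assumptions for brevity, not the paper's exact settings.

```python
# Hedged sketch: repeated train/test evaluation of tuned LR and SVM models,
# mirroring the paper's permutation protocol on synthetic stand-in data.
# The data, grids, and 5 permutations (the paper uses 100) are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Imbalanced binary outcome (~10% positives), loosely like infant deaths.
X, y = make_classification(n_samples=600, n_features=8, weights=[0.9],
                           random_state=0)

def evaluate(model, grid, n_permutations=5):
    """Tune hyperparameters, then average metrics over random splits."""
    aucs, sens = [], []
    for seed in range(n_permutations):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed)
        search = GridSearchCV(model, grid, cv=3, scoring="roc_auc")
        search.fit(X_tr, y_tr)
        prob = search.predict_proba(X_te)[:, 1]
        pred = (prob >= 0.5).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
        aucs.append(roc_auc_score(y_te, prob))
        sens.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return np.mean(aucs), np.mean(sens)

lr_auc, lr_sens = evaluate(LogisticRegression(max_iter=1000),
                           {"C": [0.1, 1.0, 10.0]})
svm_auc, svm_sens = evaluate(SVC(probability=True),
                             {"C": [0.1, 1.0], "gamma": ["scale"]})
print(f"LR  mean AUC={lr_auc:.3f} sensitivity={lr_sens:.3f}")
print(f"SVM mean AUC={svm_auc:.3f} sensitivity={svm_sens:.3f}")
```

On a heavily imbalanced outcome such as infant mortality, accuracy alone is misleading (a model predicting no deaths still scores ~0.91), which is why the paper also reports sensitivity, specificity, F1-score, and AUC.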
The results of the 100 permutations demonstrated that the LR model (average: accuracy = 0.9105, precision = NaN, sensitivity = 0, specificity = 1, F1-score = 0, area under the ROC curve (AUC) = 0.6780, run-time = 0.0832) outperformed the SVM model (average: accuracy = 0.8470, precision = 0.1062, sensitivity = 0.0949, specificity = 0.9209, F1-score = 0.1000, AUC = 0.5632, run-time = 0.0254) in predicting infant mortality, although the LR model had a slower run-time and failed to predict any positive cases. The interpretation of the LR analysis revealed that infant mortality rates were lower when mothers gave birth more than two years after the preceding birth, and among mothers with higher educational attainment, overweight or obese mothers, working mothers, and families using polluted cooking fuel. The local ICE interpretability technique, which depicts each individual's influence on the average likelihood of dying before the first birthday, showed through the interpretable SVM model that mothers with a normal BMI, mothers giving birth within two years of the preceding birth, mothers using less polluted cooking fuel, working mothers, and mothers with a male infant were more likely to experience infant death. The interpretable SVM model based on the global surrogate model further revealed higher infant death rates among working mothers who used polluted cooking fuel at home, and among working mothers who used less polluted cooking fuel but had a birth interval longer than two years. Even among non-working mothers who used polluted cooking fuel and gave birth within two years of the preceding birth, infant death rates were higher.
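The two interpretability techniques named above, local ICE curves and a global surrogate, can be sketched as follows. This is a minimal illustration in Python with scikit-learn on synthetic stand-in data (not the BDHS 2017-18 dataset); the feature names and the depth-3 surrogate tree are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the two interpretability techniques applied to a
# fitted SVM: local ICE curves and a global surrogate decision tree.
# Data and feature names are illustrative, not the BDHS 2017-18 variables.
from sklearn.datasets import make_classification
from sklearn.inspection import partial_dependence
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
svm = SVC(probability=True, random_state=0).fit(X, y)

# Local ICE: one curve per instance, showing how that instance's
# predicted risk changes as a single feature is varied over a grid.
ice = partial_dependence(svm, X[:20], features=[0], kind="individual")
curves = ice["individual"][0]  # shape: (20 instances, grid points)

# Global surrogate: a shallow decision tree trained to mimic the SVM's
# predictions, yielding human-readable rules over the whole input space.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, svm.predict(X))
fidelity = surrogate.score(X, svm.predict(X))  # agreement with the SVM
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(5)]))
print(f"surrogate fidelity: {fidelity:.2f}")
```

The surrogate's fidelity (how often it agrees with the black-box model) indicates how far its rules can be trusted as a global explanation; the ICE curves complement it by exposing instance-level heterogeneity that a global average would hide.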
The interpretable SVM model shows that global interpretations help clinicians understand the entire conditional distribution, while local interpretations focus on specific instances, providing complementary insights into model behavior. Interpretable ML models can aid policymakers, stakeholders, and families in understanding and preventing infant deaths by improving policy-making strategies and establishing effective family counseling services.