Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study.

Authors

Lolak Sermkiat, Attia John, McKay Gareth J, Thakkinstian Ammarin

Affiliations

Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand.

Centre for Clinical Epidemiology and Biostatistics, School of Medicine and Public Health, Hunter Medical Research Institute, University of Newcastle, New South Wales, Australia.

Publication information

JMIR Cardio. 2023 Jul 26;7:e47736. doi: 10.2196/47736.

DOI: 10.2196/47736

PMID: 37494080

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10413234/

Abstract

BACKGROUND

Stroke has multiple modifiable and nonmodifiable risk factors and represents a leading cause of death globally. Understanding the complex interplay of stroke risk factors is thus not only a scientific necessity but a critical step toward improving global health outcomes.

OBJECTIVE

We aim to assess the performance of explainable machine learning models in predicting stroke risk factors using real-world cohort data by comparing explainable machine learning models with conventional statistical methods.

METHODS

This retrospective cohort included high-risk patients from Ramathibodi Hospital in Thailand between January 2010 and December 2020. We compared the performance and explainability of logistic regression (LR), Cox proportional hazard, Bayesian network (BN), tree-augmented Naïve Bayes (TAN), extreme gradient boosting (XGBoost), and explainable boosting machine (EBM) models. We used multiple imputation by chained equations for missing data and discretized continuous variables as needed. Models were evaluated using C-statistics and F-scores.
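The two evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how the metrics were computed, so the pairwise C-statistic for a binary outcome, the F1 default (`beta=1.0`), the 0.5 classification threshold, and the toy data are all assumptions made for illustration.

```python
def c_statistic(y_true, y_score):
    """Concordance for a binary outcome: the fraction of (event, non-event)
    pairs in which the event case has the higher score. Ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def f_score(y_true, y_pred, beta=1.0):
    """F-beta score from hard 0/1 predictions (beta=1.0 gives F1)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        # No true positives: the score is 0 by the usual convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy data, not taken from the study.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

print(round(c_statistic(y_true, y_score), 3))
print(round(f_score(y_true, y_pred), 3))
```

The C-statistic depends only on how the scores rank patients, whereas the F-score depends on the chosen threshold, so the two can move independently when models are compared.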

RESULTS

Of the 275,247 high-risk patients, 9659 (3.5%) experienced a stroke. XGBoost demonstrated the highest performance, with a C-statistic of 0.89 and an F-score of 0.80, followed by EBM and TAN with C-statistics of 0.87 and 0.83, respectively; LR and BN had similar C-statistics of 0.80. Significant factors associated with stroke included atrial fibrillation (AF), hypertension (HT), antiplatelets, high-density lipoprotein (HDL), and age. AF, HT, and antihypertensive medication were common significant factors across most models, with AF being the strongest factor in the LR, XGBoost, BN, and TAN models.
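As a quick arithmetic check on the cohort figures, the sketch below recomputes the event rate from the reported counts. It also shows, as an illustration that the abstract itself does not make, what a trivial rule predicting "no stroke" for every patient would score; the helper names are hypothetical.

```python
N_PATIENTS = 275_247  # high-risk patients reported in the abstract
N_STROKES = 9_659     # patients who experienced a stroke


def event_rate(events, total):
    """Proportion of patients with the outcome."""
    return events / total


def all_negative_baseline(events, total):
    """Accuracy and F1 of a rule that predicts 'no stroke' for everyone.
    With zero true positives, F1 is 0 by the usual convention."""
    accuracy = (total - events) / total
    f1 = 0.0
    return accuracy, f1


rate = event_rate(N_STROKES, N_PATIENTS)
baseline_accuracy, baseline_f1 = all_negative_baseline(N_STROKES, N_PATIENTS)

print(f"event rate: {rate:.1%}")
print(f"all-negative accuracy: {baseline_accuracy:.1%}, F1: {baseline_f1}")
```

With an outcome this rare, plain accuracy says little about how well a model finds stroke cases, which is one plausible reason to report C-statistics and F-scores instead.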

CONCLUSIONS

Our study developed stroke prediction models to identify crucial predictive factors in high-risk patients, such as AF, HT (or systolic blood pressure or antihypertensive medication), anticoagulant medication, HDL, age, and statin use. The explainable XGBoost was the best model in predicting stroke risk, followed by EBM.
