The First Affiliated Hospital of Kunming Medical University, Kunming, China.
College of Big Data, Yunnan Agricultural University, Kunming, China.
BMC Med Inform Decis Mak. 2023 Nov 20;23(1):267. doi: 10.1186/s12911-023-02371-5.
The goal of this study was to assess the effectiveness of machine learning models and create an interpretable machine learning model that adequately explained 3-year all-cause mortality in patients with chronic heart failure.
Data were drawn from patients with chronic heart failure and cardiac function class III-IV who were hospitalized at the First Affiliated Hospital of Kunming Medical University between 2017 and 2019. Six machine learning models were compared on this dataset: logistic regression, naive Bayes, random forest classifier, extreme gradient boosting, K-nearest neighbor, and decision tree. Finally, model interpretation methods, namely SHAP values, permutation importance, and partial dependence plots, were used to estimate the 3-year all-cause mortality risk and to produce individual-level interpretations of the model's predictions.
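To make the idea of permutation importance concrete, the following is a minimal from-scratch sketch in Python, not the study's implementation. It shuffles one feature column at a time and records how much a model's accuracy drops. The data are synthetic, and the names `toy_model`, `age`, and `noise`, along with the age threshold of 70, are hypothetical choices made only for illustration.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return the mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, which breaks its link to the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic toy data (NOT study data): column 0 is a hypothetical "age",
# column 1 is pure noise that the outcome does not depend on.
data_rng = random.Random(42)
X = [[data_rng.uniform(40, 90), data_rng.uniform(0, 1)] for _ in range(200)]
y = [1 if age > 70 else 0 for age, _ in X]


def toy_model(row):
    """Hypothetical stand-in for a trained classifier; it only looks at age."""
    return 1 if row[0] > 70 else 0


importances = permutation_importance(toy_model, X, y)
print(dict(zip(["age", "noise"], importances)))
```

In this sketch, shuffling the noise column leaves accuracy unchanged, so its importance is zero, while shuffling the informative column degrades accuracy substantially. A real analysis would apply the same procedure to a fitted model on held-out data.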
In this paper, random forest was identified as the optimal algorithm for this dataset. We also incorporated relevant machine learning interpretability tools and techniques to improve disease prognosis, including permutation importance, PDP plots, and SHAP values. The results show that the number of hospitalizations, age, glomerular filtration rate, BNP, NYHA cardiac function classification, lymphocyte absolute value, serum albumin, hemoglobin, total cholesterol, and pulmonary artery systolic pressure, among others, were important for providing an optimal risk assessment and were important predictive factors in chronic heart failure.
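A partial dependence plot (PDP) shows how the average predicted risk changes as one feature is varied while the others keep their observed values. The sketch below computes such a curve from scratch in Python under stated assumptions: `toy_risk` is a hypothetical logistic risk function with arbitrary illustrative coefficients (not estimates from this study), and the five rows of `X` are invented values for age and serum albumin.

```python
import math


def partial_dependence(predict_proba, X, feature, grid):
    """Average predicted risk when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the feature of interest
            total += predict_proba(modified)
        curve.append(total / len(X))
    return curve


def toy_risk(row):
    """Hypothetical risk model; coefficients are arbitrary, not from the study."""
    age, albumin = row
    z = 0.08 * (age - 70) - 0.10 * (albumin - 38)
    return 1.0 / (1.0 + math.exp(-z))


# Invented example rows: [age in years, serum albumin in g/L].
X = [[55, 42], [63, 40], [71, 36], [78, 33], [84, 30]]
age_grid = [50, 60, 70, 80, 90]

pdp_age = partial_dependence(toy_risk, X, feature=0, grid=age_grid)
for age, risk in zip(age_grid, pdp_age):
    print(f"age={age}: mean predicted risk={risk:.3f}")
```

Plotting `pdp_age` against `age_grid` gives the PDP curve for age. Because the toy model's risk rises with age by construction, the curve is monotonically increasing here; with a fitted random forest the shape would be whatever the model learned.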
Machine learning-based cardiovascular risk models could be used to accurately assess and stratify the 3-year risk of all-cause mortality among CHF patients. Machine learning in combination with permutation importance, PDP plots, and SHAP values could offer a clear explanation of individual risk predictions and give doctors an intuitive understanding of the roles played by the model's important features.