Department of Electronic and Electrical Engineering, University of Sheffield, UK.
Department of Oncology and Metabolism, University of Sheffield, UK.
Comput Biol Med. 2022 May;144:105361. doi: 10.1016/j.compbiomed.2022.105361. Epub 2022 Mar 2.
This research develops machine learning models equipped with interpretation modules for mortality risk prediction and stratification in cohorts of hospitalised coronavirus disease 2019 (COVID-19) patients with and without diabetes mellitus (DM). To this end, routinely collected clinical data from 156 COVID-19 patients with DM and 349 COVID-19 patients without DM were analysed. First, a random forest classifier predicted in-hospital COVID-19 mortality from admission data for each cohort. For the DM cohort, the model predicted mortality risk with an accuracy of 82%, an area under the receiver operating characteristic curve (AUC) of 80%, a sensitivity of 80%, and a specificity of 56%. For the non-DM cohort, the accuracy, AUC, sensitivity, and specificity were 80%, 84%, 91%, and 56%, respectively. The models were then interpreted using SHapley Additive exPlanations (SHAP), which explained the predictors' global and local influences on model outputs. Finally, the k-means algorithm was applied to cluster patients by their SHAP values, separating them into three clusters. Average mortality rates within the clusters were 8%, 20%, and 76% for the DM cohort and 2.7%, 28%, and 41.9% for the non-DM cohort, providing a functional method of risk stratification.
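The final stratification step, clustering patients on their SHAP values and then reading off the mortality rate in each cluster, can be illustrated with a minimal sketch. The abstract does not say which implementation was used, so the code below is only an assumption-laden illustration: it uses a plain Lloyd's-algorithm k-means written with the Python standard library, a deterministic initialisation (the first k points), and a small set of made-up two-feature attribution vectors and outcomes. The function names `kmeans` and `cluster_mortality` and all numbers are hypothetical and are not taken from the study.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; centroids start at the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iters):
        # Assignment step: each patient goes to the nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(col) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return labels, centroids


def cluster_mortality(labels, died, k):
    """Fraction of patients who died within each cluster."""
    rates = []
    for c in range(k):
        outcomes = [d for d, lab in zip(died, labels) if lab == c]
        rates.append(sum(outcomes) / len(outcomes) if outcomes else float("nan"))
    return rates


# Hypothetical per-patient SHAP vectors (two predictors) and outcomes (1 = died).
shap_values = [
    (-0.30, -0.20), (0.00, 0.05), (0.35, 0.30),
    (-0.28, -0.22), (0.02, 0.03), (0.33, 0.32),
    (-0.32, -0.18), (-0.02, 0.04), (0.37, 0.28),
]
died = [0, 0, 1, 0, 1, 1, 0, 0, 0]

labels, centroids = kmeans(shap_values, k=3)
rates = cluster_mortality(labels, died, k=3)

# Rank clusters by observed mortality to name the risk tiers.
order = sorted(range(3), key=lambda c: rates[c])
tiers = {cluster: name for cluster, name in zip(order, ["low", "medium", "high"])}

for c in order:
    print(f"cluster {c} ({tiers[c]} risk): mortality {rates[c]:.0%}")
```

Clustering on attribution vectors rather than on raw admission values groups patients by how the model arrived at its prediction, which is what allows each cluster to be read as a risk tier. In practice a library k-means with several random restarts would likely be preferable to this fixed initialisation.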