Zheng Yulu, Guo Zheng, Zhang Yanbo, Shang Jianjing, Yu Leilei, Fu Ping, Liu Yizhi, Li Xingang, Wang Hao, Ren Ling, Zhang Wei, Hou Haifeng, Tan Xuerui, Wang Wei
Centre for Precision Health, Edith Cowan University, 270 Joondalup Drive, Joondalup, Western Australia 6027, Australia.
The Second Affiliated Hospital of Shandong First Medical University, Tai'an, Shandong, China.
EPMA J. 2022 May 27;13(2):285-298. doi: 10.1007/s13167-022-00283-4. eCollection 2022 Jun.
Recognising the early signs of ischemic stroke (IS) in emergency settings has been challenging. Machine learning (ML), a robust tool for predictive, preventive and personalised medicine (PPPM/3PM), offers a possible solution to this problem and can produce accurate predictions from data processed in real time.
This investigation evaluated 4999 IS patients among a total of 10,476 adults included in the initial dataset, and 1076 IS subjects among 3935 participants in the external validation dataset. Six ML-based models for the prediction of IS were trained on the initial dataset of 10,476 participants, which was split into a training set (80%) and an internal validation set (20%). Selected clinical laboratory features routinely assessed at admission were used to inform the models. Model performance was mainly evaluated by the area under the receiver operating characteristic curve (AUC). Three additional techniques, permutation feature importance (PFI), local interpretable model-agnostic explanations (LIME), and SHapley Additive exPlanations (SHAP), were applied to explain the black-box ML models.
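As a rough illustration of the evaluation protocol described above (an 80/20 split followed by AUC scoring), the following Python sketch uses only the standard library. The function names `train_validation_split` and `roc_auc` are hypothetical, the split is a plain random shuffle (the abstract does not say whether stratification was used), and the toy labels and scores are invented for demonstration; none of this is the authors' code or data.

```python
import random


def train_validation_split(rows, labels, validation_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the samples for internal validation.

    Assumption: a simple unstratified shuffle; the study's exact procedure
    is not described in the abstract.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(indices) * validation_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train_x = [rows[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    val_x = [rows[i] for i in val_idx]
    val_y = [labels[i] for i in val_idx]
    return train_x, train_y, val_x, val_y


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; tied scores share a rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (invented values): 1 = IS, 0 = non-IS.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

In practice the scores passed to `roc_auc` would be the predicted probabilities of a fitted classifier on the held-out 20% and, separately, on the external validation dataset.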
Fifteen routine haematological and biochemical features were selected to establish ML-based models for the prediction of IS. The XGBoost-based model achieved the highest predictive performance, reaching AUCs of 0.91 (0.90-0.92) and 0.92 (0.91-0.93) in the internal and external datasets, respectively. Globally, PFI revealed that the demographic feature age, the routine haematological parameters haemoglobin and neutrophil count, and the biochemical analytes total protein and high-density lipoprotein cholesterol were more influential on the model's prediction. LIME and SHAP showed similar local feature attribution explanations.
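The idea behind permutation feature importance can be sketched in a few lines: shuffle one feature at a time and measure how much a performance score drops. The sketch below is a minimal standard-library illustration under stated assumptions. The helper names, the use of accuracy as the score, the threshold rule standing in for a trained model, and the toy records are all hypothetical and are not taken from the study.

```python
import random


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def permutation_importance(predict, rows, labels, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` after shuffling each feature column in turn.

    `rows` is a list of dicts mapping feature name to value; `predict`
    maps one such dict to a predicted label.
    """
    rng = random.Random(seed)
    baseline = metric(labels, [predict(r) for r in rows])
    importances = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only this feature replaced by a shuffled value.
            permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
            score = metric(labels, [predict(r) for r in permuted])
            drops.append(baseline - score)
        importances[feature] = sum(drops) / n_repeats
    return importances


# Hypothetical stand-in for a trained model: it looks only at age.
def toy_predict(row):
    return 1 if row["age"] >= 60 else 0


# Invented toy records; "noise" is a feature the stand-in model ignores.
toy_rows = [{"age": a, "noise": n} for a, n in
            zip([40, 45, 50, 55, 65, 70, 75, 80], [3, 1, 4, 1, 5, 9, 2, 6])]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

toy_importances = permutation_importance(toy_predict, toy_rows, toy_labels, accuracy)
print(toy_importances)
```

A feature the model never uses, such as `noise` here, receives an importance of exactly zero, while shuffling a feature the model relies on degrades the score. This is the sense in which PFI gives a global ranking of features.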
In the context of PPPM/3PM, we used selected predictors obtained from the results of common blood tests to develop and validate ML-based models for the diagnosis of IS. The XGBoost-based model offered the most accurate prediction. Because it draws on the individualised patient profile, this prediction tool is simple and quick to administer. It shows promise for supporting subjective decision making in resource-limited settings or primary care, thereby shortening the time to treatment and improving outcomes after IS.
The online version contains supplementary material available at 10.1007/s13167-022-00283-4.