Cao Yi, Deng Haipeng, Liu Shaoyun, Zeng Xi, Gou Yangyang, Zhang Weiting, Li Yixinyuan, Yang Hua, Peng Min
Department of Neurosurgery, Affiliated Hospital of Guizhou Medical University, Guiyang, China.
School of Nursing, Guizhou Medical University, Guiyang, China.
Front Neurol. 2025 Jun 18;16:1591570. doi: 10.3389/fneur.2025.1591570. eCollection 2025.
To develop and validate a machine learning (ML)-based model for predicting stroke-associated pneumonia (SAP) risk in older adult hemorrhagic stroke patients.
Older adult patients with hemorrhagic stroke were retrospectively enrolled from three tertiary hospitals in Guiyang, Guizhou Province (January 2019-December 2022) to form the modeling cohort, which was randomly split into training and internal validation sets (7:3 ratio). External validation used retrospective data from January-December 2023. After univariate and multivariate regression analyses, four ML models (Logistic Regression, XGBoost, Naive Bayes, and Support Vector Machine [SVM]) were constructed. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were calculated for the training and internal validation sets. Model performance was compared using DeLong's test or a bootstrap test, and predictive efficacy was evaluated with sensitivity, specificity, accuracy, precision, recall, and F1-score. Calibration curves were used to assess model calibration. The optimal model then underwent external validation using ROC and calibration curves.
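The split and evaluation steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names, the fixed random seed, and the rounding rule used for the 7:3 cut are assumptions made for illustration, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle records and split them into training and internal validation sets.

    The seed and the rounding of the cut point are illustrative assumptions.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Compute the threshold-based metrics named in the abstract from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # identical to recall
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": sensitivity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a sort-based version would be preferable for larger data sets.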
A total of 788 older adult hemorrhagic stroke patients were enrolled and divided into a training set (n = 462), an internal validation set (n = 196), and an external validation set (n = 130). The incidence of SAP in older adult patients with hemorrhagic stroke was 46.7% (368/788). Advanced age [OR = 1.064, 95% CI (1.024, 1.104)], smoking [OR = 2.488, 95% CI (1.460, 4.24)], low GCS score [OR = 0.675, 95% CI (0.553, 0.825)], low Braden score [OR = 0.741, 95% CI (0.640, 0.858)], and nasogastric tube [OR = 1.761, 95% CI (1.048, 2.960)] were identified as risk factors for SAP. Among the four machine learning algorithms evaluated [XGBoost, Logistic Regression (LR), Support Vector Machine (SVM), and Naive Bayes], the LR model demonstrated robust and consistent performance in predicting SAP among older adult patients with hemorrhagic stroke across multiple evaluation metrics. Furthermore, the model exhibited stable generalizability in the external validation cohort. Based on these findings, the LR model was selected for external validation and presented as a nomogram. The model achieved AUC values of 0.883 (training), 0.855 (internal validation), and 0.882 (external validation). The Hosmer-Lemeshow (H-L) test indicated satisfactory calibration in all three datasets, with P-values of 0.381, 0.142, and 0.066, respectively.
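As a worked illustration of how the reported odds ratios combine in a logistic regression model, the sketch below converts each OR to a log-odds coefficient and multiplies the ORs for a hypothetical comparison between two patients. It rests on assumptions the abstract does not state: that the ORs for age, GCS, and Braden score are per one-unit change (per year or per point), and that the model has no interaction terms. The model intercept is not reported, so absolute risks cannot be derived here; the dictionary keys and function names are illustrative only.

```python
import math

# Odds ratios as reported in the abstract. The per-unit interpretation
# (per year of age, per score point) is an assumption.
ODDS_RATIOS = {
    "age_per_year": 1.064,
    "smoking": 2.488,
    "gcs_per_point": 0.675,
    "braden_per_point": 0.741,
    "nasogastric_tube": 1.761,
}

# Cohort sizes and SAP case count as reported in the abstract.
COHORT = {"training": 462, "internal_validation": 196, "external_validation": 130}
SAP_CASES = 368


def log_odds_coefficients(odds_ratios):
    """Logistic regression coefficients are the natural logs of the odds ratios."""
    return {name: math.log(value) for name, value in odds_ratios.items()}


def combined_odds_ratio(differences, odds_ratios=ODDS_RATIOS):
    """Odds ratio between two patients whose predictors differ by `differences`.

    `differences` maps a predictor name to (patient A value - patient B value).
    Assumes an additive log-odds model without interactions.
    """
    coefficients = log_odds_coefficients(odds_ratios)
    log_or = sum(coefficients[name] * delta for name, delta in differences.items())
    return math.exp(log_or)


total_patients = sum(COHORT.values())
sap_incidence = SAP_CASES / total_patients
training_share = COHORT["training"] / (COHORT["training"] + COHORT["internal_validation"])

# Hypothetical comparison: a smoker with a nasogastric tube versus an
# otherwise identical non-smoker without one.
smoker_with_tube = combined_odds_ratio({"smoking": 1, "nasogastric_tube": 1})

# Hypothetical comparison: a GCS score three points lower, all else equal.
gcs_three_points_lower = combined_odds_ratio({"gcs_per_point": -3})
```

Under these assumptions, an OR below 1 for GCS and Braden score means each additional point lowers the odds of SAP, which is why a low score appears as a risk factor.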
This study constructed and validated a risk prediction model for SAP in older adult patients with hemorrhagic stroke based on multicenter data. Among the four machine learning algorithms (XGBoost, LR, SVM, and Naive Bayes), the LR model demonstrated the best and most stable predictive performance. Age, smoking, low GCS score, low Braden score, and nasogastric tube were identified as predictive factors for SAP in these patients. These indicators are easily obtainable in clinical practice and facilitate rapid bedside assessment. In internal and external validation, the model showed good generalization ability, and a nomogram was drawn to provide an objective and practical risk assessment tool for clinical nursing practice. The nomogram supports early identification of high-risk patients and guides targeted interventions, with the aim of reducing the incidence of SAP and improving patient prognosis.