Matsumoto Koutarou, Nohara Yasunobu, Sakaguchi Mikako, Takayama Yohei, Fukushige Syota, Soejima Hidehisa, Nakashima Naoki, Kamouchi Masahiro
Biostatistics Center, Kurume University, Kurume, Japan.
Big Data Science and Technology, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan.
JMIR Perioper Med. 2023 Oct 26;6:e50895. doi: 10.2196/50895.
Although machine learning models show considerable potential for predicting postoperative delirium, the advantage of implementing them in real-world settings remains unclear and needs to be tested against conventional models in practical use.
The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model.
The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. To evaluate the predictive performance of the models, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination, and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort from before the COVID-19 pandemic and a validation cohort from during the COVID-19 pandemic. Postoperative delirium was diagnosed according to the Confusion Assessment Method.
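The evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. It is not the authors' code. The function names, the 0.5 classification threshold used for the MCC, and the use of ordinary least squares of the observed outcome on the predicted probability for the calibration slope and intercept are all assumptions made for illustration; the abstract does not specify the threshold or the exact form of the calibration regression.

```python
import math
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case receives a
    higher predicted probability than a random negative case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Matthews correlation coefficient after thresholding the probabilities.
    The 0.5 default is an illustrative assumption, not the study's cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_line(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of a least-squares line of observed outcome on
    predicted probability (assumed form; ideal values are slope 1, intercept 0)."""
    n = len(y_true)
    mean_p = sum(y_prob) / n
    mean_y = sum(y_true) / n
    sxx = sum((p - mean_p) ** 2 for p in y_prob)
    if sxx == 0:
        raise ValueError("predicted probabilities must not all be identical")
    sxy = sum((p - mean_p) * (y - mean_y) for y, p in zip(y_true, y_prob))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_p


if __name__ == "__main__":
    # Toy data, not study data: 1 = delirium observed, 0 = no delirium.
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y, p), mcc(y, p), brier_score(y, p), calibration_line(y, p))
```

In a temporal validation such as the one described here, these functions would be applied to predictions for the later (validation) cohort from models fitted only on the earlier (derivation) cohort.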
A total of 6497 patients (mean age 68.5, SD 14.4 years; women: n=2627, 40.4%) were included in the derivation cohort, and 5366 patients (mean age 67.8, SD 14.6 years; women: n=2105, 39.2%) were included in the validation cohort. Regarding discrimination, the XGBoost model (AUROC 0.87-0.90 and MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89 and MCC 0.34-0.41). The logistic regression model with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance) achieved good predictive performance (AUROC 0.84-0.88, MCC 0.33-0.40, slope 1.01-1.19, intercept -0.16 to 0.06, and Brier score 0.06-0.07).
The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium.