Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany.
Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany.
BMC Med Inform Decis Mak. 2023 Apr 12;23(1):67. doi: 10.1186/s12911-023-02151-1.
Machine-learning models are susceptible to external influences, which can result in performance deterioration. The aim of our study was to elucidate the impact of a sudden shift in covariates, such as the one caused by the Covid-19 pandemic, on model performance.
After ethical approval and registration at ClinicalTrials.gov (NCT04092933, initial release 17/09/2019), we developed different models for the prediction of perioperative mortality based on preoperative data: one covering the pre-pandemic period until March 2020, one including data from before the pandemic and from the first wave until May 2020, and one covering the complete period before and during the pandemic until October 2021. We applied XGBoost as well as a deep learning neural network (DL). Performance metrics of each model were determined for the different pandemic phases, and the XGBoost models were analysed for changes in feature importance.
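The three training periods described above are cumulative: each later model sees all earlier data plus newer cases. A minimal Python sketch of how such nested training windows could be built from dated records follows. The `Record` type, its field names, and the exact exclusive end dates (1 March 2020, 1 June 2020, 1 November 2021) are hypothetical stand-ins for the study's actual period boundaries, which the abstract gives only to the month.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Record:
    """Hypothetical minimal patient record (field names are illustrative)."""
    op_date: date                                 # date of surgery
    died: bool                                    # perioperative mortality label
    features: dict = field(default_factory=dict)  # preoperative variables

# Hypothetical exclusive end dates approximating the three training periods.
WINDOW_ENDS = {
    "pre_pandemic": date(2020, 3, 1),          # "until March 2020"
    "pre_plus_first_wave": date(2020, 6, 1),   # "until May 2020"
    "complete": date(2021, 11, 1),             # "until October 2021"
}

def cumulative_windows(records, window_ends=WINDOW_ENDS):
    """Build one training set per window.

    Every window starts at the earliest record, so later windows
    contain all records of earlier ones (nested, cumulative training sets).
    """
    return {
        name: [r for r in records if r.op_date < end]
        for name, end in window_ends.items()
    }

if __name__ == "__main__":
    demo = [
        Record(date(2019, 5, 1), False),
        Record(date(2020, 4, 15), True),
        Record(date(2021, 7, 1), False),
    ]
    for name, rows in cumulative_windows(demo).items():
        print(name, len(rows))
```

Each resulting training set would then be used to fit one model, which can be evaluated on the cohorts of every later pandemic phase, mirroring the comparison reported in the results.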
XGBoost and DL performed similarly on the pre-pandemic data with respect to the area under the receiver operating characteristic curve (AUROC, 0.951 vs. 0.942) and the area under the precision-recall curve (AUPR, 0.144 vs. 0.187). Validation in the patient cohorts of the different pandemic waves showed large fluctuations in both AUROC and AUPR for DL, whereas the XGBoost models appeared more stable. Changes in variable frequencies with the onset of the pandemic were visible in age, ASA score, and a higher proportion of emergency operations, among others. Age consistently showed the highest information gain. Models based on pre-pandemic data performed worse during the first pandemic wave (AUROC 0.914 for both XGBoost and DL), whereas models augmented with data from the first wave performed worse after the first wave (AUROC 0.907 for XGBoost and 0.747 for DL). The deterioration was also visible in AUPR, which decreased by more than 50% for both XGBoost and DL in the first phase after re-training.
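AUROC and AUPR summarize different things: AUROC is the probability that a randomly chosen patient who died receives a higher predicted risk than one who survived, whereas AUPR tracks precision across recall levels and, for a random classifier, is approximately equal to the outcome prevalence. This is why an AUPR around 0.15 can coexist with an AUROC above 0.94 when the outcome is rare. The dependency-free Python sketch below computes both on toy data, plus the relative change used to express a drop such as "more than 50%". AUPR is computed here as step-wise average precision, which is one common estimator; the abstract does not state which estimator the authors used, and the function names are illustrative.

```python
def auroc(y_true, y_score):
    """Mann-Whitney estimate of AUROC: share of (positive, negative) pairs
    in which the positive case scores higher; ties count one half."""
    pos = [s for s, y in zip(y_score, y_true) if y]
    neg = [s for s, y in zip(y_score, y_true) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Step-wise AUPR: sum over thresholds of (recall increase) * precision."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for y in y_true if y)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every case tied at this score before updating the curve.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def relative_change(before, after):
    """Relative change; e.g. -0.6 means a 60% decrease."""
    return (after - before) / before

if __name__ == "__main__":
    y = [1, 0, 1, 0, 0]                 # toy labels (1 = died)
    s = [0.9, 0.8, 0.7, 0.3, 0.1]       # toy predicted risks
    print(round(auroc(y, s), 3), round(average_precision(y, s), 3))  # 0.833 0.833
```

With these toy values both metrics happen to equal 5/6; a decrease of more than 50% in AUPR corresponds to `relative_change(before, after) < -0.5`.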
A sudden shift in data impacts model performance. Re-training a model with updated data may degrade its predictive accuracy if the changes are only transient. Premature re-training should therefore be avoided, and close model surveillance is necessary.
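The conclusion calls for separating transient from persistent shifts before re-training. One simple way to operationalize this, offered only as a hypothetical illustration and not as the authors' procedure, is to raise an alert whenever a monitored per-period AUROC falls below a baseline minus a tolerance, but to recommend re-training only after the drop has persisted for several consecutive periods. The function name `monitor` and the defaults `tolerance=0.02` and `min_persist=3` are arbitrary assumptions.

```python
def monitor(period_aurocs, baseline, tolerance=0.02, min_persist=3):
    """Return (alert, retrain) for a chronological list of per-period AUROCs.

    alert: the most recent period is below baseline - tolerance
    retrain: the drop has lasted at least `min_persist` consecutive periods
    (thresholds are hypothetical, not taken from the study)
    """
    degraded = [a < baseline - tolerance for a in period_aurocs]
    alert = bool(degraded) and degraded[-1]
    # Count how many of the most recent periods are degraded in a row.
    streak = 0
    for d in reversed(degraded):
        if not d:
            break
        streak += 1
    return alert, streak >= min_persist

if __name__ == "__main__":
    # A single-period dip: alert, but no re-training recommendation yet.
    print(monitor([0.95, 0.94, 0.91], baseline=0.95))        # (True, False)
    # A drop persisting over three periods: recommend re-training.
    print(monitor([0.95, 0.91, 0.92, 0.90], baseline=0.95))  # (True, True)
```

Requiring persistence trades slower reaction for robustness against short-lived shifts like the first pandemic wave; the right tolerance and window length would have to be tuned to the clinical setting.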