BACKGROUND

Concept drift and covariate shift degrade the performance of machine learning (ML) models. The objective of our study was to characterize the sudden data drift caused by the COVID pandemic. Furthermore, we investigated whether certain methods applied during model training can prevent the model degradation caused by data drift.

METHODS

Using the H2O AutoML method, we trained different ML models on a dataset comprising 102,666 cases of surgical patients collected in the years 2014-2019 to predict postoperative mortality from preoperatively available data. The models applied were Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning, and Stacked Ensembles comprising all base models. Further, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: we gave older data lower weight (Rahmani K, et al, Int J Med Inform 173:104930, 2023), used only the most recent data for model training (Morger A, et al, Sci Rep 12:7244, 2022), and performed a z-transformation of the numerical input parameters (Dilmegani C, 2023). Afterwards, we tested model performance on a pre-pandemic and an in-pandemic dataset not used in the training process, and analysed common features.
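The three training modifications can be illustrated with a minimal Python sketch. The abstract does not specify the weighting scheme, the length of the "most recent" window, or how the z-transformation was fitted, so the exponential half-life, the two-year window, and the function names below are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def recency_weights(years, half_life=2.0):
    """Give older cases lower weight (assumed scheme: exponential decay).

    The newest year gets weight 1.0; a case `half_life` years older gets 0.5.
    """
    newest = max(years)
    return [0.5 ** ((newest - y) / half_life) for y in years]


def recent_only(rows, years, n_recent_years=2):
    """Keep only rows from the last `n_recent_years` years (assumed window)."""
    cutoff = max(years) - n_recent_years + 1
    return [row for row, y in zip(rows, years) if y >= cutoff]


def z_transform(values):
    """Standardize a numerical input to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Per-row weights of this kind could be passed to any trainer that accepts observation weights, and the mean and standard deviation used for the z-transformation would normally be computed on the training data and reused unchanged on the test data.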

RESULTS

The resulting models showed excellent areas under the receiver operating characteristic curve and acceptable precision-recall curves when tested on a dataset from January-March 2020, but significant degradation when tested on a dataset collected during the first wave of the COVID pandemic in April-May 2020. Comparing the probability distributions of the input parameters revealed significant differences between pre-pandemic and in-pandemic data. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of the modifications we applied prevented a loss of performance, although they produced very different models using a large variety of parameters.
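One common way to compare the distribution of an input parameter between two periods is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distribution functions. The abstract does not say which comparison the authors used, so the sketch below is only an assumed example of the general idea; `ks_statistic` is a hypothetical helper name.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs
    of the two samples: 0.0 means identical, 1.0 means fully separated.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to evaluate them there.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

Applied to one preoperative parameter at a time, with pre-pandemic values as `sample_a` and in-pandemic values as `sample_b`, a large statistic would point to covariate shift in that parameter.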

CONCLUSIONS

Our results show that none of the easy-to-implement measures we tested in model training can prevent performance deterioration when sudden external events occur. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.
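As a rough illustration of what such monitoring could look like, the sketch below computes the area under the receiver operating characteristic curve for a batch of recent predictions and flags a drop relative to a baseline. The 0.05 tolerance and both function names are assumptions for this example and are not taken from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as one half).
    Quadratic in the number of cases, which is fine for a small sketch.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def flag_degradation(baseline_auc, current_auc, tolerance=0.05):
    """Return True if performance fell by more than `tolerance` (assumed)."""
    return (baseline_auc - current_auc) > tolerance
```

With an endpoint as rare as the roughly 1% mortality reported here, a monitoring window needs enough cases to contain several events before its AUROC estimate is stable, so any alert would still call for critical review.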

Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic.