Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic.

Affiliations

Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany.

Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany.

Publication information

BMC Med Inform Decis Mak. 2024 Feb 2;24(1):34. doi: 10.1186/s12911-024-02428-z.

DOI:10.1186/s12911-024-02428-z
PMID:38308256
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10837894/
Abstract

BACKGROUND

Concept drift and covariate shift lead to a degradation of machine learning (ML) models. The objective of our study was to characterize sudden data drift as caused by the COVID pandemic. Furthermore, we investigated the suitability of certain methods in model training to prevent model degradation caused by data drift.

METHODS

Using H2O AutoML, we trained different ML models on a dataset of 102,666 surgical cases collected from 2014 to 2019 to predict postoperative mortality from preoperatively available data. The models were a Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning, and Stacked Ensembles comprising all base models. We then modified the original models by applying three different methods when training on the original pre-pandemic dataset: (Rahmani K, et al, Int J Med Inform 173:104930, 2023) we gave older data lower weight, (Morger A, et al, Sci Rep 12:7244, 2022) we used only the most recent data for model training, and (Dilmegani C, 2023) we performed a z-transformation of the numerical input parameters. Afterwards, we tested model performance on a pre-pandemic and an in-pandemic dataset, neither of which was used in training, and analysed common features.
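The three training-time modifications named in the Methods can be illustrated with a minimal, library-free Python sketch. This is not the authors' code (they worked with H2O AutoML). The function names, the exponential decay factor and the window length are hypothetical choices made only for illustration, because the abstract does not state the weighting scheme or the cutoff that was actually used.

```python
from statistics import mean, pstdev


def recency_weights(years, newest_year, decay=0.8):
    """Give older cases lower weight: weight = decay ** (age in years).

    The exponential form and decay=0.8 are assumptions, not from the paper.
    """
    return [decay ** (newest_year - year) for year in years]


def recent_only(rows, years, newest_year, window=2):
    """Keep only cases from the most recent `window` years (assumed cutoff)."""
    return [row for row, year in zip(rows, years) if newest_year - year < window]


def z_transform(values):
    """Standardise one numeric input parameter to mean 0 and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


if __name__ == "__main__":
    years = [2014, 2017, 2019]
    print(recency_weights(years, newest_year=2019))
    print(recent_only(["case_a", "case_b", "case_c"], years, newest_year=2019))
    print(z_transform([1.0, 2.0, 3.0]))
```

In a real pipeline the weights would be passed to the learner as per-row sample weights. The z-transformation parameters (mean and standard deviation) would normally be estimated on the training data and reused unchanged on the test data.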

RESULTS

The resulting models showed excellent areas under the receiver operating characteristic curve and acceptable precision-recall curves when tested on a dataset from January to March 2020, but degraded significantly when tested on a dataset collected during the first wave of the COVID pandemic, from April to May 2020. Comparing the probability distributions of the input parameters revealed significant differences between pre-pandemic and in-pandemic data. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of the modifications we applied prevented a loss of performance, although they produced very different models that used a large variety of parameters.
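The two kinds of comparison reported in the Results can be made concrete with a short Python sketch: a distribution comparison for one input parameter and a discrimination metric for model output. The abstract does not say which statistical test was used to compare distributions, so the two-sample Kolmogorov-Smirnov statistic below is an assumed stand-in, and both function names are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def auroc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case gets the higher score; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

Computing `ks_statistic` for each input parameter on pre-pandemic versus in-pandemic data, and `auroc` on each test set, would separate the two signals the abstract describes: shifted inputs and a drop in discrimination. The pairwise AUROC loop is quadratic in the number of cases and is meant for illustration rather than for datasets of this study's size.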

CONCLUSIONS

Our results show that none of our tested easy-to-implement measures in model training can prevent deterioration in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53cb/10837894/8ba451efd07e/12911_2024_2428_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53cb/10837894/48aca77e6e3a/12911_2024_2428_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53cb/10837894/1c69733ecbac/12911_2024_2428_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53cb/10837894/5cb0546ddc2c/12911_2024_2428_Fig4_HTML.jpg

Cited by

1
Mortality Prediction Performance Under Geographical, Temporal, and COVID-19 Pandemic Dataset Shift: External Validation of the Global Open-Source Severity of Illness Score Model.
Crit Care Explor. 2025 Jun 4;7(6):e1275. doi: 10.1097/CCE.0000000000001275. eCollection 2025 Jun 1.
2
One-class support vector machines for detecting population drift in deployed machine learning medical diagnostics.
Sci Rep. 2025 Apr 9;15(1):12157. doi: 10.1038/s41598-025-94427-x.
3
Correction: Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic.
BMC Med Inform Decis Mak. 2024 Feb 19;24(1):56. doi: 10.1186/s12911-024-02454-x.

References

1
Enabling personalized perioperative risk prediction by using a machine-learning model based on preoperative data.
Sci Rep. 2023 May 2;13(1):7128. doi: 10.1038/s41598-023-33981-8.
2
Impact of the Covid-19 pandemic on the performance of machine learning algorithms for predicting perioperative mortality.
BMC Med Inform Decis Mak. 2023 Apr 12;23(1):67. doi: 10.1186/s12911-023-02151-1.
3
Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction.
Int J Med Inform. 2023 May;173:104930. doi: 10.1016/j.ijmedinf.2022.104930. Epub 2022 Nov 19.
4
Impact of COVID-19 on the patient referral pattern and conversion rate in the university versus private facial plastic surgery centers.
Int Ophthalmol. 2023 Mar;43(3):707-715. doi: 10.1007/s10792-022-02469-1. Epub 2022 Aug 30.
5
Declines in the utilization of hospital-based care during COVID-19 pandemic.
J Hosp Med. 2022 Dec;17(12):984-989. doi: 10.1002/jhm.12955. Epub 2022 Aug 29.
6
AdaDiag: Adversarial Domain Adaptation of Diagnostic Prediction with Clinical Event Sequences.
J Biomed Inform. 2022 Oct;134:104168. doi: 10.1016/j.jbi.2022.104168. Epub 2022 Aug 17.
7
Impact of COVID-19 on changing consumer behaviour: Lessons from an emerging economy.
Int J Consum Stud. 2022 May;46(3):692-715. doi: 10.1111/ijcs.12786. Epub 2022 Feb 14.
8
Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data.
Sci Rep. 2022 May 4;12(1):7244. doi: 10.1038/s41598-022-09309-3.
9
A novel lifelong machine learning-based method to eliminate calibration drift in clinical prediction models.
Artif Intell Med. 2022 Mar;125:102256. doi: 10.1016/j.artmed.2022.102256. Epub 2022 Feb 12.
10
Adaptation Strategies for Automated Machine Learning on Evolving Data.
IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):3067-3078. doi: 10.1109/TPAMI.2021.3062900. Epub 2021 Aug 4.