Detecting changes in the performance of a clinical machine learning tool over time.

Affiliations

Center for Experimental and Molecular Medicine (CEMM), Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands; Division of Acute Medicine, Department of Internal Medicine, Amsterdam UMC, VU University, Amsterdam, the Netherlands.

Division of Acute Medicine, Department of Internal Medicine, Amsterdam UMC, VU University, Amsterdam, the Netherlands; Department of Clinical Chemistry, Amsterdam UMC, VU University, Amsterdam, the Netherlands.

Publication information

EBioMedicine. 2023 Nov;97:104823. doi: 10.1016/j.ebiom.2023.104823. Epub 2023 Oct 2.

DOI: 10.1016/j.ebiom.2023.104823

PMID: 37793210

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10550508/

Abstract

BACKGROUND

Excessive use of blood cultures (BCs) in Emergency Departments (EDs) leads to low diagnostic yields and high contamination rates, which are associated with increased antibiotic use and unnecessary further diagnostics. Our team previously developed and validated a machine learning model to predict BC outcomes and support diagnostic stewardship. Although the model showed promising initial results, its performance may drift as patient demographics, clinical practices, and outcome rates evolve, which warrants continual monitoring and evaluation of such models.

METHODS

A real-time evaluation of the model's performance was conducted between October 2021 and September 2022. The model was integrated into Amsterdam UMC's Electronic Health Record system, where it predicted BC outcomes in real time for all adult patients with BC draws. The model's performance was assessed monthly using the Area Under the Receiver Operating Characteristic Curve (AUC), the Area Under the Precision-Recall Curve (AUPRC), and the Brier score. Statistical Process Control (SPC) charts were used to monitor variation over time.
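To make the monthly evaluation concrete, here is a minimal pure-Python sketch of the three metrics named above, computed per calendar month. It is not the authors' code: the record layout `(month, outcome, probability)` and all function names are assumptions, the AUPRC is estimated as average precision (one common step-wise estimator), and the toy records are made up.

```python
from collections import defaultdict


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Equivalent to the Mann-Whitney U statistic; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive case.

    Ties in y_score are broken by sort order (no tie averaging).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one positive case")
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this cut-off
    return total / n_pos


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def monthly_metrics(records):
    """Group (month, outcome, probability) records by month and score each month."""
    by_month = defaultdict(lambda: ([], []))
    for month, y, p in records:
        outcomes, probs = by_month[month]
        outcomes.append(y)
        probs.append(p)
    return {
        month: {
            "auc": roc_auc(ys, ps),
            "auprc": average_precision(ys, ps),
            "brier": brier_score(ys, ps),
        }
        for month, (ys, ps) in sorted(by_month.items())
    }


# Hypothetical toy data: (month, BC positive?, predicted probability)
records = [
    ("2021-10", 1, 0.9), ("2021-10", 0, 0.2),
    ("2021-10", 0, 0.6), ("2021-10", 1, 0.4),
]
print(monthly_metrics(records))
```

For these four toy records the month scores an AUC of 0.75 (three of four positive-negative pairs ranked correctly), an AUPRC of about 0.83, and a Brier score of about 0.19. In practice a library such as scikit-learn (`roc_auc_score`, `average_precision_score`, `brier_score_loss`) would usually be used; the pure-Python version only makes the definitions explicit.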

FINDINGS

Across 3,035 unique adult patient visits, the model achieved an average AUC of 0.78, AUPRC of 0.41, and Brier score of 0.10 for predicting the outcome of BCs drawn in the ED. Although specific population characteristics changed over time, no points fell outside the statistical control limits for the AUC, AUPRC, or Brier score, indicating stable model performance. The average BC positivity rate during the study period was 13.4%.
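The check behind this finding, whether any monthly value falls outside the control limits, can be sketched as follows. The abstract does not say which SPC chart type was used, so this assumes an individuals (X-mR) chart whose limits are the center line ± 2.66 times the mean moving range (3/d2 with d2 = 1.128), estimated from a baseline period, and applies only the basic "point beyond a limit" rule. The function names and the AUC values are hypothetical, not the study's data.

```python
def control_limits(baseline):
    """Center line and 3-sigma limits of an individuals (X-mR) chart.

    Sigma is estimated as mean moving range / 1.128 (d2 for n = 2),
    so the limits are center +/- 2.66 * mean moving range.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar


def out_of_control(values, lcl, ucl):
    """Indices of points beyond the control limits (rule 1 only)."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical monthly AUC values used to set the limits
baseline_auc = [0.78, 0.80, 0.76, 0.79, 0.77, 0.78]
center, lcl, ucl = control_limits(baseline_auc)
print(f"CL={center:.3f} LCL={lcl:.3f} UCL={ucl:.3f}")
print(out_of_control([0.79, 0.60, 0.90], lcl, ucl))  # -> [1, 2]
```

The chart itself is direction-neutral: for the AUC and AUPRC a point below the lower limit signals deterioration, whereas for the Brier score (lower is better) a point above the upper limit does. Additional run rules, for example several consecutive points on one side of the center line, can detect smaller but sustained shifts.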

INTERPRETATION

Despite significant changes in clinical practice, our BC stewardship tool showed stable performance, suggesting that it is robust to changing environments. Using SPC charts for multiple metrics enables simple and effective monitoring of potential performance drift. Assessing variation in outcome rates and population characteristics may guide the choice of specific interventions, such as intercept correction or recalibration, that may be needed to keep model performance stable over time. This study suggested no need to recalibrate or correct our BC stewardship tool.
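Intercept correction, named above as one possible response to a changed outcome rate, can be illustrated with a simple prior-shift adjustment: every predicted log-odds is shifted by the difference between the logit of the new outcome rate and the logit of the rate in the development data. This is an assumed, illustrative form, not necessarily the procedure the authors would apply; in practice the intercept is often re-estimated by logistic regression on recent data with the original linear predictor as an offset. The function names and rates below are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def intercept_correct(probs, dev_rate, new_rate):
    """Shift every prediction's log-odds by the change in outcome log-odds.

    Assumes the population changes only through the outcome rate
    (prior shift); the same constant is added to every logit.
    """
    shift = logit(new_rate) - logit(dev_rate)
    return [expit(logit(p) + shift) for p in probs]


# Hypothetical development and current outcome rates, not study values
print(intercept_correct([0.05, 0.5, 0.9], dev_rate=0.10, new_rate=0.20))
```

Because the same constant is added to every logit, the ranking of patients is unchanged, so the AUC and AUPRC stay the same; only calibration-sensitive measures such as the Brier score are affected. Full recalibration would additionally re-estimate the calibration slope.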

FUNDING

No funding to disclose.
