Charité Lab for Artificial Intelligence in Medicine (CLAIM), Charité Universitätsmedizin Berlin, Berlin, Germany.
QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany.
Crit Care Med. 2024 Nov 1;52(11):1710-1721. doi: 10.1097/CCM.0000000000006359. Epub 2024 Jul 3.
OBJECTIVES: To evaluate the transferability of deep learning (DL) models for the early detection of adverse events to previously unseen hospitals.
DESIGN: Retrospective observational cohort study using harmonized intensive care data from four public datasets.
SETTING: ICUs across Europe and the United States.
PATIENTS: Adult patients admitted to the ICU for at least 6 hours whose data were of good quality.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: Using carefully harmonized data from a total of 334,812 ICU stays, we systematically assessed the transferability of DL models for three common adverse events: death, acute kidney injury (AKI), and sepsis. We tested whether using more than one data source and/or algorithmically optimizing for generalizability during training improves model performance at new hospitals. At the training hospital, models achieved a high area under the receiver operating characteristic curve (AUROC) for mortality (0.838-0.869), AKI (0.823-0.866), and sepsis (0.749-0.824). As expected, AUROC dropped when models were applied at other hospitals, sometimes by as much as 0.200. Using more than one dataset for training mitigated the performance drop, with multicenter models performing roughly on par with the best single-center model. Dedicated methods promoting generalizability did not noticeably improve performance in our experiments.
CONCLUSIONS: Our results emphasize the importance of diverse training data for DL-based risk prediction. They suggest that as data from more hospitals become available for training, models may become increasingly generalizable. Even so, good performance at a new hospital still depended on the inclusion of compatible hospitals during training.
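The transfer comparison described in the abstract rests on the difference between AUROC at the training hospital and AUROC at an unseen hospital. The sketch below illustrates that calculation only; it is not the study's code. The function names `auroc` and `transfer_drop` are hypothetical, and the labels and scores are invented toy values, not data or results from the paper.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def transfer_drop(
    internal: Tuple[Sequence[int], Sequence[float]],
    external: Tuple[Sequence[int], Sequence[float]],
) -> float:
    """AUROC at the training hospital minus AUROC at an unseen hospital.
    A positive value means performance fell after transfer."""
    return auroc(*internal) - auroc(*external)


# Invented toy values (assumption, for illustration only):
# (labels, model scores) for the same model at two hospitals.
toy_internal = ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
toy_external = ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.20])

toy_drop = transfer_drop(toy_internal, toy_external)
```

The pairwise definition is quadratic in the number of stays, so for cohorts of the size reported here a rank-based or library implementation would be the practical choice; the pairwise form is used only because it makes the definition explicit.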