Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON M5G1X8, Canada.
Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA 94305, United States.
J Am Med Inform Assoc. 2023 Nov 17;30(12):2004-2011. doi: 10.1093/jamia/ocad175.
Development of electronic health record (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients for pediatric inpatient clinical prediction tasks.
This retrospective cohort study used EHR data and included patients with at least one admission to an inpatient unit. One admission per patient was randomly selected. Adult inpatients were 18 years or older, while pediatric inpatients were older than 28 days and younger than 18 years. Admissions were temporally split into training (January 1, 2008 to December 31, 2019), validation (January 1, 2020 to December 31, 2020), and test (January 1, 2021 to August 1, 2022) sets. The primary comparison was a self-supervised model trained in adult inpatients versus count-based logistic regression models trained in pediatric inpatients. The primary outcome was the mean area under the receiver operating characteristic curve (AUROC) across 11 distinct clinical outcomes. All models were evaluated in pediatric inpatients.
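The temporal split described above can be sketched as follows. This is a minimal illustration, not the study's code: the date boundaries come from the abstract, while the function name `assign_split`, the constant names, and the choice to reject out-of-range dates are assumptions made for the example.

```python
from datetime import date

# Date boundaries as stated in the abstract; the names are illustrative.
STUDY_START = date(2008, 1, 1)
TRAIN_END = date(2019, 12, 31)
VALID_END = date(2020, 12, 31)
STUDY_END = date(2022, 8, 1)


def assign_split(admission_date: date) -> str:
    """Map an admission date to 'train', 'validation', or 'test'.

    Assumption: admissions outside the study window are rejected
    rather than silently dropped.
    """
    if not (STUDY_START <= admission_date <= STUDY_END):
        raise ValueError(f"{admission_date} is outside the study window")
    if admission_date <= TRAIN_END:
        return "train"
    if admission_date <= VALID_END:
        return "validation"
    return "test"


if __name__ == "__main__":
    for d in (date(2015, 6, 1), date(2020, 3, 15), date(2021, 11, 30)):
        print(d.isoformat(), assign_split(d))
```

Splitting by calendar time rather than at random means the test set contains only admissions that occurred after every training admission, so the evaluation reflects use on future patients.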
When evaluated in pediatric inpatients, the mean AUROC of the self-supervised model trained in adult inpatients (0.902) was noninferior to that of count-based logistic regression models trained in pediatric inpatients (0.868) (mean difference = 0.034, 95% CI = 0.014-0.057; P < .001 for noninferiority and P = .006 for superiority).
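The relationship between the reported confidence interval and the noninferiority and superiority conclusions can be illustrated with a short sketch. The abstract does not state the noninferiority margin, so the value `0.05` below is purely hypothetical, and `interpret_difference` is an illustrative helper rather than the authors' statistical procedure (the reported P values come from their own tests, not from this rule).

```python
def interpret_difference(ci_lower: float, margin: float) -> dict:
    """Apply the usual confidence-interval reading of a difference in means.

    The difference is (new model) - (reference model), so larger is better.
    Noninferiority: the lower CI bound lies above -margin.
    Superiority: the lower CI bound lies above 0.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    return {
        "noninferior": ci_lower > -margin,
        "superior": ci_lower > 0.0,
    }


if __name__ == "__main__":
    # Values reported in the abstract.
    auroc_adult_ssl = 0.902
    auroc_pediatric_lr = 0.868
    ci_lower = 0.014

    # Hypothetical margin: NOT taken from the source.
    assumed_margin = 0.05

    diff = auroc_adult_ssl - auroc_pediatric_lr
    print(f"mean difference = {diff:.3f}")
    print(interpret_difference(ci_lower, assumed_margin))
```

Because the lower bound of the reported interval (0.014) is above zero, the interval is consistent with both conclusions for any positive margin, which matches the direction of the reported P values.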
A self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients when both were evaluated in pediatric inpatients. This finding suggests that self-supervised models trained in adult patients can transfer to pediatric patients without costly model retraining.