Levy Elizabeth, Claar Dru, Co Ivan, Fuchs Barry D, Ginestra Jennifer, Kohn Rachel, McSparron Jakob I, Patel Bhavik, Weissman Gary E, Kerlin Meeta Prasad, Sjoding Michael W
Division of Pulmonary, Allergy and Critical Care, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA.
Palliative and Advanced Illness (PAIR) Center, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA.
Crit Care Med. 2025 Jun 1;53(6):e1224-e1234. doi: 10.1097/CCM.0000000000006662. Epub 2025 Apr 8.
The aim of this study was to develop and externally validate a machine-learning model that retrospectively identifies patients with acute respiratory distress syndrome (ARDS) using electronic health record (EHR) data.
In this retrospective cohort study, ARDS was identified by physician adjudication in three cohorts of patients with hypoxemic respiratory failure (training, internal validation, and external validation). Machine-learning models were trained to classify ARDS using vital signs, respiratory support, laboratory data, medications, chest radiology reports, and clinical notes. The best-performing models were validated internally and externally using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve, integrated calibration index (ICI), sensitivity, specificity, positive predictive value (PPV), and timing of ARDS identification.
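To make the discrimination and threshold metrics concrete, the following is a minimal sketch in plain Python, not the authors' evaluation code. The function names (`auroc`, `threshold_metrics`), the toy labels and scores, and the 0.5 threshold are all illustrative assumptions; the abstract does not state the threshold or software used.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy data (hypothetical): 1 = adjudicated ARDS, 0 = no ARDS.
y_toy = [1, 1, 0, 1, 0, 0]
score_toy = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc_toy = auroc(y_toy, score_toy)
metrics_toy = threshold_metrics(y_toy, score_toy, threshold=0.5)
print(auc_toy, metrics_toy)
```

The pairwise AUROC is quadratic in cohort size, which is acceptable for a sketch; a rank-based computation would be the usual choice for larger cohorts. The ICI is omitted here because it depends on a smoothed calibration curve that this sketch does not attempt to reproduce.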
Patients with hypoxemic respiratory failure undergoing mechanical ventilation within two distinct health systems.
Interventions: None.
There were 1,845 patients in the training cohort, 556 in the internal validation cohort, and 199 in the external validation cohort. ARDS prevalence was 19%, 17%, and 31%, respectively. Regularized logistic regression models analyzing structured data (EHR model) and structured data plus radiology reports (EHR-radiology model) had the best performance. During internal and external validation, the EHR-radiology model had an AUROC of 0.91 (95% CI, 0.88-0.93) and 0.88 (95% CI, 0.87-0.93), respectively. Externally, the ICI was 0.13 (95% CI, 0.08-0.18). At a specified model threshold, sensitivity and specificity were 80% (95% CI, 75%-98%), PPV was 64% (95% CI, 58%-71%), and the model identified patients a median of 2.2 hours (interquartile range, 0.2-18.6) after they met Berlin ARDS criteria.
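As a rough consistency check on these figures, PPV can be derived from sensitivity, specificity, and prevalence with Bayes' rule. The sketch below assumes that the threshold metrics refer to the external validation cohort (31% ARDS prevalence) and that sensitivity and specificity are both 80%; the abstract does not state either point explicitly. The function name `ppv_from_rates` is hypothetical, and this is not the authors' calculation.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = P(disease | positive test), via Bayes' rule."""
    true_pos_rate = sensitivity * prevalence                 # P(test+, disease)
    false_pos_rate = (1.0 - specificity) * (1.0 - prevalence)  # P(test+, no disease)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Assumed inputs: 80% sensitivity, 80% specificity, 31% prevalence.
ppv_external = ppv_from_rates(0.80, 0.80, 0.31)
print(round(ppv_external, 3))  # 0.642
```

Under these assumptions the derived PPV of about 64% agrees with the reported PPV. The same function shows how strongly PPV depends on prevalence: at the 17% prevalence of the internal validation cohort, the same sensitivity and specificity would give a PPV of about 45%.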
Machine-learning models analyzing EHR data can retrospectively identify patients with ARDS across different institutions.