Kotula Charles A, Martin Jennie, Carey Kyle A, Edelson Dana P, Dligach Dmitriy, Mayampurath Anoop, Afshar Majid, Churpek Matthew M
Department of Medicine, University of Wisconsin-Madison, 610 Walnut St, Madison, WI, 53792, United States, 1 608-262-9564.
Department of Medicine, University of Chicago, Chicago, IL, United States.
J Med Internet Res. 2025 Jun 11;27:e75340. doi: 10.2196/75340.
Implementing machine learning models to identify clinical deterioration in the wards is associated with decreased morbidity and mortality. However, these models have high false positive rates and only use structured data.
We aimed to compare models with and without information from clinical notes for predicting deterioration.
Adults admitted to the wards at the University of Chicago (development cohort) and University of Wisconsin-Madison (external validation cohort) were included. Predictors consisted of structured variables and unstructured variables extracted from notes as concept unique identifiers (CUIs). We parameterized CUIs in 5 ways: standard tokenization (ST), International Classification of Diseases rollup using tokenization (ICDR-T), International Classification of Diseases rollup using binary variables (ICDR-BV), CUIs as SapBERT embeddings (SE), and CUI clustering using SapBERT embeddings (CC). Models combining each parameterization method with structured data were compared with structured data-only models for predicting intensive care unit transfer or death in the next 24 hours using deep recurrent neural networks.
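As a rough illustration of how two of these parameterizations differ, the sketch below contrasts tokenization (each CUI becomes an integer ID) with a rollup to binary variables (each CUI is mapped to a broader category, and the category is marked present or absent). The CUI strings, category names, mapping, and function names are all hypothetical placeholders, not the vocabulary or rollup used in the study.

```python
# Hypothetical CUI -> rollup category map. The identifiers and categories
# are placeholders for illustration, not real UMLS or ICD content.
CUI_TO_CATEGORY = {
    "C0000001": "renal",
    "C0000002": "renal",
    "C0000003": "oncology",
}
CATEGORIES = sorted(set(CUI_TO_CATEGORY.values()))  # ['oncology', 'renal']

# Token vocabulary: ID 0 is reserved for CUIs not seen during training.
VOCAB = {cui: i + 1 for i, cui in enumerate(sorted(CUI_TO_CATEGORY))}


def tokenize(cuis):
    """Tokenization-style input: one integer ID per CUI, order preserved."""
    return [VOCAB.get(cui, 0) for cui in cuis]


def rollup_binary(cuis):
    """Binary-variable rollup: one 0/1 flag per category, order discarded."""
    present = {CUI_TO_CATEGORY[c] for c in cuis if c in CUI_TO_CATEGORY}
    return [1 if cat in present else 0 for cat in CATEGORIES]


note_cuis = ["C0000002", "C0000001", "C9999999"]  # last CUI is unmapped
token_ids = tokenize(note_cuis)
binary_flags = rollup_binary(note_cuis)
```

Under these assumptions, the tokenized form keeps every CUI occurrence as a sequence element, while the binary rollup collapses the two "renal" CUIs into a single flag and drops the unmapped one.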
The development (University of Chicago) cohort included 284,302 patients, while the external validation (University of Wisconsin-Madison) cohort included 248,055 patients. In total, 4.9% (n=26,281) of patients experienced the outcome. The SE model achieved the highest area under the precision-recall curve (0.208), followed by CC (0.199) and the structured-only model (0.199), ICDR-BV (0.194), ICDR-T (0.166), and ST (0.158). The CC and structured-only models achieved the highest area under the receiver operating characteristic curve (0.870), followed by ICDR-T (0.867), ICDR-BV (0.866), ST (0.860), and SE (0.859). At the cutoff that flagged 5% of the observations in the test set, the CC model achieved the greatest positive predictive value (12.53%) and sensitivity (52.15%). At the 15% cutoff, the ICDR-T, CC, and ICDR-BV models tied for the highest positive predictive value at 5.67%, with sensitivities of 70.95%, 70.92%, and 70.86%, respectively. All models were well calibrated, achieving Brier scores in the range of 0.011-0.012. The modified integrated gradients method revealed that CUIs corresponding to terms such as "NPO - nothing by mouth," "chemotherapy," "transplanted tissue," and "dialysis procedure" were most predictive of deterioration.
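The cutoff-based metrics above can be read as follows: rank all observations by predicted risk, flag the top 5% (or 15%), then compute sensitivity and positive predictive value among the flagged set. The sketch below shows that calculation and the Brier score on toy data. It assumes this reading of "the cutoff that flagged 5% of the observations"; the function names and toy values are illustrative and are not taken from the study.

```python
def metrics_at_flag_rate(y_true, y_score, flag_rate):
    """Sensitivity and PPV when the top `flag_rate` fraction of scores is flagged."""
    n = len(y_true)
    n_flag = max(1, round(n * flag_rate))  # number of observations flagged
    # Indices sorted from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)
    flagged = order[:n_flag]
    true_pos = sum(y_true[i] for i in flagged)
    positives = sum(y_true)
    sensitivity = true_pos / positives if positives else float("nan")
    ppv = true_pos / n_flag
    return sensitivity, ppv


def brier_score(y_true, y_score):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_score)) / len(y_true)


# Toy data: 10 observations, 2 of which have the outcome.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.1, 0.7, 0.2, 0.05, 0.3, 0.1, 0.15, 0.05]

sens_20, ppv_20 = metrics_at_flag_rate(y_true, y_score, 0.2)
sens_30, ppv_30 = metrics_at_flag_rate(y_true, y_score, 0.3)
```

Flagging more observations can only keep or raise sensitivity, while positive predictive value typically falls as more low-risk observations are flagged, which matches the pattern between the 5% and 15% cutoffs reported above.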
A multimodal model combining structured data with SapBERT embeddings of CUIs had the highest area under the precision-recall curve, but performance was similar between models with and without CUIs. Although adding CUIs from notes to structured data did not meaningfully improve model performance for predicting clinical deterioration, models using CUIs could provide clinicians with relevant information and additional clinical context to support decision-making.