Van Phan Hoang, Spottiswoode Natasha, Lydon Emily C, Chu Victoria T, Cuesta Adolfo, Kazberouk Alexander D, Richmond Natalie L, Deosthale Padmini, Calfee Carolyn S, Langelier Charles R
Department of Medicine, Division of Infectious Diseases, University of California San Francisco.
Department of Pediatrics, Division of Infectious Diseases and Global Health, University of California San Francisco.
medRxiv. 2025 Apr 3:2024.08.28.24312732. doi: 10.1101/2024.08.28.24312732.
Lower respiratory tract infections (LRTIs) are a leading cause of mortality worldwide and can be difficult to diagnose in critically ill patients, as non-infectious causes of respiratory failure can present with similar clinical features.
We developed an LRTI diagnostic method that combines a pulmonary transcriptomic biomarker with electronic medical record (EMR) text assessment using the large language model Generative Pre-trained Transformer 4 (GPT-4). We evaluated this approach in a prospective cohort of critically ill adults with acute respiratory failure whose tracheal aspirate gene expression was measured by RNA sequencing. Patients with LRTI or non-infectious conditions were identified by retrospective, multi-physician clinical adjudication. We then validated our findings by applying the method to an independent cohort of 115 adults with acute respiratory failure.
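The abstract does not state how the transcriptomic biomarker and the GPT-4 EMR assessment were merged into one classifier. A common way to combine two per-patient scores is a logistic-regression stack. The sketch below assumes that each patient has one numeric biomarker score and one numeric GPT-4-derived score, and it fits a two-feature logistic regression by plain gradient descent. The function names (`fit_logistic`, `predict_proba`), the hyperparameters, and the toy data are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
    l2: float = 0.01,
) -> List[float]:
    """Hypothetical stacker: logistic regression fit by batch gradient descent.

    Returns weights with the intercept at index 0.
    """
    n_feat = len(features[0])
    n = len(labels)
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for x, y in zip(features, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        # Mean gradient step; L2 penalty applied only to non-intercept weights
        w[0] -= lr * grad[0] / n
        for j in range(1, n_feat + 1):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # Combined probability of LRTI for one patient's [biomarker, gpt4] scores
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


# Toy rows: [biomarker_score, gpt4_score]; label 1 = LRTI, 0 = non-infectious
features = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.3], [0.1, 0.4], [0.3, 0.1]]
labels = [1, 1, 1, 0, 0, 0]
w = fit_logistic(features, labels)
combined = [predict_proba(w, x) for x in features]
```

In practice the stacker would be fit on held-out (e.g., cross-validated) scores so that the combined model is not evaluated on the same patients it was trained on.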
In the derivation cohort, a combined classifier incorporating gene expression and GPT-4-assisted EMR analysis achieved an area under the receiver operating characteristic curve (AUC) of 0.93 (±0.08) and an accuracy of 84%. It outperformed gene expression alone (AUC 0.84 ± 0.11) and GPT-4-based EMR analysis alone (AUC 0.83 ± 0.07). By comparison, the primary medical team's admission diagnosis had an accuracy of 72%. In the validation cohort, the combined classifier yielded an AUC of 0.98 (±0.04) and an accuracy of 96%.
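AUC and accuracy measure different things. AUC is threshold-free and equals the probability that a randomly chosen LRTI patient receives a higher score than a randomly chosen non-infectious patient, with ties counted as one half. Accuracy depends on a chosen decision threshold. The minimal sketch below computes both from labels and scores. The 0.5 threshold and the toy values are illustrative assumptions. The abstract does not report the threshold used or how the ± intervals were derived.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney formulation: P(positive score > negative score)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded prediction matches the adjudicated label."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return correct / len(labels)


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)   # 3 of 4 positive/negative pairs ranked correctly -> 0.75
acc = accuracy(labels, scores)  # 2 of 4 correct at threshold 0.5 -> 0.5
```

This example shows that the two metrics can diverge. A classifier can rank most pairs correctly, yet its accuracy at a fixed cutoff can still be modest.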
Integrating a host transcriptional biomarker with EMR text analysis using a large language model may offer a promising new approach to improving the diagnosis of LRTIs in critically ill adults.