Sun Virginia H, Heemelaar Julius C, Hadzic Ibrahim, Raghu Vineet K, Wu Chia-Yun, Zubiri Leyre, Ghamari Azin, LeBoeuf Nicole R, Abu-Shawer Osama, Kehl Kenneth L, Grover Shilpa, Singh Prabhsimranjot, Suero-Abreu Giselle A, Wu Jessica, Falade Ayo S, Grealish Kelley, Thomas Molly F, Hathaway Nora, Medoff Benjamin D, Gilman Hannah K, Villani Alexandra-Chloe, Ho Jor Sam, Mooradian Meghan J, Sise Meghan E, Zlotoff Daniel A, Blum Steven M, Dougan Michael, Sullivan Ryan J, Neilan Tomas G, Reynolds Kerry L
Harvard Medical School, Boston, MA.
Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA.
J Clin Oncol. 2024 Dec 10;42(35):4134-4144. doi: 10.1200/JCO.24.00326. Epub 2024 Sep 3.
Current approaches to accurately identify immune-related adverse events (irAEs) in large retrospective studies are limited. Large language models (LLMs) offer a potential solution to this challenge, given their high performance in natural language comprehension tasks. Therefore, we investigated the use of an LLM to identify irAEs among hospitalized patients, comparing its performance with manual adjudication and International Classification of Diseases (ICD) codes.
Hospital admissions of patients receiving immune checkpoint inhibitor (ICI) therapy at a single institution from February 5, 2011, to September 5, 2023, were individually reviewed and adjudicated for the presence of irAEs. ICD codes and an LLM with retrieval-augmented generation were applied to detect frequent irAEs (ICI-induced colitis, hepatitis, and pneumonitis) and the most fatal irAE (ICI-myocarditis) from electronic health records. The performance of ICD codes and the LLM was compared by sensitivity and specificity at α = .05, relative to the gold standard of manual adjudication. External validation was performed using a data set of hospital admissions from June 1, 2018, to May 31, 2019, from a second institution.
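As an illustration of the comparison described above, the following is a minimal sketch of how sensitivity and specificity can be computed for any detector (ICD codes or an LLM) against manually adjudicated labels. The function name and the toy labels are hypothetical and are not taken from the study; the abstract does not state which statistical test was used for the significance comparison, so none is shown here.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` relative to `gold`.

    `gold` holds the manually adjudicated labels (True = irAE present);
    `predicted` holds the detector's labels for the same admissions.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Tally the four cells of the 2x2 confusion matrix.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")

    sensitivity = tp / (tp + fn)  # share of true irAE admissions that were detected
    specificity = tn / (tn + fp)  # share of non-irAE admissions correctly excluded
    return sensitivity, specificity


# Toy data for illustration only (not study data).
toy_gold = [True, True, True, False, False, False, False, False]
toy_pred = [True, True, False, False, False, False, True, False]
toy_sens, toy_spec = sensitivity_specificity(toy_gold, toy_pred)
```

Applying the same function to the ICD-code labels and to the LLM labels for each irAE type would yield the paired figures that the results compare.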
Of the 7,555 admissions for patients on ICI therapy in the initial cohort, 2.0% were adjudicated to be due to ICI-colitis, 1.1% ICI-hepatitis, 0.7% ICI-pneumonitis, and 0.8% ICI-myocarditis. The LLM demonstrated higher sensitivity than ICD codes (94.7% v 68.7%), achieving significance for ICI-hepatitis (P < .001), myocarditis (P < .001), and pneumonitis (P = .003) while yielding similar specificities (93.7% v 92.4%). The LLM spent an average of 9.53 seconds/chart in comparison with an estimated 15 minutes for adjudication. In the validation cohort (N = 1,270), the mean LLM sensitivity and specificity were 98.1% and 95.7%, respectively.
LLMs are a useful tool for the detection of irAEs, outperforming ICD codes in sensitivity and manual adjudication in efficiency.
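To put the efficiency claim in concrete terms, the per-chart timings reported in the results can be scaled to the 7,555 admissions of the initial cohort. This is a back-of-the-envelope sketch only: it assumes charts are processed one after another and treats the 15-minute manual figure as exact, although the abstract describes it as an estimate. The variable and function names are illustrative.

```python
# Figures reported in the abstract.
N_ADMISSIONS = 7555               # admissions in the initial cohort
LLM_SECONDS_PER_CHART = 9.53      # average LLM time per chart
MANUAL_MINUTES_PER_CHART = 15     # estimated manual adjudication time per chart


def total_hours(n_charts: int, seconds_per_chart: float) -> float:
    """Total serial processing time in hours (assumes no parallelism)."""
    return n_charts * seconds_per_chart / 3600


llm_hours = total_hours(N_ADMISSIONS, LLM_SECONDS_PER_CHART)
manual_hours = total_hours(N_ADMISSIONS, MANUAL_MINUTES_PER_CHART * 60)
speedup = manual_hours / llm_hours  # equals 900 s / 9.53 s per chart

print(f"LLM:    {llm_hours:.1f} hours")
print(f"Manual: {manual_hours:.1f} hours")
print(f"Ratio:  {speedup:.1f}x")
```

Under these assumptions the LLM would need roughly 20 hours for the whole cohort, against roughly 1,889 hours of manual review, a ratio of about 94 to 1.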