Chen Bing Yu, Antaki Fares, Gonzalez Marco, Uchino Ken, Albahra Samer, Robertson Scott, Ibrikji Sidonie, Aube Eric, Russman Andrew, Hussain Muhammad Shazam
Neurological Institute, Cleveland Clinic, Cleveland, Ohio, USA.
Cole Eye Institute, Cleveland Clinic, Cleveland, Ohio, USA.
Cerebrovasc Dis Extra. 2025;15(1):130-136. doi: 10.1159/000545317. Epub 2025 Mar 17.
Timely thrombolytic therapy improves outcomes in acute ischemic stroke. Manual chart review to screen for thrombolysis contraindications may be time-consuming and prone to errors. We developed and tested a large language model (LLM)-based tool to identify thrombolysis contraindications from clinical notes using synthetic data in a proof-of-concept study.
We generated 150 synthetic clinical notes containing randomly assigned thrombolysis contraindications using LLMs. We then used Llama 3.1 405B with a custom prompt to generate a list of thrombolysis contraindications from each note. Performance was evaluated using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score.
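As a rough illustration of how the listed metrics could be computed, the sketch below compares each note's predicted contraindication list against its assigned list, item by item over a fixed checklist. The checklist entries, function names, and toy data are hypothetical and are not taken from the study; the abstract does not describe the authors' actual scoring procedure.

```python
import math

# Hypothetical checklist; the study's actual contraindication list is not given in the abstract.
CHECKLIST = ["recent_surgery", "anticoagulant_use", "prior_ich", "low_platelets"]


def confusion_counts(truth, predicted, checklist):
    """Count TP, FP, FN, TN over every (note, checklist item) pair."""
    tp = fp = fn = tn = 0
    for true_items, pred_items in zip(truth, predicted):
        for item in checklist:
            in_truth = item in true_items
            in_pred = item in pred_items
            if in_truth and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_truth:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy data (invented for illustration): three notes, each with a set of labels.
truth = [{"recent_surgery", "anticoagulant_use"}, set(), {"prior_ich"}]
predicted = [{"recent_surgery"}, {"low_platelets"}, {"prior_ich"}]

counts = confusion_counts(truth, predicted, CHECKLIST)
results = metrics(*counts)
```

Scoring every note-item pair, rather than every note, is one way to obtain a large pool of true negatives, which would be consistent with a specificity and NPV that sit well above the PPV; whether the study counted negatives this way is an assumption.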
A total of 150 synthetic notes were generated using five different models: ChatGPT-4o, Llama 3.1 405B, Llama 3.1 70B, ChatGPT-4o mini, and Gemini 1.5 Flash. On average, each note contained 241.6 words (SD 110.7; range 80-549) and included 1.5 contraindications (SD 1.1; range 0-5). Our tool achieved a sensitivity of 90.9% (95% CI: 86.3%-94.3%), specificity of 99.2% (95% CI: 98.8%-99.5%), PPV of 87.7% (95% CI: 82.7%-91.7%), NPV of 99.4% (95% CI: 99.1%-99.6%), accuracy of 98.7% (95% CI: 98.2%-99.0%), and an F1 score of 0.892. Among the false positives, 24 (86%) were due to the inclusion of irrelevant contraindications, and 4 (14%) resulted from repetitive information. No hallucinations were observed.
Our LLM-based tool may identify stroke thrombolysis contraindications from synthetic clinical notes with high sensitivity and PPV. Future studies will validate its performance using real EMR data and integrate it into acute stroke workflows to facilitate faster and safer thrombolysis decision-making.