Accuracy of Large Language Models to Identify Stroke Subtypes Within Unstructured Electronic Health Record Data.

Author information

Owens Dylan, Nguyen Danh Q, Dohopolski Michael, Rousseau Justin F, Peterson Eric D, Navar Ann Marie

Affiliations

Department of Medicine, UT Southwestern Medical Center, Dallas, TX. (D.O., D.Q.N., E.D.P., A.M.N.).

Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX. (M.D.).

Publication information

Stroke. 2025 Jul 25. doi: 10.1161/STROKEAHA.125.051993.

Abstract

BACKGROUND

While codes suffice for identifying stroke events in surveillance, accurately classifying stroke types and subtypes using electronic health records remains challenging due to limitations in structured data. This often necessitates manual review of clinical documentation. This study evaluated whether a large language model, GPT-4o, can accurately identify stroke types and subtypes from unstructured clinical notes.

METHODS

We implemented a retrieval-augmented generation framework with GPT-4o to classify stroke type (ischemic versus hemorrhagic) and ischemic stroke subtype from electronic health record data. The American Heart Association Get With The Guidelines-Stroke registry served as the gold standard. Model development used a 20% subset of Get With The Guidelines-Stroke-linked data from UT Southwestern Medical Center, with the remaining 80% reserved for testing. External validation used data from the Parkland Health and Hospital System. A total of 4123 stroke hospitalizations from January 2019 to August 2023 were included (UT Southwestern Medical Center: n=2047; Parkland Health and Hospital System: n=2076). Three prompting strategies were evaluated: zero-shot Chain-of-Thought, expert-guided, and instruction-based. GPT-4o's predictions were compared with classifications made by trained abstractors contributing to the Get With The Guidelines-Stroke registry.
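The 20%/80% development/test partition and the general shape of a retrieval-augmented, zero-shot Chain-of-Thought prompt can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The abstract does not describe how the split was drawn, how notes were chunked or retrieved, or how the prompts were worded. The function names (`split_development_test`, `chunk_note`, `retrieve_chunks`, `build_zero_shot_cot_prompt`), the keyword-count retriever, and the prompt text are all hypothetical, and no GPT-4o call is made.

```python
import random
from typing import List, Tuple

# Hypothetical sketch: the study's actual split procedure, retriever and
# prompt wording are not described in the abstract.


def split_development_test(
    hospitalization_ids: List[str], dev_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly partition IDs into a development subset and a held-out test subset."""
    ids = list(hospitalization_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_dev = round(len(ids) * dev_fraction)
    return ids[:n_dev], ids[n_dev:]


def chunk_note(note: str, chunk_size: int = 50) -> List[str]:
    """Split a clinical note into consecutive chunks of `chunk_size` words."""
    words = note.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def retrieve_chunks(
    note: str, query_terms: List[str], k: int = 3, chunk_size: int = 50
) -> List[str]:
    """Return the k chunks that mention the query terms most often.

    Keyword counting is a simple stand-in for whatever retriever
    (e.g. embedding search) a real system would use.
    """
    terms = [t.lower() for t in query_terms]

    def score(chunk: str) -> int:
        text = chunk.lower()
        return sum(text.count(t) for t in terms)

    # sorted() is stable even with reverse=True, so ties keep note order.
    return sorted(chunk_note(note, chunk_size), key=score, reverse=True)[:k]


def build_zero_shot_cot_prompt(context_chunks: List[str]) -> str:
    """Assemble a zero-shot chain-of-thought prompt from retrieved excerpts."""
    context = "\n---\n".join(context_chunks)
    return (
        "Clinical note excerpts:\n"
        f"{context}\n\n"
        "Question: Was this stroke ischemic or hemorrhagic? If ischemic, which "
        "subtype (e.g. large-artery atherosclerosis, cardioembolism, "
        "small-vessel occlusion, cryptogenic)?\n"
        "Let's think step by step, then state the final answer on the last line."
    )
```

For example, `split_development_test([f"uts-{i}" for i in range(2047)])` yields 409 development and 1638 test IDs. The seeded shuffle keeps the partition reproducible across runs.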

RESULTS

In the external validation set, 79.6% of patients had ischemic stroke and 20.4% had hemorrhagic stroke. GPT-4o classified stroke type with an accuracy of 0.98 (95% CI, 0.97-0.99), where accuracy is the overall proportion of correctly classified patients. Sensitivity was 0.98 (95% CI, 0.97-0.99), and specificity was 0.97 (95% CI, 0.96-0.98). For ischemic stroke subtypes, sensitivity ranged from 0.40 (95% CI, 0.31-0.49) for cryptogenic stroke to 0.95 (95% CI, 0.93-0.97) for small-vessel occlusion. Specificity ranged from 0.94 (95% CI, 0.92-0.96) for large-artery atherosclerosis to 0.98 (95% CI, 0.97-0.99) for cardioembolism. Zero-shot Chain-of-Thought prompting, which requires minimal human input, performed comparably to the more labor-intensive strategies. Consistency analysis showed 99% agreement across repeated queries.
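Accuracy, sensitivity, and specificity all come from a confusion matrix of predicted versus registry labels. For subtypes, one common approach is one-vs-rest, where each subtype in turn is treated as the positive class. The sketch below makes two assumptions not confirmed by the abstract: it uses the Wilson score interval for the 95% CIs, since the authors' interval method is not stated, and it lets the caller choose which class counts as positive for stroke type. It also computes repeat-query agreement as the fraction of cases where every run returned the same label, which is one plausible reading of the consistency analysis. The function names are hypothetical, and the counts in the test are invented for illustration, not study data.

```python
import math
from typing import Dict, List, Sequence, Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Accuracy, sensitivity and specificity, each with a 95% Wilson CI.

    `positive` is the positive class; every other label counts as negative,
    so for a multiclass label this gives one-vs-rest metrics.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    n = len(pairs)
    return {
        "accuracy": ((tp + tn) / n, wilson_interval(tp + tn, n)),
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


def repeat_agreement(runs: List[Sequence[str]]) -> float:
    """Fraction of cases for which every repeated query returned the same label."""
    n_cases = len(runs[0])
    agreeing = sum(len({run[i] for run in runs}) == 1 for i in range(n_cases))
    return agreeing / n_cases
```

With 8 ischemic and 2 hemorrhagic cases, one false positive and one false negative, and `positive="hemorrhagic"`, this gives accuracy 0.8, sensitivity 0.5, and specificity 0.875. With such small counts the Wilson intervals are wide, which is why the study's CIs of about ±0.01 reflect its much larger sample.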

CONCLUSIONS

GPT-4o demonstrated strong accuracy in classifying stroke types but faced challenges with ischemic subtypes.

