Extracting epilepsy-related information from unstructured clinic letters using large language models.

Authors

Fang Shichao, Holgate Ben, Shek Anthony, Winston Joel S, McWilliam Matthew, Viana Pedro F, Teo James T, Richardson Mark P

Affiliations

Department of Basic & Clinical Neuroscience, King's College London, London, UK.

King's College Hospital NHS Foundation Trust, London, UK.

Publication information

Epilepsia. 2025 Jul 10. doi: 10.1111/epi.18475.

PMID: 40637590

Abstract

OBJECTIVE

The emergence of large language models (LLMs) and the increasing prevalence of electronic health records (EHRs) present significant opportunities for advancing health care research and practice. However, research that compares and applies LLMs to extract key epilepsy-related information from unstructured medical free text is under-explored. This study fills this gap by comparing and applying different open-source LLMs and methods to extract epilepsy information from unstructured clinic letters, thereby optimizing EHRs as a resource for the benefit of epilepsy research. We also highlight some limitations of LLMs.

METHODS

Employing a dataset of 280 annotated clinic letters from King's College Hospital, we explored the efficacy of open-source LLMs (Llama and Mistral series) for extracting key epilepsy-related information, including epilepsy type, seizure type, current anti-seizure medications (ASMs), and associated symptoms. The study used various extraction methods, including direct extraction, summarized extraction, and contextualized extraction, complemented by role-prompting and few-shot prompting techniques. Performance was evaluated against a gold standard dataset, and was also compared to advanced fine-tuned models and human annotations.
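The combination of direct extraction with role-prompting and few-shot prompting can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not give the prompt wording, so the role text, the two example letters, the `current_asms` field name, and the JSON answer format are all hypothetical, and the call to an actual Llama or Mistral model is left out.

```python
import json

# Hypothetical role prompt; the study's actual wording is not given in the abstract.
ROLE_PROMPT = "You are an experienced epilepsy clinician reviewing clinic letters."

# Invented few-shot examples for illustration only (not taken from the study's dataset).
FEW_SHOT_EXAMPLES = [
    {
        "letter": "She remains on lamotrigine 100 mg twice daily and has had no further seizures.",
        "answer": {"current_asms": ["lamotrigine"]},
    },
    {
        "letter": "He has stopped carbamazepine and now takes levetiracetam 500 mg twice daily.",
        "answer": {"current_asms": ["levetiracetam"]},
    },
]


def build_direct_extraction_prompt(letter: str, field: str = "current_asms") -> str:
    """Assemble the role prompt, few-shot examples, and target letter into one prompt."""
    parts = [
        ROLE_PROMPT,
        f"Extract the field '{field}' from the letter and answer in JSON.",
    ]
    for example in FEW_SHOT_EXAMPLES:
        parts.append(f"Letter: {example['letter']}")
        parts.append(f"Answer: {json.dumps(example['answer'])}")
    parts.append(f"Letter: {letter}")
    parts.append("Answer:")
    return "\n\n".join(parts)


def parse_answer(raw_output: str, field: str = "current_asms") -> list[str]:
    """Parse the model's JSON answer; return an empty list if it is malformed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return []
    values = data.get(field, []) if isinstance(data, dict) else []
    if not isinstance(values, list):
        return []
    # Normalize case and whitespace so answers can be compared to gold labels.
    return [str(v).strip().lower() for v in values]
```

In this sketch, the string returned by `build_direct_extraction_prompt` would be sent to the chosen model, and the model's raw text would be passed to `parse_answer`. Summarized or contextualized extraction would add a preprocessing step before the prompt is built; those steps are not shown because the abstract does not describe them in detail.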

RESULTS

Llama 2 13b (a 13-billion-parameter LLM developed by Meta) demonstrated superior extraction capabilities across tasks, consistently outperforming the other LLMs (F1 = .80 in epilepsy-type extraction, F1 = .76 in seizure-type extraction, and F1 = .90 in current ASMs extraction). Here, the F1 score is the harmonic mean of precision and recall, a balanced metric that rewards correctly identifying relevant information without excessive false positives or missed items. Among the extraction methods, direct extraction showed consistently high performance. Comparative analysis showed that LLMs outperformed current approaches such as MedCAT (Medical Concept Annotation Tool) in extracting epilepsy-related information (F1 higher by .2).
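As a rough illustration of how an F1 score of this kind can be computed, the sketch below compares the set of labels extracted from each letter with the gold-standard set and micro-averages over letters. The micro-averaging, the exact-match comparison, and the function name `micro_f1` are assumptions; the abstract does not state how the study aggregated or matched labels.

```python
def micro_f1(predicted: list[set[str]], gold: list[set[str]]) -> float:
    """Micro-averaged F1 over per-letter sets of extracted labels (assumed scheme)."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)  # labels extracted and present in the gold standard
        fp += len(pred - ref)  # labels extracted but absent from the gold standard
        fn += len(ref - pred)  # gold-standard labels the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: two letters, one spurious medication in the second prediction.
predicted = [{"lamotrigine"}, {"levetiracetam", "carbamazepine"}]
gold = [{"lamotrigine"}, {"levetiracetam"}]
score = micro_f1(predicted, gold)
```

With this toy data, precision is 2/3 and recall is 1, so the score is 0.8. The numbers are made up to show the arithmetic and are unrelated to the study's results.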

SIGNIFICANCE

The results affirm the potential of LLMs in medical information extraction relating to epilepsy, offering insights into leveraging these models for detailed and accurate data extraction from unstructured texts. The study underscores the importance of method selection in optimizing extraction performance and suggests a promising avenue for enhancing medical research and patient care through advanced natural language processing technologies.
