Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review.

Authors

Chen David, Alnassar Saif Addeen, Avison Kate Elizabeth, Huang Ryan S, Raman Srinivas

Affiliations

Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada.

Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada.

Publication information

JMIR Cancer. 2025 Mar 28;11:e65984. doi: 10.2196/65984.

DOI: 10.2196/65984
PMID: 40153782
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11970800/
Abstract

BACKGROUND

Natural language processing systems for data extraction from unstructured clinical text require expert-driven input for labeled annotations and model training. The natural language processing competency of large language models (LLMs) can enable automated extraction of important patient characteristics from electronic health records, which is useful for accelerating cancer clinical research and informing oncology care.

OBJECTIVE

This scoping review aims to map the current landscape, including definitions, frameworks, and future directions of LLMs applied to data extraction from clinical text in oncology.

METHODS

On June 2, 2024, we queried Ovid MEDLINE for primary, peer-reviewed research studies published since 2000, using oncology- and LLM-related keywords. This scoping review included studies that evaluated the performance of an LLM applied to data extraction from clinical text in oncology contexts. Study attributes and main outcomes were extracted to outline key trends of research in LLM-based data extraction.

RESULTS

The literature search yielded 24 studies for inclusion. Most studies assessed original and fine-tuned variants of the BERT LLM (n=18, 75%), followed by the ChatGPT conversational LLM (n=6, 25%). LLMs for data extraction were most commonly applied in pan-cancer clinical settings (n=11, 46%), followed by breast (n=4, 17%) and lung (n=4, 17%) cancer contexts, and were evaluated using multi-institution datasets (n=18, 75%). Comparing studies published in 2022-2024 with those published in 2019-2021, both the total number of studies (18 vs 6) and the proportion of studies using prompt engineering (5/18, 28% vs 0/6, 0%) increased, while the proportion using fine-tuning decreased (8/18, 44.4% vs 6/6, 100%). Reported advantages of LLMs included positive data extraction performance and reduced manual workload.
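
The percentages reported above follow directly from the stated counts. The short Python sketch below recomputes them as a sanity check. The counts and denominators are taken from the abstract; the variable names and the `pct` helper are illustrative assumptions, not part of the review.

```python
# Counts as reported in the RESULTS paragraph (24 included studies in total).
TOTAL_STUDIES = 24

overall_counts = {
    "BERT variants": 18,
    "ChatGPT": 6,
    "pan-cancer settings": 11,
    "breast cancer": 4,
    "lung cancer": 4,
    "multi-institution datasets": 18,
}

# (count, denominator) pairs for the two publication periods.
period_counts = {
    "prompt engineering, 2022-2024": (5, 18),
    "prompt engineering, 2019-2021": (0, 6),
    "fine-tuning, 2022-2024": (8, 18),
    "fine-tuning, 2019-2021": (6, 6),
}


def pct(count, denominator, digits=0):
    """Return count/denominator as a percentage rounded to `digits` places."""
    return round(100 * count / denominator, digits)


if __name__ == "__main__":
    for label, n in overall_counts.items():
        print(f"{label}: n={n}, {pct(n, TOTAL_STUDIES):.0f}%")
    for label, (n, d) in period_counts.items():
        print(f"{label}: {n}/{d}, {pct(n, d, 1):.1f}%")
```

The two period denominators (18 and 6) sum to the 24 included studies, so the period comparison covers the full set of included studies.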

CONCLUSIONS

LLMs applied to data extraction in oncology can serve as useful automated tools to reduce the administrative burden of reviewing patient health records and increase time for patient-facing care. Recent advances in prompt engineering, fine-tuning methods, and multimodal data extraction present promising directions for future research. Further studies are needed to evaluate the performance of LLM-enabled data extraction in clinical domains beyond the training dataset and to assess the scope and integration of LLMs into real-world clinical environments.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11970800/af3496dd4b0b/cancer-v11-e65984-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11970800/b8c05b9aa1e2/cancer-v11-e65984-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11970800/5d6b1ffd081e/cancer-v11-e65984-g002.jpg
