Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review.

Author Information

Chen David, Alnassar Saif Addeen, Avison Kate Elizabeth, Huang Ryan S, Raman Srinivas

Affiliations

Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada.

Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada.

Publication Information

JMIR Cancer. 2025 Mar 28;11:e65984. doi: 10.2196/65984.

DOI:10.2196/65984

PMID:40153782

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11970800/

Abstract

BACKGROUND

Natural language processing (NLP) systems for extracting data from unstructured clinical text require expert-driven input for labeled annotations and model training. The NLP competency of large language models (LLMs) can enable automated extraction of important patient characteristics from electronic health records, which can help accelerate cancer clinical research and inform oncology care.

OBJECTIVE

This scoping review aims to map the current landscape of LLMs applied to data extraction from clinical text in oncology, including definitions, frameworks, and future directions.

METHODS

On June 2, 2024, we queried Ovid MEDLINE for primary, peer-reviewed research studies published since 2000, using oncology- and LLM-related keywords. This scoping review included studies that evaluated the performance of an LLM applied to data extraction from clinical text in oncology contexts. Study attributes and main outcomes were extracted to outline key trends in research on LLM-based data extraction.

RESULTS

The literature search yielded 24 studies for inclusion. The majority of studies assessed original and fine-tuned variants of the BERT LLM (n=18, 75%), followed by the ChatGPT conversational LLM (n=6, 25%). LLMs for data extraction were most commonly applied in pan-cancer clinical settings (n=11, 46%), followed by breast (n=4, 17%) and lung (n=4, 17%) cancer contexts, and were evaluated using multi-institution datasets (n=18, 75%). Comparing studies published in 2022-2024 with those published in 2019-2021, both the total number of studies (18 vs 6) and the proportion of studies using prompt engineering increased (5/18, 28% vs 0/6, 0%), while the proportion using fine-tuning decreased (8/18, 44.4% vs 6/6, 100%). Advantages of LLMs included favorable data extraction performance and reduced manual workload.
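As a quick consistency check, the percentages in the Results can be recomputed from the raw counts. The following is a minimal Python sketch; the helper name `pct` and the rounding convention (whole percentages, one decimal place for 44.4%) are assumptions chosen to match the reported figures, not part of the review's methods.

```python
# Counts as reported in the Results paragraph (24 included studies in total).
TOTAL = 24


def pct(k: int, n: int, digits: int = 0) -> float:
    """Return k/n as a percentage rounded to `digits` decimal places (hypothetical helper)."""
    return round(100 * k / n, digits)


# Model family, clinical setting, and dataset counts, each out of all 24 studies.
overall = {
    "BERT (original or fine-tuned)": 18,
    "ChatGPT": 6,
    "pan-cancer": 11,
    "breast": 4,
    "lung": 4,
    "multi-institution datasets": 18,
}
for label, k in overall.items():
    print(f"{label}: {k}/{TOTAL} = {pct(k, TOTAL):.0f}%")

# Period comparison: 2022-2024 (n=18) vs 2019-2021 (n=6); the two periods sum to 24.
recent, earlier = 18, 6
assert recent + earlier == TOTAL
print(f"prompt engineering: {pct(5, recent):.0f}% vs {pct(0, earlier):.0f}%")
print(f"fine-tuning: {pct(8, recent, 1):.1f}% vs {pct(6, earlier):.0f}%")
```

Running it reproduces each reported percentage, for example 11/24 rounds to 46% and 8/18 to 44.4%.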

CONCLUSIONS

LLMs applied to data extraction in oncology can serve as useful automated tools to reduce the administrative burden of reviewing patient health records and increase the time available for patient-facing care. Recent advances in prompt engineering, fine-tuning methods, and multimodal data extraction present promising directions for future research. Further studies are needed to evaluate the performance of LLM-enabled data extraction in clinical domains beyond the training dataset and to assess the scope and integration of LLMs in real-world clinical environments.
