Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit.

Affiliations

Department of Radiology, Mayo Clinic, Phoenix, AZ.

Departments of Medicine and of Epidemiology & Population Health, Stanford University School of Medicine, Palo Alto, CA.

Publication information

JCO Clin Cancer Inform. 2024 Aug;8:e2300258. doi: 10.1200/CCI.23.00258.


DOI: 10.1200/CCI.23.00258
PMID: 39167746
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11867221/
Abstract

PURPOSE: Patient-centered outcomes (PCOs) are pivotal in cancer treatment because they directly reflect patients' quality of life. Although multiple studies suggest that factors affecting breast cancer-related morbidity and survival are influenced by treatment side effects and adherence to long-term treatment, such data are generally available only on a small scale or from a single center. The primary challenge in collecting these data is that the outcomes are captured as free text in clinical narratives written by clinicians.

MATERIALS AND METHODS: Given the complexity of PCO documentation in these narratives, computerized methods are needed to unlock the wealth of information buried in the unstructured notes that often document PCOs. Inspired by the success of large language models (LLMs), we examined the adaptability of three LLMs (GPT-2, BioGPT, and PMC-LLaMA) on PCO tasks across three institutions (Mayo Clinic, Emory University Hospital, and Stanford University). We developed an open-source framework for fine-tuning LLMs that can directly extract five different categories of PCO from clinic notes.

RESULTS: We found that, without fine-tuning (zero-shot), these LLMs struggled with the challenging PCO extraction tasks, performing almost at random even when given some task-specific examples (few-shot learning). Our fine-tuned, task-specific models performed notably better than their non-fine-tuned counterparts. Moreover, the fine-tuned GPT-2 model performed significantly better than the two larger LLMs.

CONCLUSION: Our findings indicate that although LLMs are effective general-purpose models for tasks across various domains, they require fine-tuning when applied to the clinical domain. Our proposed approach has the potential to lead to more efficient, adaptable models for PCO information extraction, reducing reliance on extensive computational resources while still delivering superior performance on specific tasks.
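The abstract contrasts three ways of applying a generative LLM to PCO extraction: zero-shot prompting, few-shot prompting with task-specific examples, and fine-tuning. The following is a minimal sketch, under stated assumptions, of how one extraction question might be rendered as text in each setting. The category names, the yes/no question template, and the `Note:`/`Question:`/`Answer:` layout are all hypothetical; the abstract does not describe the toolkit's actual prompt or training format. `<|endoftext|>` is GPT-2's end-of-text token and is used here only as an assumed terminator for training examples.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical placeholders: the abstract reports five PCO categories
# but does not name them.
PCO_CATEGORIES = ("category_1", "category_2", "category_3",
                  "category_4", "category_5")

# GPT-2's end-of-text token, assumed here as the training-example terminator.
GPT2_EOS = "<|endoftext|>"


@dataclass(frozen=True)
class Example:
    note: str      # free-text snippet from a clinical note
    category: str  # one of PCO_CATEGORIES
    label: str     # hypothetical gold answer, e.g. "yes" or "no"


def _question(category: str) -> str:
    """Hypothetical yes/no question template for one PCO category."""
    if category not in PCO_CATEGORIES:
        raise ValueError(f"unknown PCO category: {category!r}")
    return f"Does the note document {category}? Answer yes or no."


def _block(note: str, category: str, answer: str = "") -> str:
    """Render one note/question pair; append the answer only if given."""
    text = f"Note: {note}\nQuestion: {_question(category)}\nAnswer:"
    return f"{text} {answer}" if answer else text


def zero_shot_prompt(note: str, category: str) -> str:
    """Zero-shot: the model sees only the target note and question."""
    return _block(note, category)


def few_shot_prompt(note: str, category: str, demos: Sequence[Example]) -> str:
    """Few-shot: solved demonstrations precede the unsolved target block."""
    parts = [_block(d.note, d.category, d.label) for d in demos]
    parts.append(_block(note, category))
    return "\n\n".join(parts)


def fine_tuning_text(ex: Example, eos: str = GPT2_EOS) -> str:
    """Fine-tuning: one complete, answered example per training sequence."""
    return _block(ex.note, ex.category, ex.label) + eos


if __name__ == "__main__":
    demo = Example("Synthetic note text A.", "category_1", "yes")
    print(few_shot_prompt("Synthetic note text B.", "category_1", [demo]))
    print(fine_tuning_text(demo))
```

The three functions differ only in what the model is given. In the zero- and few-shot settings the model must infer the task from the prompt at inference time, whereas in fine-tuning, answered strings like these become training data and the model's weights are updated. The abstract reports near-random performance for the first two settings and notably better performance after fine-tuning.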

Similar articles

[1]
Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit.

JCO Clin Cancer Inform. 2024-8

[2]
A dataset and benchmark for hospital course summarization with adapted large language models.

J Am Med Inform Assoc. 2025-3-1

[3]
Open-Source Hybrid Large Language Model Integrated System for Extraction of Breast Cancer Treatment Pathway From Free-Text Clinical Notes.

JCO Clin Cancer Inform. 2025-6

[4]
Short-Term Memory Impairment

2025-1

[5]
Toward Cross-Hospital Deployment of Natural Language Processing Systems: Model Development and Validation of Fine-Tuned Large Language Models for Disease Name Recognition in Japanese.

JMIR Med Inform. 2025-7-8

[6]
Sexual Harassment and Prevention Training

2025-1

[7]
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.

J Med Internet Res. 2025-7-11

[8]
Advancing entity recognition in biomedicine via instruction tuning of large language models.

Bioinformatics. 2024-3-29

[9]
BioInstruct: instruction tuning of large language models for biomedical natural language processing.

J Am Med Inform Assoc. 2024-9-1

[10]
Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation.

JMIR Med Inform. 2025-7-24

Cited by

[1]
Open-Source Hybrid Large Language Model Integrated System for Extraction of Breast Cancer Treatment Pathway From Free-Text Clinical Notes.

JCO Clin Cancer Inform. 2025-6

[2]
The Role of Artificial Intelligence (ChatGPT-4o) in Supporting Tumor Board Decisions.

J Clin Med. 2025-5-18

[3]
Assessing the accuracy of the GPT-4 model in multidisciplinary tumor board decision prediction.

Clin Transl Oncol. 2025-3-25

[4]
Large language models in cancer: potentials, risks, and safeguards.

BJR Artif Intell. 2024-12-20
