Open-Source Hybrid Large Language Model Integrated System for Extraction of Breast Cancer Treatment Pathway From Free-Text Clinical Notes.

Authors

Tariq Amara, Sikha Madhu, Kurian Allison W, Ward Kevin, Keegan Theresa H M, Rubin Daniel L, Banerjee Imon

Affiliations

Department of Radiology, Mayo Clinic, Phoenix, AZ.

Departments of Medicine and of Epidemiology & Population Health, Stanford University School of Medicine, Palo Alto, CA.

Publication

JCO Clin Cancer Inform. 2025 Jun;9:e2500002. doi: 10.1200/CCI-25-00002. Epub 2025 Jun 27.

DOI:10.1200/CCI-25-00002
PMID:40577660
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12208650/
Abstract

PURPOSE

Automated curation of breast cancer treatment data with minimal human involvement could accelerate the collection of statewide and nationwide evidence for patient management and for assessing the effectiveness of treatment pathways. The primary challenges are the complexity and inconsistency of structured clinical data streams and the accurate extraction of this information from free-text clinical narratives.

MATERIALS AND METHODS

We proposed a hybrid two-phase information extraction framework that combined a Unified Medical Language System parser (phase-1) with a fine-tuned large language model (LLM; phase-2) to extract longitudinal treatment timelines from time-stamped clinical notes. Our framework was developed through end-to-end joint learning as a question-answering model, where the model was trained to simultaneously answer five questions, each corresponding to a specific treatment.
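
The two-phase data flow can be illustrated with a small Python sketch. It is not the authors' code and makes several loudly stated assumptions: the tiny lexicon is a hypothetical stand-in for the Unified Medical Language System parser (a real phase 1 would map text spans to UMLS concepts), phase 1 is assumed to act as a filter that narrows each note to candidate sentences, `phase2_answer` is a placeholder rule standing where the fine-tuned LLM would sit, and the five treatment labels are illustrative only, since the abstract does not name them. What the sketch shows is one time-stamped note going in and one joint answer to all five treatment questions coming out.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical labels for the five per-treatment questions (illustrative only;
# the abstract says there are five but does not name them).
TREATMENTS = ["chemotherapy", "endocrine", "radiation", "surgery", "targeted"]

# Hypothetical stand-in for a UMLS parser: term -> treatment label.
# A real phase 1 would map text to UMLS concepts rather than match strings.
LEXICON = {
    "paclitaxel": "chemotherapy",
    "tamoxifen": "endocrine",
    "letrozole": "endocrine",
    "radiation": "radiation",
    "lumpectomy": "surgery",
    "trastuzumab": "targeted",
}


@dataclass
class Note:
    note_date: date  # time stamp of the clinical note
    text: str


def phase1_filter(note: Note) -> list[str]:
    """Phase 1 (assumed filter role): keep sentences mentioning a lexicon term."""
    sentences = [s.strip() for s in note.text.split(".") if s.strip()]
    return [s for s in sentences if any(term in s.lower() for term in LEXICON)]


def phase2_answer(sentences: list[str]) -> dict[str, bool]:
    """Placeholder for the fine-tuned LLM: answer all five questions jointly.

    This stub says 'yes' for a treatment if any candidate sentence contains
    a term mapped to it; it ignores negation and context entirely.
    """
    hits = {LEXICON[term] for s in sentences for term in LEXICON if term in s.lower()}
    return {tx: tx in hits for tx in TREATMENTS}


def extract(note: Note) -> dict[str, bool]:
    """Run both phases on one note; skip phase 2 if phase 1 finds nothing."""
    candidates = phase1_filter(note)
    if not candidates:
        return {tx: False for tx in TREATMENTS}
    return phase2_answer(candidates)


if __name__ == "__main__":
    note = Note(date(2019, 3, 4), "Started tamoxifen today. Follow-up in 3 months.")
    print(extract(note))  # only 'endocrine' is True
```

A string lexicon cannot tell "started tamoxifen" from "declined tamoxifen"; resolving that kind of context is presumably where a fine-tuned model in phase 2 improves on a purely rule-based pass.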

RESULTS

We fine-tuned and internally validated the model on 26,692 patients with breast cancer (diagnosed between 2013 and 2020) treated at Mayo Clinic and externally validated it on 162 randomly selected patients from Stanford Healthcare. The zero-shot (out-of-the-box) LLM had high specificity but low sensitivity, indicating that although such models are useful for generic language understanding, they fall short on targeted clinical tasks. The proposed model achieved an average AUROC of 0.942 on the internal data and 0.924 on the external data, only a marginal drop in performance on external validation. It also achieved a better trade-off between sensitivity (average: 79.2%) and specificity (average: 76.2%) than the rule-based approach (average sensitivity: 70.5%, average specificity: 68.1%) and structured codes (average sensitivity: 64.1%, average specificity: 83.5%).
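
The reported metrics follow standard definitions, sketched below in Python for readers checking the figures. The toy labels and scores are invented for illustration and are not study data. The abstract reports "average" values without stating how they were averaged; `macro_average` assumes an unweighted mean over the five per-treatment questions, which is an assumption. AUROC is computed in its rank-based (Mann-Whitney) form, which equals the area under the empirical ROC curve when ties count as one half.

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: list[int], scores: list[float]) -> float:
    """Rank-based AUROC: share of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(per_question: list[float]) -> float:
    """Assumed averaging: unweighted mean over questions (one per treatment)."""
    return sum(per_question) / len(per_question)


if __name__ == "__main__":
    # Toy data for a single question, not from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(sens, spec, auroc(y_true, scores))  # 2/3, 2/3, 8/9
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; at the scale of this study a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.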

CONCLUSION

The proposed framework can extract temporal information about cancer treatments from various time-stamped clinical notes, regardless of the treatment setting (inpatient or outpatient) or time frame. To support the cancer research community in such data curation and longitudinal analysis, we have packaged the code as a Docker image that requires minimal system reconfiguration and shared it under an open-source academic license.
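
Turning per-note answers into a longitudinal timeline can be sketched in a few lines of Python. The representation here is an assumption for illustration, not the paper's output format: for each treatment, the timeline records the first and last note dates on which that treatment question was answered positively. The treatment labels and dates are hypothetical. Because only the note date and the answer are used, notes from inpatient and outpatient encounters are handled identically.

```python
from collections import defaultdict
from datetime import date


def build_timeline(
    predictions: list[tuple[date, str, bool]],
) -> dict[str, tuple[date, date]]:
    """Collapse (note_date, treatment, answered_yes) triples into a
    (first, last) positive-note date per treatment (assumed representation)."""
    dates: dict[str, list[date]] = defaultdict(list)
    for note_date, treatment, answered_yes in predictions:
        if answered_yes:
            dates[treatment].append(note_date)
    # Sorted by treatment label for stable output.
    return {tx: (min(ds), max(ds)) for tx, ds in sorted(dates.items())}


if __name__ == "__main__":
    preds = [
        (date(2019, 1, 10), "surgery", True),
        (date(2019, 3, 4), "endocrine", True),
        (date(2020, 6, 1), "endocrine", True),
        (date(2019, 3, 4), "surgery", False),
    ]
    for tx, (start, end) in build_timeline(preds).items():
        print(tx, start.isoformat(), end.isoformat())
    # endocrine 2019-03-04 2020-06-01
    # surgery 2019-01-10 2019-01-10
```

First and last positive mention is only a proxy for start and end of therapy; a treatment mentioned once, or discussed long after it ended, would need extra handling in practice.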
