Evolution of publicly available large language models for complex decision-making in breast cancer care.

Affiliations

Institute for Digital Medicine, Philipps-University Marburg, Marburg, Germany.

Department of Gynecology and Obstetrics, Philipps-University Marburg, Marburg, Germany.

Publication information

Arch Gynecol Obstet. 2024 Jul;310(1):537-550. doi: 10.1007/s00404-024-07565-4. Epub 2024 May 29.


DOI:10.1007/s00404-024-07565-4
PMID:38806945
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11169005/
Abstract

PURPOSE: This study investigated the concordance of five different publicly available Large Language Models (LLM) with the recommendations of a multidisciplinary tumor board regarding treatment recommendations for complex breast cancer patient profiles.

METHODS: Five LLM, including three versions of ChatGPT (version 4 and 3.5, with data access until September 2021 and January 2022), Llama2, and Bard were prompted to produce treatment recommendations for 20 complex breast cancer patient profiles. LLM recommendations were compared to the recommendations of a multidisciplinary tumor board (gold standard), including surgical, endocrine and systemic treatment, radiotherapy, and genetic testing therapy options.

RESULTS: GPT4 demonstrated the highest concordance (70.6%) for invasive breast cancer patient profiles, followed by GPT3.5 September 2021 (58.8%), GPT3.5 January 2022 (41.2%), Llama2 (35.3%) and Bard (23.5%). Including precancerous lesions of ductal carcinoma in situ, the identical ranking was reached with lower overall concordance for each LLM (GPT4 60.0%, GPT3.5 September 2021 50.0%, GPT3.5 January 2022 35.0%, Llama2 30.0%, Bard 20.0%). GPT4 achieved full concordance (100%) for radiotherapy. Lowest alignment was reached in recommending genetic testing, demonstrating a varying concordance (55.0% for GPT3.5 January 2022, Llama2 and Bard, up to 85.0% for GPT4).

CONCLUSION: This early feasibility study is the first to compare different LLM in breast cancer care with regard to changes in accuracy over time, i.e., with access to more data or through technological upgrades. Methodological advancement, i.e., the optimization of prompting techniques, and technological development, i.e., enabling data input control and secure data processing, are necessary in the preparation of large-scale and multicenter studies to provide evidence on their safe and reliable clinical application. At present, safe and evidenced use of LLM in clinical breast cancer care is not yet feasible.
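The concordance figures in the results can be read as the share of patient profiles for which an LLM's recommendation matched the tumor board's. The Python sketch below illustrates that arithmetic only. The abstract gives the 20 profiles in total but does not state the match counts or the number of invasive profiles, so `ASSUMED_MATCHES`, `ASSUMED_INVASIVE_PROFILES`, and the function name `concordance_pct` are hypothetical, chosen because they happen to reproduce the reported percentages.

```python
def concordance_pct(matches: int, total: int) -> float:
    """Percentage of profiles where the LLM agreed with the tumor board."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= matches <= total:
        raise ValueError("matches must lie between 0 and total")
    return round(100 * matches / total, 1)


# ASSUMPTION: these counts are not given in the abstract. They are
# illustrative values that are consistent with the reported percentages.
ASSUMED_MATCHES = {
    "GPT4": 12,
    "GPT3.5 September 2021": 10,
    "GPT3.5 January 2022": 7,
    "Llama2": 6,
    "Bard": 4,
}
ASSUMED_INVASIVE_PROFILES = 17  # hypothetical denominator (invasive only)
ALL_PROFILES = 20               # stated in the abstract

for model, matches in ASSUMED_MATCHES.items():
    invasive = concordance_pct(matches, ASSUMED_INVASIVE_PROFILES)
    overall = concordance_pct(matches, ALL_PROFILES)
    print(f"{model}: invasive {invasive}%, all profiles {overall}%")
```

Under these assumed counts, the same number of matches spread over a larger denominator would explain why every model's concordance drops while the ranking stays identical once the ductal carcinoma in situ profiles are included.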

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2626/11169005/b5c17f1f794d/404_2024_7565_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2626/11169005/e2bb28b767a0/404_2024_7565_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2626/11169005/dbe97576a227/404_2024_7565_Fig3_HTML.jpg
