Charité Comprehensive Cancer Center, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
Core Unit Bioinformatics, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany.
JAMA Netw Open. 2023 Nov 1;6(11):e2343689. doi: 10.1001/jamanetworkopen.2023.43689.
IMPORTANCE: Clinical interpretation of complex biomarkers for precision oncology currently requires manual investigation of previous studies and databases. Conversational large language models (LLMs) might be beneficial as automated tools for assisting clinical decision-making.
OBJECTIVE: To assess the performance of 4 recent LLMs as support tools for precision oncology and to define their role.
DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study examined 10 fictional cases of patients with advanced cancer harboring genetic alterations. In 2023, each case was submitted to 4 different LLMs (ChatGPT, Galactica, Perplexity, and BioMedLM) and to 1 expert physician to identify personalized treatment options. The treatment options were masked and presented to a molecular tumor board (MTB), whose members rated the likelihood of each option coming from an LLM on a scale from 0 to 10 (0, extremely unlikely; 10, extremely likely) and decided whether the option was clinically useful.
MAIN OUTCOMES AND MEASURES: Number of treatment options; precision, recall, and F1 score of LLMs compared with the human expert; recognizability of recommendations as AI generated; and usefulness of recommendations.
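For orientation (our notation, not the article's): with L the set of LLM-proposed treatment options and E the expert's set serving as the criterion standard, the overlap metrics reduce to

\mathrm{precision} = \frac{|L \cap E|}{|L|}, \qquad \mathrm{recall} = \frac{|L \cap E|}{|E|}, \qquad F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.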
RESULTS: For 10 fictional patients with cancer (4 with lung cancer, 6 with other cancer types; median [IQR] 3.5 [3.0-4.8] molecular alterations per patient), the human expert identified a median (IQR) of 4.0 (4.0-4.0) treatment options, compared with 3.0 (3.0-5.0), 7.5 (4.3-9.8), 11.5 (7.8-13.0), and 13.0 (11.3-21.5) for the 4 LLMs, respectively. With the expert as the criterion standard, LLM-proposed treatment options reached F1 scores of 0.04, 0.17, 0.14, and 0.19 across all patients combined. Combining treatment options from the different LLMs yielded a precision of 0.29 and a recall of 0.29, for an F1 score of 0.29. LLM-generated treatment options were recognized as AI generated with a median (IQR) of 7.5 (5.3-9.0) points, in contrast to 2.0 (1.0-3.0) points for manually annotated cases; a key reason for identifying treatment options as AI generated was insufficient accompanying evidence. For each patient, at least 1 LLM generated a treatment option that MTB members considered helpful. Two unique useful treatment options (including 1 unique treatment strategy) were identified only by LLMs.
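A minimal sketch (not the authors' code) of this set-overlap evaluation; the model names and drug options below are hypothetical placeholders. Note that whenever precision equals recall, F1 takes the same value, which is why the pooled combination reports 0.29 for all three metrics.

# Set-overlap precision, recall, and F1 with the expert's treatment
# options as the criterion standard (illustrative data, not study data).

def prf1(proposed: set, reference: set) -> tuple:
    """Return (precision, recall, F1) for proposed options vs. reference."""
    true_pos = len(proposed & reference)
    precision = true_pos / len(proposed) if proposed else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

expert = {"osimertinib", "amivantamab", "carboplatin-pemetrexed", "erlotinib"}
llm_options = {
    "llm_a": {"osimertinib", "gefitinib", "afatinib"},
    "llm_b": {"amivantamab", "erlotinib", "bevacizumab", "nivolumab"},
}

for name, options in llm_options.items():
    print(name, prf1(options, expert))

# Pooling options from several models raises recall whenever one model
# covers an expert option the others missed; precision then depends on
# how many non-expert options the pool accumulates.
pooled = set().union(*llm_options.values())
print("pooled", prf1(pooled, expert))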
CONCLUSIONS AND RELEVANCE: In this diagnostic study, the treatment options proposed by LLMs for precision oncology did not reach the quality and credibility of those from human experts; however, the LLMs generated helpful ideas that might have complemented established procedures. Given ongoing technological progress, LLMs could play an increasingly important role in assisting with the screening and selection of relevant biomedical literature to support evidence-based, personalized treatment decisions.