Medical accuracy of artificial intelligence chatbots in oncology: a scoping review.

Author information

Chen David, Avison Kate, Alnassar Saif, Huang Ryan S, Raman Srinivas

Affiliations

Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada.

Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 3K3, Canada.

Publication information

Oncologist. 2025 Apr 4;30(4). doi: 10.1093/oncolo/oyaf038.

PMID: 40285677

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12032582/

Abstract

BACKGROUND

Recent advances in large language models (LLMs) have enabled human-like natural language competency. In oncology, LLMs have been proposed both as an information resource and as clinical decision-support tools that interpret large volumes of data to improve clinical outcomes.

OBJECTIVE

This review aims to describe the current state of medical accuracy of oncology-related LLM applications and to identify research trends that point to areas for further investigation.

METHODS

A scoping literature search was conducted in Ovid MEDLINE for peer-reviewed studies published since 2000. We included primary research studies that evaluated the medical accuracy of a large language model applied in oncology settings. Study characteristics and primary outcomes of the included studies were extracted to describe the landscape of oncology-related LLMs.

RESULTS

Sixty studies met the inclusion and exclusion criteria. The largest share of studies (48%) evaluated LLMs in oncology as a health information resource in question-answer style examinations, followed by diagnosis (20%) and management (17%). The number of studies evaluating the utility of fine-tuning and prompt engineering for LLMs increased from 2022 to 2024. Reported advantages of LLMs included their use as an accurate information resource, reduced clinician workload, and improved accessibility and readability of clinical information; reported disadvantages included poor reliability, hallucinations, and the need for clinician oversight.
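The abstract reports only rounded percentages of the 60 included studies, not raw counts. As a hedged illustration that is not part of the review's methods, the sketch below back-calculates which whole-number study counts are consistent with each reported percentage, assuming simple round-to-nearest rounding. The category labels are paraphrased and the helper name `implied_counts` is hypothetical.

```python
# Hypothetical back-calculation: the abstract reports rounded percentages of
# the 60 included studies, not raw counts. Labels below are paraphrased.
TOTAL_STUDIES = 60
REPORTED_PERCENT = {
    "health information resource (Q&A)": 48,
    "diagnosis": 20,
    "management": 17,
}


def implied_counts(total: int, reported: dict[str, int]) -> dict[str, list[int]]:
    """For each category, list every integer count n (0..total) whose share
    100 * n / total rounds to the reported whole-number percentage.
    Assumes round-to-nearest; with total = 60 no share lands exactly on .5,
    so Python's round-half-to-even behaviour does not matter here."""
    return {
        category: [n for n in range(total + 1) if round(100 * n / total) == pct]
        for category, pct in reported.items()
    }


if __name__ == "__main__":
    counts = implied_counts(TOTAL_STUDIES, REPORTED_PERCENT)
    for category, candidates in counts.items():
        print(f"{category}: {candidates}")
```

Under that rounding assumption each category admits exactly one count (29, 12 and 10 studies respectively), which would leave about 9 studies in application categories that the abstract does not itemize.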

DISCUSSION

There is significant interest in applying LLMs in clinical oncology, particularly as medical information resources and clinical decision-support tools. However, further research is needed to validate these tools on external hold-out datasets to establish generalizability and to improve medical accuracy across diverse clinical scenarios, which underscores the need for clinician supervision of these tools.
