Medical accuracy of artificial intelligence chatbots in oncology: a scoping review.

Authors

Chen David, Avison Kate, Alnassar Saif, Huang Ryan S, Raman Srinivas

Affiliations

Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada.

Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 3K3, Canada.

Publication

Oncologist. 2025 Apr 4;30(4). doi: 10.1093/oncolo/oyaf038.

DOI:10.1093/oncolo/oyaf038
PMID:40285677
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12032582/
Abstract

BACKGROUND

Recent advances in large language models (LLMs) have enabled human-like natural language competency. Applied to oncology, LLMs have been proposed as an information resource and as a clinical decision-support tool that interprets vast amounts of data to improve clinical outcomes.

OBJECTIVE

This review aims to describe the current status of medical accuracy of oncology-related LLM applications and research trends for further areas of investigation.

METHODS

A scoping literature search was conducted on Ovid Medline for peer-reviewed studies published since 2000. We included primary research studies that evaluated the medical accuracy of a large language model applied in oncology settings. Study characteristics and primary outcomes of included studies were extracted to describe the landscape of oncology-related LLMs.

RESULTS

Sixty studies were included based on the inclusion and exclusion criteria. The majority of studies evaluated LLMs in oncology as a health information resource in question-answer style examinations (48%), followed by diagnosis (20%) and management (17%). The number of studies that evaluated the utility of fine-tuning and prompt-engineering LLMs increased over time from 2022 to 2024. Studies reported the advantages of LLMs as an accurate information resource, reduction of clinician workload, and improved accessibility and readability of clinical information, while noting disadvantages such as poor reliability, hallucinations, and need for clinician oversight.

DISCUSSION

There exists significant interest in the application of LLMs in clinical oncology, with a particular focus as a medical information resource and clinical decision support tool. However, further research is needed to validate these tools in external hold-out datasets for generalizability and to improve medical accuracy across diverse clinical scenarios, underscoring the need for clinician supervision of these tools.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dce8/12032582/7bc6fb07eacb/oyaf038_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dce8/12032582/3dee203240fe/oyaf038_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dce8/12032582/36c922f1e044/oyaf038_fig2.jpg

Cited by

1
Large language models in clinical nutrition: an overview of its applications, capabilities, limitations, and potential future prospects.
Front Nutr. 2025 Aug 7;12:1635682. doi: 10.3389/fnut.2025.1635682. eCollection 2025.
