Performance of Multimodal Artificial Intelligence Chatbots Evaluated on Clinical Oncology Cases.

Affiliations

Radiation Medicine Program, Princess Margaret Hospital Cancer Centre, Toronto, Ontario, Canada.

Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

Publication details

JAMA Netw Open. 2024 Oct 1;7(10):e2437711. doi: 10.1001/jamanetworkopen.2024.37711.

DOI:10.1001/jamanetworkopen.2024.37711
PMID:39441598
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11581577/
Abstract

IMPORTANCE

Multimodal artificial intelligence (AI) chatbots can process complex medical image and text-based information that may improve their accuracy as a clinical diagnostic and management tool compared with unimodal, text-only AI chatbots. However, the difference in medical accuracy of multimodal and text-only chatbots in addressing questions about clinical oncology cases remains to be tested.

OBJECTIVE

To evaluate the utility of prompt engineering (zero-shot chain-of-thought) and compare the competency of multimodal and unimodal AI chatbots to generate medically accurate responses to questions about clinical oncology cases.
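
Zero-shot chain-of-thought prompting gives the model no worked examples. It only appends a reasoning cue to the question. The sketch below shows how such a prompt might be assembled for one case. It makes several assumptions that do not come from this abstract. The trigger phrase "Let's think step by step." is the cue popularized by Kojima et al. (2022), and the study's exact wording is not given here. The `OncologyCase` structure and the `build_prompt` helper are hypothetical. Images, which the multimodal chatbots also received, are omitted.

```python
from dataclasses import dataclass, field

# Assumed zero-shot chain-of-thought cue; the study's exact prompt text is not reported here.
COT_TRIGGER = "Let's think step by step."


@dataclass
class OncologyCase:
    """Hypothetical text-only representation of one clinical case question."""
    vignette: str
    question: str
    # Option letter -> option text; leave empty for a free-text question.
    options: dict[str, str] = field(default_factory=dict)


def build_prompt(case: OncologyCase, chain_of_thought: bool = True) -> str:
    """Assemble vignette, question, optional lettered options and the CoT cue."""
    parts = [case.vignette.strip(), "", case.question.strip()]
    if case.options:
        parts.append("")
        # List options in letter order so every chatbot sees the same layout.
        parts.extend(f"{letter}. {text}" for letter, text in sorted(case.options.items()))
    if chain_of_thought:
        parts.extend(["", COT_TRIGGER])
    return "\n".join(parts)
```

Setting `chain_of_thought=False` yields the plain prompt for the same case. This makes it easy to compare runs with and without the cue, which is the comparison the objective describes.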

DESIGN, SETTING, AND PARTICIPANTS

This cross-sectional study benchmarked the medical accuracy of multiple-choice and free-text responses generated by AI chatbots to 79 questions about clinical oncology cases with images.

EXPOSURES

A unique set of 79 clinical oncology cases from JAMA Network Learning accessed on April 2, 2024, was posed to 10 AI chatbots.

MAIN OUTCOMES AND MEASURES

The primary outcome was medical accuracy, measured as the number of correct responses from each AI chatbot. Multiple-choice responses were marked as correct if they matched the ground-truth answer. Free-text responses were rated in duplicate by a team of oncology specialists and marked as correct by consensus. Disagreements were resolved by review from a third oncology specialist.
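
The two scoring rules can be written as small functions. For multiple-choice questions, the chosen option is compared with the answer key. For free-text questions, two independent ratings are used, and a third rater is consulted only when the first two disagree. This is a minimal sketch under stated assumptions. The function names are hypothetical, and each chatbot's option letter is assumed to be already extracted from its response. The third specialist is modeled as a callable so that adjudication happens only when needed.

```python
from typing import Callable


def score_multiple_choice(answer: str, key: str) -> bool:
    """Correct when the chosen option letter matches the ground-truth letter.

    Assumes the letter has already been parsed out of the chatbot's reply.
    """
    return answer.strip().upper() == key.strip().upper()


def score_free_text(rater_a: bool, rater_b: bool, adjudicate: Callable[[], bool]) -> bool:
    """Duplicate rating with third-rater adjudication.

    If the two specialists agree, their shared verdict stands; otherwise the
    hypothetical `adjudicate` callable (the third specialist) decides.
    """
    if rater_a == rater_b:
        return rater_a
    return adjudicate()


def accuracy(verdicts: list[bool]) -> int:
    """Primary outcome as described: the number of correct responses."""
    return sum(verdicts)
```

Passing the adjudicator as a function mirrors the protocol: the third review is requested only for cases where the first two ratings conflict.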

RESULTS

This study evaluated 10 chatbots, including 3 multimodal and 7 unimodal chatbots. On the multiple-choice evaluation, the top-performing chatbot was chatbot 10 (57 of 79 [72.15%]), followed by the multimodal chatbot 2 (56 of 79 [70.89%]) and chatbot 5 (54 of 79 [68.35%]). On the free-text evaluation, the top-performing chatbots were chatbot 5, chatbot 7, and the multimodal chatbot 2 (30 of 79 [37.97%] each), followed by chatbot 10 (29 of 79 [36.71%]) and then chatbot 8 and the multimodal chatbot 3 (25 of 79 [31.65%] each). The accuracy of multimodal chatbots decreased on cases with multiple images compared with cases with a single image. Nine of 10 chatbots, including all 3 multimodal chatbots, were less accurate in their free-text responses than in their multiple-choice responses to questions about cancer cases.
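
Each reported percentage is the count of correct responses divided by the 79 cases. The short script below recomputes those percentages from the counts quoted above. It also computes the multiple-choice to free-text drop for the three chatbots whose two scores are both reported. The counts come from the abstract. The dictionary layout and helper names are only illustrative.

```python
N_CASES = 79

# Correct-response counts quoted in the abstract (only the chatbots it reports).
MULTIPLE_CHOICE = {"chatbot 10": 57, "chatbot 2": 56, "chatbot 5": 54}
FREE_TEXT = {"chatbot 5": 30, "chatbot 7": 30, "chatbot 2": 30,
             "chatbot 10": 29, "chatbot 8": 25, "chatbot 3": 25}


def accuracy_pct(correct: int, total: int = N_CASES) -> float:
    """Accuracy as a percentage rounded to 2 decimals, as in the abstract."""
    return round(100 * correct / total, 2)


def mc_minus_free_text(name: str) -> float:
    """Percentage-point drop from multiple-choice to free-text accuracy,
    computed from the raw counts rather than from rounded percentages."""
    return accuracy_pct(MULTIPLE_CHOICE[name] - FREE_TEXT[name])


if __name__ == "__main__":
    for name, correct in MULTIPLE_CHOICE.items():
        print(f"multiple choice, {name}: {correct}/{N_CASES} = {accuracy_pct(correct):.2f}%")
    for name, correct in FREE_TEXT.items():
        print(f"free text, {name}: {correct}/{N_CASES} = {accuracy_pct(correct):.2f}%")
    for name in MULTIPLE_CHOICE:
        print(f"{name}: {mc_minus_free_text(name):.2f} percentage points lower on free text")
```

Take chatbot 10 as an example. It answered 28 fewer questions correctly in free text, so its accuracy dropped by 28/79, or about 35.44 percentage points. The script takes differences on counts because subtracting rounded percentages can be off by 0.01. For chatbot 2, 70.89 minus 37.97 gives 32.92, while 26/79 rounds to 32.91.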

CONCLUSIONS AND RELEVANCE

In this cross-sectional study of chatbot accuracy on clinical oncology cases, multimodal chatbots were not consistently more accurate than unimodal chatbots. These results suggest that further research is needed to optimize multimodal chatbots so that they make better use of image information and improve oncology-specific medical accuracy and reliability.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f9a2/11581577/90d7d617ccce/jamanetwopen-e2437711-g001.jpg