
Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment.

Affiliations

Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada.

Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada.

Publication Information

Can Assoc Radiol J. 2024 May;75(2):344-350. doi: 10.1177/08465371231193716. Epub 2023 Aug 14.

DOI: 10.1177/08465371231193716
PMID: 37578849
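
The PMID above is enough to pull this record programmatically. As a minimal sketch (not part of the original page), the public NCBI E-utilities efetch endpoint returns the plain-text abstract for a given PubMed ID:

```python
# Minimal sketch: fetch this record's abstract from PubMed by PMID using
# the public NCBI E-utilities efetch endpoint. Only the PMID comes from
# this page; everything else is standard E-utilities usage.
import requests

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fetch_pubmed_abstract(pmid: str) -> str:
    """Return the plain-text abstract for a PubMed ID."""
    params = {"db": "pubmed", "id": pmid, "rettype": "abstract", "retmode": "text"}
    resp = requests.get(EFETCH_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.text

print(fetch_pubmed_abstract("37578849"))
```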
Abstract

PURPOSE

Bard by Google, a direct competitor to ChatGPT, was recently released. Understanding the relative performance of these different chatbots can provide important insight into their strengths and weaknesses as well as which roles they are most suited to fill. In this project, we aimed to compare the most recent version of ChatGPT, ChatGPT-4, and Bard by Google, in their ability to accurately respond to radiology board examination practice questions.

METHODS

Text-based questions were collected from the 2017-2021 American College of Radiology's Diagnostic Radiology In-Training (DXIT) examinations. ChatGPT-4 and Bard were queried, and their comparative accuracies, response lengths, and response times were documented. Subspecialty-specific performance was analyzed as well.
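
The abstract does not say which statistical tests produced the P values reported below. As a hedged sketch, one reasonable reconstruction uses a chi-square test on the correct/incorrect counts and Welch's t-test on the per-response lengths and times; the function names here are illustrative, not the authors' code:

```python
# Hedged reconstruction of the comparisons described in METHODS; the paper
# does not publish its analysis code, so the choice of tests is an assumption.
# Since both chatbots answered the same questions, a paired test such as
# McNemar's would also be defensible for the accuracy comparison.
from scipy import stats

def compare_accuracy(correct_a: int, correct_b: int, n: int) -> float:
    """Chi-square test on a 2x2 correct/incorrect table; returns the P value."""
    table = [[correct_a, n - correct_a],
             [correct_b, n - correct_b]]
    _, p, _, _ = stats.chi2_contingency(table)
    return p

def compare_means(samples_a, samples_b) -> float:
    """Welch's t-test for per-response lengths or times; returns the P value."""
    return stats.ttest_ind(samples_a, samples_b, equal_var=False).pvalue
```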

RESULTS

318 questions were included in our analysis. ChatGPT answered significantly more accurately than Bard (87.11% vs 70.44%, P < .0001). ChatGPT's response length was significantly shorter than Bard's (935.28 ± 440.88 characters vs 1437.52 ± 415.91 characters, P < .0001). ChatGPT's response time was significantly longer than Bard's (26.79 ± 3.27 seconds vs 7.55 ± 1.88 seconds, P < .0001). ChatGPT performed superiorly to Bard in neuroradiology (100.00% vs 86.21%, P = .03), general & physics (85.39% vs 68.54%, P < .001), nuclear medicine (80.00% vs 56.67%, P < .01), pediatric radiology (93.75% vs 68.75%, P = .03), and ultrasound (100.00% vs 63.64%, P < .001). In the remaining subspecialties, there were no significant differences between ChatGPT and Bard's performance.
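
As a quick check of the headline accuracy figures: 87.11% and 70.44% of 318 questions correspond to 277 and 224 correct answers, and the assumed two-proportion comparison from the sketch above reproduces a P value well below .0001:

```python
# Sanity-check the headline result (the test choice is an assumption, as above).
from scipy.stats import chi2_contingency

n = 318
chatgpt_correct = round(0.8711 * n)  # 277
bard_correct = round(0.7044 * n)     # 224
table = [[chatgpt_correct, n - chatgpt_correct],
         [bard_correct, n - bard_correct]]
_, p, _, _ = chi2_contingency(table)
print(f"ChatGPT {chatgpt_correct}/{n} vs Bard {bard_correct}/{n}: P = {p:.1e}")
```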

CONCLUSION

ChatGPT displayed superior radiology knowledge compared to Bard. While both chatbots display reasonable radiology knowledge, they should be used with conscious knowledge of their limitations and fallibility. Both chatbots provided incorrect or illogical answer explanations and did not always address the educational content of the question.

Similar Articles

1
Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment.
Can Assoc Radiol J. 2024 May;75(2):344-350. doi: 10.1177/08465371231193716. Epub 2023 Aug 14.
2
Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment.
Eye (Lond). 2024 Sep;38(13):2530-2535. doi: 10.1038/s41433-024-03067-4. Epub 2024 Apr 13.
3
Artificial Intelligence Chatbots' Understanding of the Risks and Benefits of Computed Tomography and Magnetic Resonance Imaging Scenarios.
Can Assoc Radiol J. 2024 Aug;75(3):518-524. doi: 10.1177/08465371231220561. Epub 2024 Jan 6.
4
How AI Responds to Common Lung Cancer Questions: ChatGPT vs Google Bard.
Radiology. 2023 Jun;307(5):e230922. doi: 10.1148/radiol.230922.
5
Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer Lu-PSMA-617 therapy.
Front Oncol. 2024 Jul 12;14:1386718. doi: 10.3389/fonc.2024.1386718. eCollection 2024.
6
Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.
Vascular. 2025 Feb;33(1):229-237. doi: 10.1177/17085381241240550. Epub 2024 Mar 18.
7
Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank.
Neurosurgery. 2023 Nov 1;93(5):1090-1098. doi: 10.1227/neu.0000000000002551. Epub 2023 Jun 12.
8
Evaluating the Accuracy of ChatGPT and Google BARD in Fielding Oculoplastic Patient Queries: A Comparative Study on Artificial versus Human Intelligence.
Ophthalmic Plast Reconstr Surg. 2024;40(3):303-311. doi: 10.1097/IOP.0000000000002567. Epub 2024 Jan 12.
9
Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society.
Jpn J Radiol. 2024 Feb;42(2):201-207. doi: 10.1007/s11604-023-01491-2. Epub 2023 Oct 4.
10
Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard.
EBioMedicine. 2023 Sep;95:104770. doi: 10.1016/j.ebiom.2023.104770. Epub 2023 Aug 23.

Cited By

1
Evaluating the Reliability of OpenAI's ChatGPT-4 in Providing Pre-colonoscopy Patient Guidance.
Cureus. 2025 Jun 21;17(6):e86512. doi: 10.7759/cureus.86512. eCollection 2025 Jun.
2
Performance of ChatGPT-3.5 and ChatGPT-4 in Solving Questions Based on Core Concepts in Cardiovascular Physiology.
Cureus. 2025 May 6;17(5):e83552. doi: 10.7759/cureus.83552. eCollection 2025 May.
3
Evaluating the performance of artificial intelligence in summarizing pre-coded text to support evidence synthesis: a comparison between chatbots and humans.
BMC Med Res Methodol. 2025 May 30;25(1):150. doi: 10.1186/s12874-025-02532-2.
4
Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.
5
Evaluating AI-based breastfeeding chatbots: quality, readability, and reliability analysis.
PLoS One. 2025 Mar 17;20(3):e0319782. doi: 10.1371/journal.pone.0319782. eCollection 2025.
6
A Future of Self-Directed Patient Internet Research: Large Language Model-Based Tools Versus Standard Search Engines.
Ann Biomed Eng. 2025 May;53(5):1199-1208. doi: 10.1007/s10439-025-03701-6. Epub 2025 Mar 3.
7
Comparative analysis of ChatGPT and Gemini (Bard) in medical inquiry: a scoping review.
Front Digit Health. 2025 Feb 3;7:1482712. doi: 10.3389/fdgth.2025.1482712. eCollection 2025.
8
Evaluating the Accuracy of Responses by Large Language Models for Information on Disease Epidemiology.
Pharmacoepidemiol Drug Saf. 2025 Feb;34(2):e70111. doi: 10.1002/pds.70111.
9
ChatGPT-4 Omni's superiority in answering multiple-choice oral radiology questions.
BMC Oral Health. 2025 Feb 1;25(1):173. doi: 10.1186/s12903-025-05554-w.
10
Analyzing evaluation methods for large language models in the medical field: a scoping review.
BMC Med Inform Decis Mak. 2024 Nov 29;24(1):366. doi: 10.1186/s12911-024-02709-7.