

Comparative Accuracy of ChatGPT 4.0 and Google Gemini in Answering Pediatric Radiology Text-Based Questions.

Authors

Abdul Sami Mohammed, Abdul Samad Mohammed, Parekh Keyur, Suthar Pokhraj P

Affiliations

Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA.

Department of Diagnostic Radiology, Des Moines University College of Osteopathic Medicine, West Des Moines, USA.

Publication

Cureus. 2024 Oct 5;16(10):e70897. doi: 10.7759/cureus.70897. eCollection 2024 Oct.

DOI: 10.7759/cureus.70897
PMID: 39497868
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11534303/
Abstract

AIMS AND OBJECTIVES

This study evaluates the accuracy of two AI language models, ChatGPT 4.0 and Google Gemini (as of August 2024), in answering a set of 79 text-based pediatric radiology questions from "Pediatric Imaging: A Core Review." Accurate interpretation of text and images is critical in radiology, making AI tools valuable in medical education.

METHODS

The study involved 79 questions selected from a pediatric radiology question set, focusing solely on text-based questions. ChatGPT 4.0 and Google Gemini answered these questions, and their responses were evaluated using a binary scoring system. Statistical analyses, including chi-square tests and relative risk (RR) calculations, were performed to compare the overall and subsection accuracy of the models.

RESULTS

ChatGPT 4.0 demonstrated superior accuracy, correctly answering 83.5% (66/79) of the questions, compared to Google Gemini's 68.4% (54/79), with a statistically significant difference (p=0.0255, RR=1.221). No statistically significant differences were found between the models within individual subsections, with p-values ranging from 0.136 to 1.
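The reported overall comparison can be reproduced from the raw counts. A minimal sketch, assuming the 2×2 contingency table (correct vs. incorrect per model) implied by 66/79 and 54/79, and a Pearson chi-square test without Yates continuity correction, which matches the reported p-value:

```python
import math

# Correct/incorrect answer counts: rows = ChatGPT 4.0, Google Gemini
table = [[66, 13],
         [54, 25]]

# Pearson chi-square statistic (no continuity correction)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)

# Survival function of chi-square with 1 degree of freedom
p = math.erfc(math.sqrt(chi2 / 2))

# Relative risk of a correct answer, ChatGPT 4.0 vs. Gemini
rr = (66 / 79) / (54 / 79)

print(f"chi2={chi2:.3f}  p={p:.4f}  RR={rr:.3f}")
```

This yields chi2 ≈ 4.99, p ≈ 0.025, and RR ≈ 1.22, consistent with the reported p = 0.0255 and RR = 1.221 (small differences are rounding).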

CONCLUSION

ChatGPT 4.0 outperformed Google Gemini in overall accuracy for text-based pediatric radiology questions, highlighting its potential utility in medical education. However, the lack of significant differences within subsections and the exclusion of image-based questions underscore the need for further research with larger sample sizes and multimodal inputs to fully assess AI models' capabilities in radiology.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e7/11534303/0f4475ea2d14/cureus-0016-00000070897-i01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e7/11534303/9f16bcfecbec/cureus-0016-00000070897-i02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e7/11534303/1e09684761e2/cureus-0016-00000070897-i03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e7/11534303/5d31c0282d67/cureus-0016-00000070897-i04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e7/11534303/fb927d1b9dbf/cureus-0016-00000070897-i05.jpg

Similar Articles

1
Comparative Accuracy of ChatGPT 4.0 and Google Gemini in Answering Pediatric Radiology Text-Based Questions.
Cureus. 2024 Oct 5;16(10):e70897. doi: 10.7759/cureus.70897. eCollection 2024 Oct.
2
Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.
Clin Rheumatol. 2024 Nov;43(11):3507-3513. doi: 10.1007/s10067-024-07154-5. Epub 2024 Sep 28.
3
Evaluating text and visual diagnostic capabilities of large language models on questions related to the Breast Imaging Reporting and Data System Atlas 5 edition.
Diagn Interv Radiol. 2025 Mar 3;31(2):111-129. doi: 10.4274/dir.2024.242876. Epub 2024 Sep 9.
4
Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.
Cureus. 2024 Sep 17;16(9):e69612. doi: 10.7759/cureus.69612. eCollection 2024 Sep.
5
Performance of three artificial intelligence (AI)-based large language models in standardized testing; implications for AI-assisted dental education.
J Periodontal Res. 2025 Feb;60(2):121-133. doi: 10.1111/jre.13323. Epub 2024 Jul 18.
6
Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study.
Int J Retina Vitreous. 2024 Sep 2;10(1):61. doi: 10.1186/s40942-024-00579-9.
7
Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics.
Cureus. 2024 Aug 25;16(8):e67766. doi: 10.7759/cureus.67766. eCollection 2024 Aug.
8
Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis.
J Orthop Surg Res. 2024 Sep 18;19(1):574. doi: 10.1186/s13018-024-04996-2.
9
Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge.
Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15.
10
Large Language Models can Help with Biostatistics and Coding Needed in Radiology Research.
Acad Radiol. 2025 Feb;32(2):604-611. doi: 10.1016/j.acra.2024.09.042. Epub 2024 Oct 15.

Cited By

1
Layer by Layer: Assessing AI Diagnostic Accuracy With Incremental Case Information in Neuroradiology.
Cureus. 2025 Jun 12;17(6):e85874. doi: 10.7759/cureus.85874. eCollection 2025 Jun.
2
Preparing for Vascular Surgery Board Certification: A Comparative Study Using Large Language Models.
Cureus. 2025 May 10;17(5):e83848. doi: 10.7759/cureus.83848. eCollection 2025 May.
3
Evolution of patient education materials from large-language artificial intelligence models on complex regional pain syndrome: are patients learning?

References

1
Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics.
Cureus. 2024 Aug 25;16(8):e67766. doi: 10.7759/cureus.67766. eCollection 2024 Aug.
2
Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study.
BMC Med Educ. 2024 Jun 26;24(1):694. doi: 10.1186/s12909-024-05630-9.
3
Evolution of patient education materials from large-language artificial intelligence models on complex regional pain syndrome: are patients learning?
Proc (Bayl Univ Med Cent). 2025 Feb 28;38(3):221-226. doi: 10.1080/08998280.2025.2470033. eCollection 2025.
4
ChatGPT-4 Turbo and Meta's LLaMA 3.1: A Relative Analysis of Answering Radiology Text-Based Questions.
Cureus. 2024 Nov 24;16(11):e74359. doi: 10.7759/cureus.74359. eCollection 2024 Nov.
5
Unlocking Health Literacy: The Ultimate Guide to Hypertension Education From ChatGPT Versus Google Gemini.
Cureus. 2024 May 8;16(5):e59898. doi: 10.7759/cureus.59898. eCollection 2024 May.
6
BI-RADS Category Assignments by GPT-3.5, GPT-4, and Google Bard: A Multilanguage Study.
Radiology. 2024 Apr;311(1):e232133. doi: 10.1148/radiol.232133.
7
Exploring AI-chatbots' capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases.
Br J Ophthalmol. 2024 Sep 20;108(10):1457-1469. doi: 10.1136/bjo-2023-325143.
8
Google DeepMind's gemini AI versus ChatGPT: a comparative analysis in ophthalmology.
Eye (Lond). 2024 Jun;38(8):1412-1417. doi: 10.1038/s41433-024-02958-w. Epub 2024 Feb 14.
9
Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments.
Sci Rep. 2023 Oct 1;13(1):16492. doi: 10.1038/s41598-023-43436-9.
10
Artificial Intelligence (AI) in Radiology: A Deep Dive Into ChatGPT 4.0's Accuracy with the American Journal of Neuroradiology's (AJNR) "Case of the Month".
Cureus. 2023 Aug 23;15(8):e43958. doi: 10.7759/cureus.43958. eCollection 2023 Aug.
11
Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany.
JMIR Med Educ. 2023 Sep 4;9:e46482. doi: 10.2196/46482.
12
Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment.
Can Assoc Radiol J. 2024 May;75(2):344-350. doi: 10.1177/08465371231193716. Epub 2023 Aug 14.