
Who Knows Anatomy Best? A Comparative Study of ChatGPT-4o, DeepSeek, Gemini, and Claude.

Author

Tassoker Melek

Affiliation

Department of Dentomaxillofacial Radiology, Faculty of Dentistry, Necmettin Erbakan University, Konya, Turkey.

Publication

Clin Anat. 2025 Jul 24. doi: 10.1002/ca.70012.

DOI: 10.1002/ca.70012
PMID: 40708277
Abstract

This study evaluates the performance of ChatGPT-4o (OpenAI), DeepSeek-v3 (DeepSeek), Gemini 2.0 (Google DeepMind), and Claude 3.7 Sonnet (Anthropic) in answering anatomy questions from the Turkish Dental Specialty Admission Exam (DUS). The study aims to compare their accuracy, response times, and answer lengths. A total of 74 text-based multiple choice anatomy questions from the Turkish Dental Specialty Admission Exam (DUS) administered between 2012 and 2021 were analyzed in this study. The questions varied in difficulty and included both basic anatomical identification and clinically oriented scenarios, with a majority focusing on head and neck anatomy, followed by thorax, neuroanatomy, and musculoskeletal regions, which are particularly relevant to dental education. The accuracy of answers was evaluated against official sources, and response times and word counts were recorded. Statistical analyses, including the Kruskal-Wallis and Cochran's Q tests, were used to compare performance differences. ChatGPT-4o demonstrated the highest accuracy (98.6%), while the other models achieved the same rate of 89.2%. Gemini produced the fastest responses (mean: 4.47 s), whereas DeepSeek generated the shortest answers and Gemini the longest (p = 0.000). The differences in accuracy, response times, and word count were statistically significant (p < 0.05). ChatGPT-4o outperformed other models in accuracy for DUS anatomy questions, suggesting its superior potential as a tool for dental education. Future research should explore the integration of LLMs into structured learning programs.
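The abstract reports Cochran's Q for comparing the four models' paired correct/incorrect outcomes on the same question set, and Kruskal-Wallis for response times and word counts. As a minimal illustration of how Cochran's Q is computed for k related binary samples, a pure-Python sketch on hypothetical per-question correctness data (not the study's actual results):

```python
def cochran_q(rows):
    """Cochran's Q statistic for k related binary (0/1) samples.

    rows: one tuple per question, with a 0/1 correctness entry per model.
    The statistic is compared against a chi-square distribution with
    k - 1 degrees of freedom.
    """
    k = len(rows[0])                                    # number of models
    col = [sum(r[j] for r in rows) for j in range(k)]   # successes per model
    row_totals = [sum(r) for r in rows]                 # successes per question
    n_total = sum(col)                                  # grand total of successes
    numerator = (k - 1) * (k * sum(g * g for g in col) - n_total * n_total)
    denominator = k * n_total - sum(l * l for l in row_totals)
    return numerator / denominator

# Hypothetical correctness data for 4 models on 6 questions (1 = correct).
outcomes = [
    (1, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (1, 0, 0, 1),
    (0, 1, 1, 1),
]

q = cochran_q(outcomes)
print(q)  # 0.75; with df = 3, well below the 0.05 critical value of 7.815
```

Note that questions every model got right (or wrong) contribute nothing to the statistic; only questions where the models disagree drive Q, which is why a paired test like this is more sensitive than comparing raw accuracy percentages.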


Similar Articles

1. Who Knows Anatomy Best? A Comparative Study of ChatGPT-4o, DeepSeek, Gemini, and Claude.
   Clin Anat. 2025 Jul 24. doi: 10.1002/ca.70012.
2. Accuracy of large language models in generating differential diagnosis from clinical presentation and imaging findings in pediatric cases.
   Pediatr Radiol. 2025 Jul 12. doi: 10.1007/s00247-025-06317-z.
3. Evaluating ChatGPT and DeepSeek in postdural puncture headache management: a comparative study with international consensus guidelines.
   BMC Neurol. 2025 Jul 1;25(1):264. doi: 10.1186/s12883-025-04280-8.
4. Performance of ChatGPT-4o and Four Open-Source Large Language Models in Generating Diagnoses Based on China's Rare Disease Catalog: Comparative Study.
   J Med Internet Res. 2025 Jun 18;27:e69929. doi: 10.2196/69929.
5. Cognitive Domain Assessment of Artificial Intelligence Chatbots: A Comparative Study Between ChatGPT and Gemini's Understanding of Anatomy Education.
   Med Sci Educ. 2025 Feb 15;35(3):1295-1304. doi: 10.1007/s40670-025-02303-0. eCollection 2025 Jun.
6. Artificial Intelligence in Peripheral Artery Disease Education: A Battle Between ChatGPT and Google Gemini.
   Cureus. 2025 Jun 1;17(6):e85174. doi: 10.7759/cureus.85174. eCollection 2025 Jun.
7. Comparison of artificial intelligence systems in answering prosthodontics questions from the dental specialty exam in Turkey.
   J Dent Sci. 2025 Jul;20(3):1454-1459. doi: 10.1016/j.jds.2025.01.025. Epub 2025 Jan 31.
8. Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations.
   J Pers Med. 2025 Jun 5;15(6):235. doi: 10.3390/jpm15060235.
9. Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.
   Eye (Lond). 2025 Apr;39(6):1132-1137. doi: 10.1038/s41433-024-03545-9. Epub 2024 Dec 17.
10. Accuracy of ChatGPT-3.5, ChatGPT-4o, Copilot, Gemini, Claude, and Perplexity in advising on lumbosacral radicular pain against clinical practice guidelines: cross-sectional study.
    Front Digit Health. 2025 Jun 27;7:1574287. doi: 10.3389/fdgth.2025.1574287. eCollection 2025.