Evaluating the image recognition capabilities of GPT-4V and Gemini Pro in the Japanese national dental examination.

Author information

Fukuda Hikaru, Morishita Masaki, Muraoka Kosuke, Yamaguchi Shino, Nakamura Taiji, Yoshioka Izumi, Awano Shuji, Ono Kentaro

Affiliations

Division of Maxillofacial Surgery, Department of Science of Physical Functions, Kyushu Dental University, Kitakyushu, Japan.

Health Information Management Office, Kyushu Dental University Hospital, Kitakyushu, Japan.

Publication information

J Dent Sci. 2025 Jan;20(1):368-372. doi: 10.1016/j.jds.2024.06.015. Epub 2024 Jul 2.

DOI: 10.1016/j.jds.2024.06.015
PMID: 39873040
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11762652/

Abstract

BACKGROUND/PURPOSE

OpenAI's GPT-4V and Google's Gemini Pro are large language models (LLMs) with image recognition capabilities that could be used in future medical diagnosis and treatment and could serve as valuable educational support tools for students. This study compared and evaluated the image recognition capabilities of GPT-4V and Gemini Pro using questions from the Japanese National Dental Examination (JNDE) to investigate their potential as educational support tools.

MATERIALS AND METHODS

We analyzed 160 questions from the 116th JNDE, administered in March 2023, using ChatGPT-4V and Gemini Pro, both of which have image recognition functions. Standardized prompts were used for both LLMs, and statistical analysis was conducted using Fisher's exact test and the Mann-Whitney U test.
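The abstract names the Mann-Whitney U test but does not say which variables it compared. Purely as an illustration, and assuming it compared an ordinal per-question quantity (for example, the number of images) between correctly and incorrectly answered questions, the stdlib-only sketch below computes U by pairwise comparison and a two-sided p-value from the normal approximation with a tie correction. The function name `mann_whitney_u` and the input lists are hypothetical; they are not the study's code or data.

```python
from collections import Counter
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U, two-sided p) for independent samples x and y.

    U counts the pairs (xi, yj) with xi > yj, counting ties as 0.5.
    The p-value uses the normal approximation with a tie correction and
    no continuity correction, so treat it as rough for small samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    n = n1 + n2
    # Tie correction term: sum of (t^3 - t) over groups of tied values.
    ties = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # every value identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var)
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical image counts per question, for illustration only.
images_when_correct = [1, 1, 2, 1, 3]
images_when_incorrect = [2, 3, 4, 2, 5]
u, p = mann_whitney_u(images_when_correct, images_when_incorrect)
print(f"U = {u}, approximate two-sided p = {p:.3f}")
```

A U far from n1*n2/2 indicates that one group tends to have larger values than the other. The pairwise loop is O(n1*n2), which is ample for a 160-question exam; rank-based implementations scale better for large samples.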

RESULTS

For the 160 JNDE questions, the accuracy rates of GPT-4V and Gemini Pro were 35.0% and 28.1%, respectively; GPT-4V scored higher, although the difference was not statistically significant. Across dental specialties, GPT-4V's accuracy rates were generally higher than Gemini Pro's, with equal accuracy in some areas. Accuracy rates tended to decrease as the number of images in a question increased, suggesting that the number of images influenced the correctness of the responses.
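The abstract reports percentages rather than counts. Back-calculating them (an assumption) gives 56/160 correct answers for GPT-4V (35.0%) and 45/160 for Gemini Pro (28.1%). The minimal stdlib-only sketch below runs a two-sided Fisher's exact test on an assumed 2x2 table of model versus correct/incorrect, showing how those counts map to the reported rates and why a gap of about 7 percentage points on 160 questions can fall short of significance. The helper names are hypothetical, and the table layout is an assumption; the abstract does not describe exactly how the authors set up the test.

```python
from math import comb


def table_prob(a, row1, col1, n):
    """Hypergeometric probability that the top-left cell equals a,
    given fixed row total row1, column total col1, and grand total n."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = table_prob(a, row1, col1, n)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = (table_prob(x, row1, col1, n) for x in range(lo, hi + 1))
    # Small relative tolerance so floating-point ties with p_obs are included.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


# Counts back-calculated from the reported percentages (assumption).
N_QUESTIONS = 160
GPT4V_CORRECT = 56   # 56 / 160 = 35.0%
GEMINI_CORRECT = 45  # 45 / 160 = 28.125%, reported as 28.1%

# Rows: GPT-4V, Gemini Pro. Columns: correct, incorrect.
p = fisher_exact_two_sided(
    GPT4V_CORRECT, N_QUESTIONS - GPT4V_CORRECT,
    GEMINI_CORRECT, N_QUESTIONS - GEMINI_CORRECT,
)
print(f"GPT-4V accuracy:     {GPT4V_CORRECT / N_QUESTIONS:.1%}")
print(f"Gemini Pro accuracy: {GEMINI_CORRECT / N_QUESTIONS:.1%}")
print(f"Two-sided Fisher p:  {p:.3f}")
```

With these assumed counts the p-value lands well above 0.05, consistent with the reported lack of significance. Fisher's test treats the two sets of answers as independent samples; because both models answered the same 160 questions, a paired test such as McNemar's would be an alternative design, but the sketch follows the test the abstract names.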

CONCLUSION

GPT-4V's better overall performance compared with Gemini Pro may be attributable to continuous updates to OpenAI's model. This research demonstrates the potential of LLMs as educational support tools in dentistry, while also highlighting areas that require further technological development.

Similar articles

1. Evaluating the efficacy of leading large language models in the Japanese national dental hygienist examination: A comparative analysis of ChatGPT, Bard, and Bing Chat.
J Dent Sci. 2024 Oct;19(4):2262-2267. doi: 10.1016/j.jds.2024.02.019. Epub 2024 Feb 29.
2. Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.
3. Evaluating GPT-4V's performance in the Japanese national dental examination: A challenge explored.
J Dent Sci. 2024 Jul;19(3):1595-1600. doi: 10.1016/j.jds.2023.12.007. Epub 2023 Dec 22.
4. Comparing Diagnostic Accuracy of Radiologists versus GPT-4V and Gemini Pro Vision Using Image Inputs from Diagnosis Please Cases.
Radiology. 2024 Jul;312(1):e240273. doi: 10.1148/radiol.240273.
5. Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study.
JMIR Med Educ. 2024 Mar 28;10:e57054. doi: 10.2196/57054.
6. The Performance of GPT-3.5, GPT-4, and Bard on the Japanese National Dentist Examination: A Comparison Study.
Cureus. 2023 Dec 12;15(12):e50369. doi: 10.7759/cureus.50369. eCollection 2023 Dec.
7. Evaluating the Effectiveness of advanced large language models in medical Knowledge: A Comparative study using Japanese national medical examination.
Int J Med Inform. 2025 Jan;193:105673. doi: 10.1016/j.ijmedinf.2024.105673. Epub 2024 Oct 28.
8. Performance of three artificial intelligence (AI)-based large language models in standardized testing; implications for AI-assisted dental education.
J Periodontal Res. 2025 Feb;60(2):121-133. doi: 10.1111/jre.13323. Epub 2024 Jul 18.
9. Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.
J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.

Cited by

1. Performance of ChatGPT in answering the oral pathology questions of various types or subjects from Taiwan National Dental Licensing Examinations.
J Dent Sci. 2025 Jul;20(3):1709-1715. doi: 10.1016/j.jds.2025.03.030. Epub 2025 Apr 5.
