Evaluating the image recognition capabilities of GPT-4V and Gemini Pro in the Japanese national dental examination.

Author information

Fukuda Hikaru, Morishita Masaki, Muraoka Kosuke, Yamaguchi Shino, Nakamura Taiji, Yoshioka Izumi, Awano Shuji, Ono Kentaro

Affiliations

Division of Maxillofacial Surgery, Department of Science of Physical Functions, Kyushu Dental University, Kitakyushu, Japan.

Health Information Management Office, Kyushu Dental University Hospital, Kitakyushu, Japan.

Publication information

J Dent Sci. 2025 Jan;20(1):368-372. doi: 10.1016/j.jds.2024.06.015. Epub 2024 Jul 2.

DOI:10.1016/j.jds.2024.06.015

PMID:39873040

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11762652/

Abstract

BACKGROUND/PURPOSE: OpenAI's GPT-4V and Google's Gemini Pro are large language models (LLMs) with image recognition capabilities that could be used in future medical diagnosis and treatment and could serve as valuable educational support tools for students. This study compared the image recognition capabilities of GPT-4V and Gemini Pro on questions from the Japanese National Dental Examination (JNDE) to assess their potential as educational support tools.

MATERIALS AND METHODS

We analyzed 160 questions from the 116th JNDE, administered in March 2023, using ChatGPT-4V and Gemini Pro, both of which have image recognition functions. The same standardized prompts were used for both LLMs, and statistical analysis was performed with Fisher's exact test and the Mann-Whitney U test.
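The abstract does not state which comparison the Mann-Whitney U test was applied to. As a minimal sketch, assuming it was used to compare an ordinal quantity such as the number of images per question between correctly and incorrectly answered questions, the U statistic can be computed from pooled ranks, with tied values receiving their average rank. The function name `mann_whitney_u` and the sample values below are hypothetical illustrations, not the study's data. Turning U into a p-value additionally requires the exact null distribution or a normal approximation, which `scipy.stats.mannwhitneyu` provides.

```python
from itertools import chain


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    pooled = sorted(chain(x, y))
    # Give every distinct value the average of the 1-based ranks it occupies.
    ranks: dict[float, float] = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(ranks[v] for v in x)
    # U_x counts pairs (x_i, y_j) with x_i > y_j, each tie counting as 0.5.
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


if __name__ == "__main__":
    # Hypothetical image counts per question (NOT data from the study).
    correct = [1, 1, 2, 1]
    incorrect = [2, 3, 1, 4]
    print(mann_whitney_u(correct, incorrect))  # prints (3.0, 13.0)
```

A smaller U for the "correct" group means its values tend to rank lower, which is the direction one would expect if questions with fewer images were answered correctly more often.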

RESULTS

For the 160 JNDE questions, the accuracy rates of GPT-4V and Gemini Pro were 35.0% and 28.1%, respectively; GPT-4V scored higher, although the difference was not statistically significant. Across dental specialties, GPT-4V's accuracy was generally higher than Gemini Pro's, with equal accuracy in some areas. Accuracy tended to decrease as the number of images in a question increased, suggesting that the number of images influenced the correctness of the responses.
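The abstract reports percentages only. As a rough check under stated assumptions, 35.0% and 28.1% of 160 questions correspond to 56 and 45 correct answers (back-calculated here, not quoted from the paper). The sketch below places these counts in a 2×2 correct/incorrect table and computes a two-sided Fisher's exact test by summing hypergeometric probabilities. It assumes the comparison used this simple unpaired layout, which may not match the authors' actual analysis; in practice `scipy.stats.fisher_exact` would normally be used instead of the hand-written `fisher_exact_two_sided` helper, whose name is hypothetical.

```python
from math import comb

N_QUESTIONS = 160  # questions from the 116th JNDE analysed in the study

# Correct-answer counts back-calculated from the reported accuracy rates
# (35.0% and 28.1% of 160); derived here, not quoted from the paper.
gpt4v_correct = round(0.350 * N_QUESTIONS)
gemini_correct = round(0.281 * N_QUESTIONS)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so equally likely tables are not lost to rounding.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Rows: model; columns: correct, incorrect (assumed unpaired layout).
p_value = fisher_exact_two_sided(
    gpt4v_correct, N_QUESTIONS - gpt4v_correct,
    gemini_correct, N_QUESTIONS - gemini_correct,
)
print(f"GPT-4V {gpt4v_correct}/{N_QUESTIONS}, "
      f"Gemini Pro {gemini_correct}/{N_QUESTIONS}, p = {p_value:.3f}")
```

Under these assumptions the p-value comes out above 0.05, in line with the abstract's statement that the difference was not statistically significant.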

CONCLUSION

GPT-4V's overall better performance compared with Gemini Pro may be attributable to continuous updates to OpenAI's model. This research demonstrates the potential of LLMs as educational support tools in dentistry, while also highlighting areas that require further technological development.
