

Evaluating the accuracy of CHATGPT models in answering multiple-choice questions on oral and maxillofacial pathologies and oral radiology.

Authors

Felemban Doaa, Jazzar Ahoud, Mair Yasmin, Alsharif Maha, Alsharif Alla, Kassim Saba

Affiliations

Department of Oral and Maxillofacial Diagnostic Sciences, College of Dentistry, Taibah University, Al-Madinah Al-Munawwarah, Saudi Arabia.

Department of Oral Diagnostic Sciences, Faculty of Dentistry, King Abdulaziz University, Jeddah, Saudi Arabia.

Publication

Digit Health. 2025 Jul 8;11:20552076251355847. doi: 10.1177/20552076251355847. eCollection 2025 Jan-Dec.

DOI:10.1177/20552076251355847
PMID:40656850
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12246668/
Abstract

OBJECTIVE

This study was designed to evaluate the accuracy of ChatGPT models (3.5, 4.0 and 4 Turbo) in answering multiple-choice questions (MCQs) related to oral and maxillofacial pathology and oral radiology, thereby assessing their reliability as an information source in dentistry.

METHODS

A set of 136 validated MCQs, spanning both knowledge and cognitive levels, was used in the study. The questions covered topics related to odontogenic cysts, tumours and bone lesions. Question difficulty was evaluated by two board-certified reviewers experienced in MCQ item writing in these fields. The questions were entered into ChatGPT-3.5, ChatGPT-4 and ChatGPT-4 Turbo independently.

RESULTS

Fifty-six percent of the questions were related to oral radiology, and 66% were categorised as easy. The dataset consisted primarily of questions testing knowledge (87%), with only 13% assessing cognitive skills. ChatGPT-4 Turbo exhibited the highest accuracy, answering 90% of questions correctly, followed by ChatGPT-4.0 with 85% and ChatGPT-3.5 with 78%. Only 98 questions (72%) were answered correctly by all three models. Ten months later, the free ChatGPT version showed a significant improvement in accuracy, while the paid versions maintained consistent performance over time with no significant differences.

CONCLUSION

The findings suggest that, while AI can be a helpful tool in dental education, limitations persist that must be addressed, particularly in terms of complex cognitive skills and image-based questions. This study provides valuable insights into the capabilities and potential improvements of AI applications in dental education.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cae0/12246668/4677a3261b0a/10.1177_20552076251355847-fig1.jpg

Similar Articles

1
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
2
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.
JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.
3
Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).
J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.
4
Artificial Intelligence in Peripheral Artery Disease Education: A Battle Between ChatGPT and Google Gemini.
Cureus. 2025 Jun 1;17(6):e85174. doi: 10.7759/cureus.85174. eCollection 2025 Jun.
5
Accuracy and Reliability of Artificial Intelligence Chatbots as Public Information Sources in Implant Dentistry.
Int J Oral Maxillofac Implants. 2025 Jun 25;0(0):1-23. doi: 10.11607/jomi.11280.
6
Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
7
Comparative Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4.0 in Answering Questions From a Brazilian National Medical Exam: Cross-Sectional Questionnaire Study.
JMIR AI. 2025 May 8;4:e66552. doi: 10.2196/66552.
8
Evaluation of ChatGPT Performance on Emergency Medicine Board Examination Questions: Observational Study.
JMIR AI. 2025 Mar 12;4:e67696. doi: 10.2196/67696.
9
Performance of ChatGPT in answering the oral pathology questions of various types or subjects from Taiwan National Dental Licensing Examinations.
J Dent Sci. 2025 Jul;20(3):1709-1715. doi: 10.1016/j.jds.2025.03.030. Epub 2025 Apr 5.
