

Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments.

Affiliation

Department of Operative, Preventive and Pediatric Dentistry, Charité - Universitätsmedizin Berlin, Aßmannshauser Str. 4-6, Berlin, 14197, Germany.

Publication information

Clin Oral Investig. 2024 Oct 7;28(11):575. doi: 10.1007/s00784-024-05968-w.

DOI:10.1007/s00784-024-05968-w
PMID:39373739
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11458639/
Abstract

OBJECTIVES

The advent of artificial intelligence (AI) and large language model (LLM)-based AI applications (LLMAs) has tremendous implications for our society. This study analyzed the performance of LLMAs on solving restorative dentistry and endodontics (RDE) student assessment questions.

MATERIALS AND METHODS

A total of 151 questions from an RDE question pool were prepared for prompting LLMAs from OpenAI (ChatGPT-3.5, -4.0 and -4.0o) and Google (Gemini 1.0). The multiple-choice questions were sorted into four subcategories and entered into the LLMAs, and the answers were recorded for analysis. Chi-square tests and p-value calculations were performed using Python 3.9.16.

RESULTS

The total answer accuracy of ChatGPT-4.0o was the highest, followed by ChatGPT-4.0, Gemini 1.0 and ChatGPT-3.5 (72%, 62%, 44% and 25%, respectively), with significant differences between all LLMAs except between the two GPT-4.0 models. Performance was highest in the subcategories direct restorations and caries, followed by indirect restorations and endodontics.
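The pairwise comparison described above can be illustrated with a minimal sketch. It assumes that each model answered all 151 questions and that the number of correct answers can be back-calculated from the rounded percentages; both are assumptions, since the abstract gives no raw counts. The helper names (`correct_count`, `chi_square_2x2`, `pairwise_tests`) are hypothetical, and the authors' actual procedure (for example, any continuity or multiple-comparison correction) is not stated in the abstract, so this is not a reproduction of their analysis.

```python
import math
from itertools import combinations

N_QUESTIONS = 151  # question count stated in the abstract

# Total answer accuracies (percent) as reported in the abstract.
ACCURACY_PCT = {
    "ChatGPT-4.0o": 72,
    "ChatGPT-4.0": 62,
    "Gemini 1.0": 44,
    "ChatGPT-3.5": 25,
}


def correct_count(pct, n=N_QUESTIONS):
    """Assumption: estimate correct answers from a rounded percentage."""
    return round(pct / 100 * n)


def chi_square_2x2(correct_a, correct_b, n=N_QUESTIONS):
    """Pearson chi-square test (no continuity correction) on the 2x2 table
    of correct/incorrect answers for two models that each answered n questions.
    Returns (statistic, p_value)."""
    observed = [
        [correct_a, n - correct_a],
        [correct_b, n - correct_b],
    ]
    total = 2 * n
    col_totals = [correct_a + correct_b, total - correct_a - correct_b]
    stat = 0.0
    for row in observed:
        for j, obs in enumerate(row):
            expected = n * col_totals[j] / total  # row total is n for both rows
            stat += (obs - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square survival function
    # equals erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def pairwise_tests():
    """Run the test for every pair of models, keyed by (model_a, model_b)."""
    results = {}
    for a, b in combinations(ACCURACY_PCT, 2):
        results[(a, b)] = chi_square_2x2(
            correct_count(ACCURACY_PCT[a]), correct_count(ACCURACY_PCT[b])
        )
    return results


if __name__ == "__main__":
    for (a, b), (stat, p) in pairwise_tests().items():
        print(f"{a} vs {b}: chi2 = {stat:.2f}, p = {p:.4f}")
```

Under these assumptions, the only adjacent pair whose p-value stays above 0.05 is ChatGPT-4.0o versus ChatGPT-4.0, which is consistent with the reported absence of a significant difference between the two GPT-4.0 models.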

CONCLUSIONS

Overall, there are large performance differences among LLMAs. Only the ChatGPT-4 models achieved a success rate high enough to be used, with caution, in support of the dental academic curriculum.

CLINICAL RELEVANCE

While LLMAs could support clinicians in answering dental field-related questions, this capacity depends strongly on the model employed. The best-performing model, ChatGPT-4.0o, achieved acceptable accuracy rates in some of the subject subcategories analyzed.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2125/11458639/679c477df3ad/784_2024_5968_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2125/11458639/82362fee1aeb/784_2024_5968_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2125/11458639/23cf3a693930/784_2024_5968_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2125/11458639/f2af2704d31e/784_2024_5968_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2125/11458639/a82a6018b52e/784_2024_5968_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2125/11458639/8ed1912dbf53/784_2024_5968_Fig6_HTML.jpg

