A pilot study of the performance of Chat GPT and other large language models on a written final year periodontology exam.

Authors

Ramlogan Shaun, Raman Vidya, Ramlogan Shayn

Affiliations

School of Dentistry, Faculty of Medical Sciences, The University of the West Indies, St Augustine Campus, EWMSC, Champs Fleurs, Trinidad and Tobago.

School of Medicine, Faculty of Medical Sciences, The University of the West Indies, St Augustine Campus, EWMSC, Champs Fleurs, Trinidad and Tobago.

Publication information

BMC Med Educ. 2025 May 19;25(1):727. doi: 10.1186/s12909-025-07195-7.

DOI: 10.1186/s12909-025-07195-7
PMID: 40389910
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12090576/
Abstract

Large Language Models (LLMs) such as Chat GPT are increasingly used by students in education, reportedly producing adequate academic responses. Chat GPT is expected to learn and improve with time. Thus, the aim was to longitudinally compare the performance of the current versions of Chat GPT-4/GPT-4o with that of final-year DDS students on a written periodontology exam. Other current non-subscription LLMs were also compared with the students. Chat GPT-4, guided by the exam parameters, generated answers as 'Run 1' and, 6 months later, as 'Run 2'. Chat GPT-4o generated answers as 'Run 3' 15 months later. All LLM and student scripts were marked independently by two periodontology lecturers (Cohen's Kappa value 0.71). 'Run 1' and 'Run 3' generated statistically significantly (p < 0.001) higher mean scores of 78% and 77%, respectively, compared with the students (60%). The mean scores of Chat GPT-4 and GPT-4o were also similar to that of the best student. 'Run 2' performed at the level of the students but underperformed compared with 'Run 1' and 'Run 3', with generalizations, more inaccuracies and incomplete answers. This variability for 'Run 2' may be due to outdated data sources, hallucinations and inherent LLM limitations such as online traffic and the availability of datasets and resources. Other non-subscription LLMs such as Claude, DeepSeek, Gemini and Le Chat also produced statistically significantly (p < 0.001) higher scores compared with the students. Claude was the best-performing LLM, with more comprehensive answers. LLMs such as Chat GPT may provide summaries and model answers in clinical undergraduate periodontology education. However, the results must be interpreted with caution regarding academic accuracy and credibility, especially in a health care profession.
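The inter-rater agreement in the abstract (Cohen's Kappa 0.71) summarises how consistently the two lecturers marked the same scripts beyond what chance alone would produce. As an illustration only, the unweighted statistic can be computed as kappa = (p_o - p_e) / (1 - p_e), sketched below in Python. The abstract does not say how marks were grouped into categories or which kappa variant the authors used, so the function name `cohens_kappa`, the grade bands and the marks in the usage example are all hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, product of the two marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical category for every item; kappa is
        # undefined here, and returning 1.0 is an assumption of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical grade bands assigned by two markers to the same ten scripts.
marker_1 = ["A", "B", "B", "C", "A", "B", "C", "C", "B", "A"]
marker_2 = ["A", "B", "C", "C", "A", "B", "C", "B", "B", "A"]
print(round(cohens_kappa(marker_1, marker_2), 3))  # prints 0.697
```

In this made-up example the markers agree on 8 of 10 scripts (p_o = 0.80), chance agreement is p_e = 0.34, and kappa is about 0.70. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic and also offers weighted variants for ordered grades.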

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4a8/12090576/d29c3f2399c4/12909_2025_7195_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4a8/12090576/52688ddebdcb/12909_2025_7195_Fig2_HTML.jpg

Similar articles

1. A pilot study of the performance of Chat GPT and other large language models on a written final year periodontology exam.
   BMC Med Educ. 2025 May 19;25(1):727. doi: 10.1186/s12909-025-07195-7.
2. Performance of three artificial intelligence (AI)-based large language models in standardized testing; implications for AI-assisted dental education.
   J Periodontal Res. 2025 Feb;60(2):121-133. doi: 10.1111/jre.13323. Epub 2024 Jul 18.
3. Benchmarking Vision Capabilities of Large Language Models in Surgical Examination Questions.
   J Surg Educ. 2025 Apr;82(4):103442. doi: 10.1016/j.jsurg.2025.103442. Epub 2025 Feb 9.
4. Evaluating the Effectiveness of advanced large language models in medical Knowledge: A Comparative study using Japanese national medical examination.
   Int J Med Inform. 2025 Jan;193:105673. doi: 10.1016/j.ijmedinf.2024.105673. Epub 2024 Oct 28.
5. Accuracy and quality of ChatGPT-4o and Google Gemini performance on image-based neurosurgery board questions.
   Neurosurg Rev. 2025 Mar 25;48(1):320. doi: 10.1007/s10143-025-03472-7.
6. Evaluating the Accuracy and Reliability of Large Language Models (ChatGPT, Claude, DeepSeek, Gemini, Grok, and Le Chat) in Answering Item-Analyzed Multiple-Choice Questions on Blood Physiology.
   Cureus. 2025 Apr 8;17(4):e81871. doi: 10.7759/cureus.81871. eCollection 2025 Apr.
7. Large language models in periodontology: Assessing their performance in clinically relevant questions.
   J Prosthet Dent. 2024 Nov 18. doi: 10.1016/j.prosdent.2024.10.020.
8. Large Language Models in Biochemistry Education: Comparative Evaluation of Performance.
   JMIR Med Educ. 2025 Apr 10;11:e67244. doi: 10.2196/67244.
9. Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study.
   J Med Internet Res. 2025 May 20;27:e69910. doi: 10.2196/69910.
10. AI in Home Care-Evaluation of Large Language Models for Future Training of Informal Caregivers: Observational Comparative Case Study.
   J Med Internet Res. 2025 Apr 28;27:e70703. doi: 10.2196/70703.