
Evaluation of Information Provided by ChatGPT Versions on Traumatic Dental Injuries for Dental Students and Professionals.

Authors

Öztürk Zeynep, Bal Cenkhan, Çelikkaya Beyza Nur

Affiliations

Department of Pediatric Dentistry, Dentistry Faculty, Bolu Abant İzzet Baysal University, Bolu, Turkey.

Department of Pediatric Dentistry, Gülhane Dentistry Faculty, Health Sciences University, Ankara, Turkey.

Publication Information

Dent Traumatol. 2025 Aug;41(4):427-436. doi: 10.1111/edt.13042. Epub 2025 Jan 23.

Abstract

BACKGROUND/AIM

The use of AI-driven chatbots for accessing medical information is increasingly popular among educators and students. This study aims to assess two ChatGPT models, ChatGPT 3.5 and ChatGPT 4.0, regarding their responses to queries about traumatic dental injuries, specifically for dental students and professionals.

MATERIAL AND METHODS

A total of 40 questions were prepared, divided equally between those concerning definitions and diagnosis and those on treatment and follow-up. The responses from both ChatGPT versions were evaluated on several criteria: quality, reliability, similarity, and readability. These evaluations were conducted using the Global Quality Scale (GQS), the Reliability Scoring System (adapted DISCERN), the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Reading Grade Level (FKRGL), and the Similarity Index. Normality was checked with the Shapiro-Wilk test, and variance homogeneity was assessed using the Levene test.
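The two readability metrics named above follow the standard Flesch definitions, which combine word, sentence, and syllable counts. A minimal sketch of how they are computed is below; the example counts are hypothetical illustrations, not data from the study.

```python
# Standard Flesch readability formulas (as used for FRES and FKRGL).
# Example counts are hypothetical, not taken from the study's data.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher = easier to read.
    Scores below 50 are generally considered difficult (college level)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Reading Grade Level (FKRGL): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example: a 100-word response with 5 sentences and 150 syllables.
fres = flesch_reading_ease(100, 5, 150)
fkrgl = flesch_kincaid_grade(100, 5, 150)
print(round(fres, 3), round(fkrgl, 2))  # → 59.635 9.91
```

In practice the normality and variance-homogeneity checks described here would typically be run with `scipy.stats.shapiro` and `scipy.stats.levene` on the per-response score arrays.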

RESULTS

The analysis revealed that ChatGPT 3.5 provided more original responses compared to ChatGPT 4.0. According to FRES scores, both versions were challenging to read, with ChatGPT 3.5 having a higher FRES score (39.732 ± 9.713) than ChatGPT 4.0 (34.813 ± 9.356), indicating relatively better readability. There were no significant differences between the ChatGPT versions regarding GQS, DISCERN, and FKRGL scores. However, in the definition and diagnosis section, ChatGPT 4.0 had a statistically higher quality score than ChatGPT 3.5. In contrast, ChatGPT 3.5 provided more original answers in the treatment and follow-up section. For ChatGPT 4.0, the readability and similarity rates for the definition and diagnosis section were higher than those for the treatment and follow-up section. No significant differences were observed between ChatGPT 3.5's DISCERN, FRES, FKRGL, and similarity index measurements by topic.

CONCLUSIONS

Both ChatGPT versions offer high-quality and original information, though they present challenges in readability and reliability. They are valuable resources for dental students and professionals but should be used in conjunction with additional sources of information for a comprehensive understanding.

