ChatGPT's performance in dentistry and allergy-immunology assessments: a comparative study.

Affiliations

Department of Periodontology, Endodontology and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland.

Division of Allergy, University Children's Hospital Basel, Basel, Switzerland.

Publication information

Swiss Dent J. 2023 Oct 4;134(2):1-17. doi: 10.61872/sdj-2024-06-01.

DOI:10.61872/sdj-2024-06-01

PMID:38726506

Abstract

Large language models (LLMs) such as ChatGPT have potential applications in healthcare, including dentistry. Priming, the practice of providing LLMs with initial, relevant information, is an approach to improve their output quality. This study aimed to evaluate the performance of ChatGPT 3 and ChatGPT 4 on self-assessment questions for dentistry, through the Swiss Federal Licensing Examination in Dental Medicine (SFLEDM), and allergy and clinical immunology, through the European Examination in Allergy and Clinical Immunology (EEAACI). The second objective was to assess the impact of priming on ChatGPT's performance. The SFLEDM and EEAACI multiple-choice questions from the University of Bern's Institute for Medical Education platform were administered to both ChatGPT versions, with and without priming. Performance was analyzed based on correct responses. The statistical analysis included Wilcoxon rank sum tests (alpha=0.05). The average accuracy rates in the SFLEDM and EEAACI assessments were 63.3% and 79.3%, respectively. Both ChatGPT versions performed better on EEAACI than SFLEDM, with ChatGPT 4 outperforming ChatGPT 3 across all tests. ChatGPT 3's performance exhibited a significant improvement with priming for both EEAACI (p=0.017) and SFLEDM (p=0.024) assessments. For ChatGPT 4, the priming effect was significant only in the SFLEDM assessment (p=0.038). The performance disparity between SFLEDM and EEAACI assessments underscores ChatGPT's varying proficiency across different medical domains, likely tied to the nature and amount of training data available in each field. Priming can be a tool for enhancing output, especially in earlier LLMs. Advancements from ChatGPT 3 to 4 highlight the rapid developments in LLM technology. Yet, their use in critical fields such as healthcare must remain cautious owing to LLMs' inherent limitations and risks.

