Comparative Analysis of ChatGPT-3.5 and GPT-4 in Open-Ended Clinical Reasoning Across Dental Specialties.

Authors

Babaee Hemmati Yasamin, Rasouli Morteza, Falahchai Mehran

Affiliations

Department of Orthodontics, Dental Sciences Research Center, School of Dentistry, Guilan University of Medical Sciences, Rasht, Iran.

School of Dentistry, Guilan University of Medical Sciences, Rasht, Iran.

Publication information

Eur J Dent Educ. 2025 Jun 13. doi: 10.1111/eje.13144.

Abstract

PURPOSE

The integration of large language models (LLMs) such as ChatGPT into health care has garnered increasing interest. While previous studies have assessed these models using structured multiple-choice questions, limited research has evaluated their performance on open-ended, scenario-based clinical tasks, particularly in dentistry. This study aimed to evaluate and compare the clinical reasoning capabilities of ChatGPT-3.5 and GPT-4 in formulating treatment plans across seven dental specialties using realistic, open-ended clinical scenarios.

METHODS

A cross-sectional analytical study, reported in accordance with the STROBE guidelines, was conducted using 70 dental cases spanning endodontics, oral and maxillofacial surgery, oral medicine, orthodontics, paediatric dentistry, periodontology, and radiology. Each case was submitted to both ChatGPT-3.5 and GPT-4 (paid version, November 2024). Responses were evaluated by specialty-specific expert panels using a three-level rubric (poor, average, good). Statistical analyses included chi-square tests and Fisher-Freeman-Halton exact tests (α = 0.05).

RESULTS

GPT-4 significantly outperformed ChatGPT-3.5 in overall response quality (67.1% vs. 44.3% of responses rated 'good'; p = 0.016). Although differences within most individual specialties were not statistically significant, GPT-4 performed significantly better in oral and maxillofacial surgery. Its advantage was more pronounced in complex cases, consistent with the model's enhanced contextual reasoning.
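
As a rough illustration of the kind of comparison reported here, the following sketch runs a Pearson chi-square test on a 2×2 table of 'good' versus 'not good' ratings. The counts are an assumption: they are back-calculated from the reported percentages, taking 70 responses per model. The abstract does not give the split between 'poor' and 'average', so this collapsed test is not the authors' analysis and is not expected to reproduce the reported p = 0.016, which presumably comes from the full three-level rubric.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMPTION: counts reconstructed from the reported percentages,
# assuming 70 rated responses per model.
N_CASES = 70
good_gpt4 = round(0.671 * N_CASES)   # 67.1% of 70
good_gpt35 = round(0.443 * N_CASES)  # 44.3% of 70

# Rows: model (GPT-4, ChatGPT-3.5); columns: ('good', 'not good').
table = [
    [good_gpt4, N_CASES - good_gpt4],
    [good_gpt35, N_CASES - good_gpt35],
]

chi2, p = pearson_chi_square_2x2(table)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

Collapsing 'poor' and 'average' into one category discards information, so a test on the original three-level table (2 degrees of freedom) can give a different p-value from this two-level version.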

CONCLUSION

GPT-4 demonstrated superior accuracy and consistency compared with ChatGPT-3.5, particularly in clinically complex and integrative tasks. These findings support the potential of advanced LLMs as adjunct tools in dental education and decision-making, though specialty-specific applications and expert oversight remain essential.
