Peters Mélissa, Le Clercq Maxime, Yanni Antoine, Vanden Eynden Xavier, Martin Lalmand, Vanden Haute Noémie, Tancredi Szonja, De Passe Céline, Boutremans Edward, Lechien Jerome, Dequanter Didier
Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium.
J Stomatol Oral Maxillofac Surg. 2025 Jun;126(3):102090. doi: 10.1016/j.jormas.2024.102090. Epub 2024 Sep 25.
ChatGPT is an artificial intelligence-based large language model with the ability to generate human-like responses to text input; its performance has already been the subject of several studies in different fields. The aim of this study was to evaluate the performance of ChatGPT in the management of maxillofacial clinical cases.
A total of 38 clinical cases presenting at the Stomatology-Maxillofacial Surgery Department were prospectively recruited and submitted to ChatGPT, which was interrogated for diagnosis, differential diagnosis, management, and treatment. The performance of trainees and of ChatGPT was compared by three blinded, board-certified maxillofacial surgeons using the AIPI score.
The average total AIPI score was 18.71 for the practitioners and 16.39 for ChatGPT, significantly lower (p < 0.001). According to the experts, ChatGPT was significantly less effective for diagnosis and treatment (p < 0.001). According to two of the three experts, ChatGPT was also significantly less effective in considering patient data (p = 0.001) and in suggesting additional examinations (p < 0.0001). The primary diagnosis proposed by ChatGPT was judged by the experts as not plausible and/or incomplete in 2.63% to 18% of the cases; the additional examinations it suggested included inadequate examinations in 2.63% to 21.05% of the cases; and it proposed pertinent but incomplete therapeutic options in 18.42% to 47.37% of the cases, while the therapeutic options were considered pertinent and necessary but inadequate in 18.42% of cases.
ChatGPT appears less efficient in establishing the diagnosis, selecting the most adequate additional examinations, and proposing pertinent and necessary therapeutic approaches.