Peters Mélissa, Le Clercq Maxime, Yanni Antoine, Vanden Eynden Xavier, Martin Lalmand, Vanden Haute Noémie, Tancredi Szonja, De Passe Céline, Boutremans Edward, Lechien Jerome, Dequanter Didier
Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium.
J Stomatol Oral Maxillofac Surg. 2025 Jun;126(3):102090. doi: 10.1016/j.jormas.2024.102090. Epub 2024 Sep 25.
ChatGPT is an artificial intelligence-based large language model with the ability to generate human-like responses to text input; its performance has already been the subject of several studies in different fields. The aim of this study was to evaluate the performance of ChatGPT in the management of maxillofacial clinical cases.
A total of 38 clinical cases presenting at the Stomatology-Maxillofacial Surgery Department were prospectively recruited and submitted to ChatGPT, which was interrogated for diagnosis, differential diagnosis, management, and treatment. The performance of trainees and of ChatGPT was compared by three blinded, board-certified maxillofacial surgeons using the AIPI score.
The average total AIPI score was 18.71 for the practitioners and 16.39 for ChatGPT, significantly lower (p < 0.001). According to the experts, ChatGPT was significantly less effective for diagnosis and treatment (p < 0.001). According to two of the three experts, ChatGPT was also significantly less effective in considering patient data (p = 0.001) and in suggesting additional examinations (p < 0.0001). The primary diagnosis proposed by ChatGPT was judged by the experts as not plausible and/or incomplete in 2.63% to 18% of the cases; the additional examinations it suggested included inadequate examinations in 2.63% to 21.05% of the cases; and it proposed pertinent but incomplete therapeutic options in 18.42% to 47.37% of the cases, while the therapeutic options were considered pertinent and necessary but inadequate in 18.42% of cases.
ChatGPT appears less efficient in establishing the diagnosis, selecting the most adequate additional examinations, and proposing pertinent and necessary therapeutic approaches.