Guler Ridvan, Yalcin Emine
Department of Oral and Maxillofacial Surgery, Dicle University Faculty of Dentistry, Diyarbakir, Turkey.
Med Sci Monit. 2025 Jul 9;31:e949076. doi: 10.12659/MSM.949076.
BACKGROUND Artificial intelligence (AI) has shown significant potential to transform healthcare by enabling accurate, data-driven decision-making. This study compared the performance of the AI chatbots ChatGPT, Grok, Blackbox, and Claude AI in the preliminary diagnosis of maxillofacial pathologies.
MATERIAL AND METHODS This study included 23 patients (9 cysts, 14 neoplasms) who underwent surgery at Dicle University Faculty of Dentistry between 2017 and 2024 and whose diagnoses were histopathologically confirmed. For each case, 4 differential diagnosis options were prepared in question format and presented to the AI platforms. The accuracy of the chatbots' answers was assessed by comparing them with the definitive histopathological diagnoses of the cases. Statistical analysis used the chi-square and Fisher-Freeman-Halton tests to compare performance among the chatbots. Statistical significance was set at p<0.05.
RESULTS ChatGPT answered 15 of 23 questions correctly, a success rate of 65.2%. Grok and Blackbox AI each achieved a success rate of 52.17%, while Claude AI had the lowest success rate, at 30.43%. When cases were categorized into cysts and neoplasms, Blackbox AI showed the highest accuracy for cyst cases (66.6%), while ChatGPT had the highest accuracy for neoplasm cases (71.4%). No statistically significant difference was observed among the chatbots in the distribution of correct and incorrect answers (p=0.125), nor in the distribution of correct answers between cyst and neoplasm cases (p=0.654).
CONCLUSIONS Although all 4 AI chatbots achieved some level of accuracy, ChatGPT performed better than the other chatbots. Further development of these chatbots could benefit diagnostic accuracy and treatment recommendations in dentistry.
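The per-chatbot success rates and the overall comparison reported above can be reproduced from the correct/incorrect counts given in the abstract. The sketch below is an illustration only, not the authors' analysis code: it tabulates the 4x2 contingency table (chatbot vs. correct/incorrect) and runs a chi-square test of independence with scipy.

```python
# A minimal sketch (not the authors' analysis code) reproducing the overall
# chatbot comparison from the counts reported in the abstract, using scipy.
from scipy.stats import chi2_contingency

# Correct / incorrect answers out of 23 cases per chatbot (from RESULTS).
counts = {
    "ChatGPT":     (15, 8),
    "Grok":        (12, 11),
    "Blackbox AI": (12, 11),
    "Claude AI":   (7, 16),
}

# Per-chatbot success rates.
for name, (correct, incorrect) in counts.items():
    total = correct + incorrect
    print(f"{name}: {correct}/{total} correct ({100 * correct / total:.2f}%)")

# 4x2 chi-square test of independence (chatbot vs. correct/incorrect answers).
table = [list(row) for row in counts.values()]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")  # close to the reported p = 0.125
```

The chi-square approximation on this table yields a p-value close to the study's reported p=0.125; the Fisher-Freeman-Halton exact test the authors also used is not shown here and would require, for example, R's fisher.test on the same 4x2 table.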