Cuevas-Nunez Maria, Silberberg Valentina Ignacia Alvarez, Arregui Maria, Jham Bruno C, Ballester-Victoria Rosa, Koptseva Inessa, de Tejada María José Biosca Gómez, Posada-Caez Rodolfo, Manich Victor Gil, Bara-Casaus Javier, Fernández-Figueras Maria-Teresa
Faculty of Dentistry, Universitat Internacional de Catalunya, Barcelona, Spain; Hospital Universitari General de Catalunya, Barcelona, Spain.
Faculty of Dentistry, Universitat Internacional de Catalunya, Barcelona, Spain; Universidad de los Andes, Santiago de Chile, Chile.
Oral Surg Oral Med Oral Pathol Oral Radiol. 2025 Apr;139(4):453-461. doi: 10.1016/j.oooo.2024.11.087. Epub 2024 Nov 28.
To evaluate the diagnostic performance of ChatGPT-4.0 in the histopathological diagnosis of oral and maxillofacial lesions and to compare it with that of pathologists.
A retrospective analysis of 102 histopathological descriptions was conducted. Anonymized data, including site, age, and sex, were obtained from the General University Hospital's Department of Pathology. ChatGPT-4.0 provided diagnoses, which were categorized as correct, similar, or different relative to the pathologists' diagnoses. Descriptive statistics, Chi-squared tests, correlation, and regression analyses were used to assess accuracy and the influence of age and gender.
ChatGPT-4.0 correctly diagnosed 61 of 102 cases, an accuracy of 59.8%. The distribution of diagnostic scores did not deviate significantly from expectations (Chi-squared statistic = 0.0, P = 1.0). A moderate negative correlation between age and diagnostic scores was observed (r = -0.33), with age significantly predicting scores (P = .001). No significant difference was found between genders (P = .26). ChatGPT-4.0 performed worst on granuloma and inflammation cases (100% incorrect) and best on mucocele cases (93.3% correct).
ChatGPT-4.0 shows moderate accuracy in histopathological diagnosis of oral and maxillofacial lesions, with performance varying by lesion type. Improvements are needed to enhance its clinical reliability.
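The headline figures in the results can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the r-squared step assumes a simple linear relationship between age and diagnostic score, which the abstract does not spell out.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Share of cases diagnosed correctly, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def r_squared(r: float) -> float:
    """Proportion of variance explained by a single linear predictor.

    Assumption: a simple (one-predictor) linear regression, in which the
    coefficient of determination equals the squared Pearson correlation.
    """
    return r * r


# Values taken from the abstract: 61 correct diagnoses out of 102 cases,
# and a correlation of r = -0.33 between age and diagnostic score.
print(f"accuracy: {accuracy_pct(61, 102):.1f}%")
print(f"variance explained by age: {r_squared(-0.33):.1%}")
```

Under that assumption, a correlation of -0.33 corresponds to age accounting for roughly a tenth of the variance in diagnostic scores, which is consistent with the abstract describing the correlation as moderate.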