Gernandt Steven, Aymon Romain, Scolozzi Paolo
Division of Oral and Maxillofacial Surgery, Department of Surgery, University of Geneva & University Hospitals of Geneva, Geneva, Switzerland.
Division of Oral and Maxillofacial Surgery, Department of Surgery, Faculty of Medicine, University of Geneva & University Hospitals of Geneva, Geneva, Switzerland.
JPRAS Open. 2024 Sep 30;42:275-283. doi: 10.1016/j.jpra.2024.09.014. eCollection 2024 Dec.
Orbital fractures are common, but their management remains controversial. The aim of the present study was to assess the accuracy of an advanced artificial intelligence (AI) model, ChatGPT-4, in surgical decision-making, with a focus on orbital fracture diagnosis and management. A retrospective observational analysis was conducted on a sample of 30 orbital fracture cases diagnosed and managed at the Geneva University Hospital, Switzerland. Patient vignettes were created from anonymised medical records and presented to ChatGPT-4 in three stages: initial diagnosis, refinement with radiological reports, and surgical intervention decisions. The performance of ChatGPT-4 in providing the appropriate surgical strategy was evaluated using sensitivity, specificity, positive predictive value and negative predictive value, with the actual management used as the benchmark for accuracy. The AI model correctly diagnosed the fracture in 100% of the cases. It demonstrated a specificity of 100% and a sensitivity of 57% for treatment recommendation, indicating its effectiveness in recognising patients who truly required an intervention; however, it showed only moderate performance in correctly identifying cases that were better suited for conservative treatment. Cohen's Kappa statistic for interrater reliability of the choice of treatment was 0.44, indicating a weak level of agreement between the treatment chosen by ChatGPT-4 and that chosen by the physician. The study demonstrates that AI tools such as ChatGPT-4 can offer a high degree of accuracy in diagnosing orbital fractures and recognising patients requiring surgical intervention; however, ChatGPT-4 performed less satisfactorily in correctly identifying patients who were better suited for non-surgical treatment.
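The evaluation measures named in the abstract (sensitivity, specificity, positive and negative predictive value, and Cohen's Kappa) can all be derived from a 2x2 table comparing the model's recommendation with the actual management. The following is a minimal sketch of those standard formulas, not the authors' analysis code. The function names `safe_div` and `diagnostic_metrics` are hypothetical, the choice of which treatment counts as the "positive" class is an assumption (the abstract does not state it), and the counts in the usage lines are purely illustrative and are not the study's data.

```python
from math import nan


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else nan


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute agreement metrics from a 2x2 table.

    tp, fp, fn, tn are counts of the model's recommendation against the
    reference decision (here assumed to be the actual management).
    """
    n = tp + fp + fn + tn
    # Observed agreement: fraction of cases where both raters agree.
    p_observed = safe_div(tp + tn, n)
    # Agreement expected by chance, from the marginal totals.
    p_expected = safe_div((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "kappa": safe_div(p_observed - p_expected, 1 - p_expected),
    }


# Illustrative counts only (hypothetical, NOT taken from the study).
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Kappa corrects the raw agreement rate for the agreement two raters would reach by chance, which is why it can be modest even when one of the other measures is at 100%.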