Assessing the accuracy of artificial intelligence in the diagnosis and management of orbital fractures: Is this the future of surgical decision-making?

Authors

Gernandt Steven, Aymon Romain, Scolozzi Paolo

Affiliations

Division of Oral and Maxillofacial Surgery, Department of Surgery, University of Geneva & University Hospitals of Geneva, Geneva, Switzerland.

Division of Oral and Maxillofacial Surgery, Department of Surgery, Faculty of Medicine, University of Geneva & University Hospitals of Geneva, Geneva, Switzerland.

Publication information

JPRAS Open. 2024 Sep 30;42:275-283. doi: 10.1016/j.jpra.2024.09.014. eCollection 2024 Dec.

DOI: 10.1016/j.jpra.2024.09.014

PMID: 39498287

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11532732/

Abstract

Orbital fractures are common, but their management remains controversial. The aim of the present study was to assess the accuracy of an advanced artificial intelligence (AI) model, ChatGPT-4, in surgical decision-making, with a focus on orbital fracture diagnosis and management. A retrospective observational analysis was conducted on a sample of 30 orbital fracture cases diagnosed and managed at the Geneva University Hospital, Switzerland. The process involved creating patient vignettes from anonymised medical records and presenting them to ChatGPT-4 in three stages: initial diagnosis, refinement with radiological reports and surgical intervention decisions. The performance of ChatGPT-4 in providing the appropriate surgical strategy was evaluated through measures of sensitivity, specificity, positive predictive value and negative predictive value, with the actual management used as the benchmark for accuracy. The AI model correctly diagnosed the fracture in 100% of the cases. It demonstrated a specificity of 100% and a sensitivity of 57% for treatment recommendation, indicating its effectiveness in recognising patients who truly required an intervention; however, it showed only moderate performance in correctly identifying cases that were better suited for conservative treatment. Cohen's Kappa statistic for interrater reliability of the choice of treatment was 0.44, indicating a weak level of agreement between ChatGPT and the physician's choice of treatment. The study demonstrates that AI tools such as ChatGPT-4 can offer a high degree of accuracy in diagnosing orbital fractures and recognising patients requiring surgical intervention; however, ChatGPT-4 performed less satisfactorily in correctly identifying patients who were better suited for non-surgical treatment.
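The evaluation measures named in the abstract (sensitivity, specificity, positive and negative predictive value, and Cohen's Kappa) can all be derived from a 2x2 table of model recommendation versus actual management. The sketch below shows one way to compute them. The abstract does not report the underlying table or state which management option was treated as the positive class, so the counts used here are hypothetical: they were chosen only to sum to 30 and to give values close to the quoted 57%, 100% and 0.44. The `ConfusionMatrix` class and its field names are likewise illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 table of model recommendation vs. reference (actual management).

    'Positive' is whichever management option the evaluation treats as the
    positive class; the abstract does not specify this.
    """
    tp: int  # model positive, reference positive
    fn: int  # model negative, reference positive
    fp: int  # model positive, reference negative
    tn: int  # model negative, reference negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    # Note: no guard against empty rows/columns (division by zero).
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def cohen_kappa(self) -> float:
        n = self.total
        observed = (self.tp + self.tn) / n      # raw agreement
        model_pos = (self.tp + self.fp) / n     # model's positive rate
        ref_pos = (self.tp + self.fn) / n       # reference positive rate
        # Agreement expected by chance if the two raters were independent.
        expected = model_pos * ref_pos + (1 - model_pos) * (1 - ref_pos)
        return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT reported in the abstract), summing to 30 cases.
example = ConfusionMatrix(tp=12, fn=9, fp=0, tn=9)

print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
print(f"PPV         = {example.ppv():.2f}")
print(f"NPV         = {example.npv():.2f}")
print(f"kappa       = {example.cohen_kappa():.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a model can agree with clinicians in most cases and still obtain only a modest Kappa value.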

