Öztürk Ahmet, Günay Serkan, Ateş Serdal, Yiğit Yavuz
Department of Emergency Medicine, Hitit University Çorum Erol Olçok Education and Research Hospital, Çorum, Turkey.
Department of Emergency Medicine, Health Sciences University, Ankara Training and Research Hospital, Ankara, Turkey.
J Emerg Med. 2025 Jun;73:71-79. doi: 10.1016/j.jemermed.2024.12.010. Epub 2025 Jan 4.
The latest artificial intelligence (AI) model, GPT-4o, introduced by OpenAI, can process visual data, presenting a novel opportunity for radiographic evaluation in trauma patients.
This study aimed to assess the efficacy of GPT-4o in interpreting radiographs for traumatic bone pathologies and to compare its performance with that of emergency medicine and orthopedic specialists.
Ten emergency medicine specialists, 10 orthopedic specialists, and the GPT-4o AI model evaluated 25 cases of traumatic bone pathologies of the upper and lower extremities selected from the Radiopaedia website. Participants were asked to identify fractures or dislocations in the radiographs within 45 minutes. GPT-4o was instructed to perform the same task in 10 separate chat sessions.
Emergency medicine specialists and orthopedic specialists achieved average accuracies of 82.8% and 87.2%, respectively, in radiograph interpretation. In contrast, GPT-4o achieved an accuracy of only 11.2%. Statistical analysis revealed significant differences among the three groups (p < 0.001), with GPT-4o performing significantly worse than both groups of specialists.
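To put the reported percentages in concrete terms, the following minimal sketch back-calculates the number of correct readings each accuracy would imply. It assumes that every reader (or chat session) evaluated all 25 cases, so that each group produced 10 × 25 = 250 readings; the abstract does not state the scoring procedure, and the names `CASES`, `READERS_PER_GROUP`, and `implied_correct` are illustrative only.

```python
# Hypothetical back-calculation, not the authors' analysis.
# Assumption: each of the 10 readers (or 10 GPT-4o chat sessions) scored all 25 cases,
# so the mean accuracy of a group equals correct readings / total readings.
CASES = 25
READERS_PER_GROUP = 10

# Accuracies as reported in the abstract.
reported_accuracy = {
    "emergency medicine": 0.828,
    "orthopedics": 0.872,
    "GPT-4o": 0.112,
}


def implied_correct(accuracy, readings):
    """Return the number of correct readings implied by a group accuracy."""
    return round(accuracy * readings)


total_readings = CASES * READERS_PER_GROUP
implied = {
    group: implied_correct(acc, total_readings)
    for group, acc in reported_accuracy.items()
}

for group, correct in implied.items():
    print(f"{group}: about {correct} of {total_readings} readings correct")
```

Under this assumption, the reported figures would correspond to roughly 207, 218, and 28 correct readings out of 250 for emergency medicine specialists, orthopedic specialists, and GPT-4o, respectively.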
GPT-4o's ability to interpret radiographs of traumatic bone pathologies is currently limited and significantly inferior to that of trained specialists. These findings underscore the ongoing need for human expertise in trauma diagnosis and highlight the challenges of applying AI to complex medical imaging tasks.