Sherif Islam A, Nser Sundus Y, Bobo Ahmed, Afridi Asif, Hamed Ahmed, Dunbar Mark, Boutefnouchet Tarek
Trauma and Orthopaedics, Warwick Hospital, Birmingham, GBR.
General Medicine, Hashemite University, Zarqa, JOR.
Cureus. 2024 Dec 10;16(12):e75440. doi: 10.7759/cureus.75440. eCollection 2024 Dec.
Introduction: Artificial intelligence (AI)-powered tools are increasingly integrated into healthcare. The purpose of the present study was to compare fracture management plans generated by clinicians with those obtained from ChatGPT (OpenAI, San Francisco, CA) and Google Gemini (Google, Inc., Mountain View, CA).

Methodology: A retrospective comparative analysis was conducted. The study included 70 cases of isolated injuries treated at the fracture clinic of the authors' institution. Complex fractures, open fractures, and cases with non-specific diagnoses were excluded. All relevant clinical details were entered into ChatGPT and Google Gemini, and the AI-generated management plans were compared with the actual plans documented in the clinical records. The comparison focused on treatment recommendations and follow-up strategies.

Results: In terms of agreement with the actual treatment plans, Google Gemini matched in only 13 cases (19%); disagreements in the remaining cases were attributed to overgeneralisation, inadequate treatment, and ambiguity. ChatGPT matched the actual plans in 24 cases (34%), with overgeneralisation being the principal cause of disagreement. The differences between the AI-generated plans and the actual clinician-led plans were statistically significant (p < 0.001).

Conclusion: Both AI-powered tools demonstrated significant disagreement with the actual clinical management plans. Although ChatGPT aligned more closely with human expertise, particularly in treatment recommendations, both AI engines lacked the clinical precision required for accurate fracture management. These findings highlight the current limitations of general-purpose AI-powered tools and do not support their use as a replacement for a clinician-led fracture clinic appointment.