Comparing the effectiveness of generative AI technology in commonly asked scoliosis questions.

Authors

Suresh Adarsh, Siahaan Jacob, Marco Rex Aw, Klineberg Eric, Borden Timothy, Vanodia Rohini, Crawford Lindsay, Dodwad Shah-Nawaz, Younas Shiraz, Mundluru Surya

Affiliation

Department of Orthopaedic Surgery, University of Texas at Houston Health Science Center, Houston, TX, USA.

Publication information

J Child Orthop. 2025 Jul 26:18632521251359098. doi: 10.1177/18632521251359098.

DOI:10.1177/18632521251359098

PMID:40735356

Full text:https://pmc.ncbi.nlm.nih.gov/articles/PMC12301223/

Abstract

PURPOSE

In recent years, generative artificial intelligence systems have transformed the landscape of patients' access to medical information and education. As worsening shortages of general and subspecialty physicians lengthen the time patients wait to see a physician, we aimed to understand how effectively different AI platforms can respond to questions asked by parents about both operative and nonoperative treatment of scoliosis.

METHODS

A survey of 31 of the most commonly asked questions about scoliosis, with responses from ChatGPT, Google Gemini, and Microsoft Copilot, was administered to four board-certified orthopedic surgeons, each fellowship-trained in either pediatric or spine surgery. The reviewers rated each output on a Likert scale of 1 to 5, with 5 indicating an excellent response and 1 indicating a poor response. Pairwise comparisons were used for analysis.

RESULTS

All three generative AI technologies performed well, with an overall mean rating of 3.4, which falls between good and very good on the Likert scale provided. ChatGPT performed best of the three, with a mean rating of 4.0. Google Gemini ranked second and Microsoft Copilot third, each with a reported mean rating of 3.1. Comparisons of ChatGPT with Gemini and with Copilot showed statistically significant differences (p < 0.001), whereas there was no statistically significant difference between Gemini and Copilot.
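
As a quick arithmetic check on the reported figures, the sketch below recomputes the overall mean from the three per-platform means. It assumes, since the abstract does not say so explicitly, that each of the four reviewers rated all 31 responses from every platform, so that each platform contributes the same number of ratings. The names `reported_means`, `ratings_per_platform`, and `pooled_mean` are illustrative only and are not taken from the study.

```python
# Per-platform mean ratings as reported in the abstract (rounded to one decimal).
reported_means = {"ChatGPT": 4.0, "Google Gemini": 3.1, "Microsoft Copilot": 3.1}

# Assumption: 4 reviewers x 31 questions, so every platform has the same
# number of ratings. The abstract does not state this explicitly.
ratings_per_platform = 31 * 4


def pooled_mean(means, n_per_group):
    """Weighted mean of group means; with equal group sizes it equals the simple average."""
    total = sum(m * n_per_group for m in means.values())
    return total / (n_per_group * len(means))


# Round to one decimal to match the precision used in the abstract.
overall = round(pooled_mean(reported_means, ratings_per_platform), 1)
print(overall)
```

Under that equal-weight assumption, the result agrees with the overall mean of 3.4 stated in the abstract.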

CONCLUSION

In response to common scoliosis questions asked by parents, ChatGPT, Microsoft Copilot, and Google Gemini were scored highly by our spine team, a finding with important implications for their future use.
