
Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis.

Author information

Çıracıoğlu Ayşe Merve, Dal Erdoğan Suheyla

Affiliations

Eskisehir City Hospital, Eskisehir, Türkiye.

Sincan Training and Research Hospital, Ankara, Türkiye.

Publication information

Eur J Orthop Surg Traumatol. 2025 Mar 18;35(1):123. doi: 10.1007/s00590-025-04198-4.

Abstract

OBJECTIVE

This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.

METHODS

Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT for evaluation. Two independent experts rated the responses for reliability and usefulness on a 7-point Likert scale, and overall quality was rated using the Global Quality Scale (GQS). To assess readability, several established metrics were employed: the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kincaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).
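The two Flesch-based metrics used here are simple closed-form functions of sentence length and syllable density. As an illustration (not the authors' analysis pipeline, and using a rough vowel-group heuristic for syllable counting rather than a dictionary), they can be sketched as:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (minimum 1),
    discounting one silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level).

    FRE  = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    FKGL = 0.39  * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Higher FRE means easier text (90+ roughly corresponds to elementary-school level, below 50 to college level), while FKGL maps directly onto a US school grade, which is how the "high school senior to college-level" finding in the Results is interpreted.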

RESULTS

The mean reliability score was 4.68 ± 0.73 (median: 5, IQR 4-5), and the mean usefulness score was 4.84 ± 0.84 (median: 5, IQR 4-5). The mean GQS score was 4.28 ± 0.58 (median: 4, IQR 4-5). Inter-rater agreement, assessed with the intraclass correlation coefficient, was excellent: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. While general informational questions received high scores, responses to treatment-specific and personalized inquiries lacked depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high-school-senior to college-level reading ability.

CONCLUSION

ChatGPT provides reliable, useful, and moderate quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. Caution is essential when using Artificial Intelligence (AI) in patient education and medical decision-making.

