
Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis.

Author information

Çıracıoğlu Ayşe Merve, Dal Erdoğan Suheyla

Affiliations

Eskisehir City Hospital, Eskisehir, Türkiye.

Sincan Training and Research Hospital, Ankara, Türkiye.

Publication information

Eur J Orthop Surg Traumatol. 2025 Mar 18;35(1):123. doi: 10.1007/s00590-025-04198-4.

Abstract

OBJECTIVE

This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.

METHODS

Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT for evaluation. Two independent experts rated the responses for reliability and usefulness on a 7-point Likert scale, and overall quality was rated using the Global Quality Scale (GQS). To assess readability, several established metrics were employed: the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kincaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).
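The two Flesch-based metrics used here are simple closed-form functions of sentence length and syllable density. As an illustration (not the authors' analysis pipeline, and using a rough vowel-group heuristic for syllable counting rather than a dictionary), they can be sketched as:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (minimum 1),
    discounting one silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level).

    FRE  = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    FKGL = 0.39  * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Higher FRE means easier text (90+ roughly corresponds to elementary-school level, below 50 to college level), while FKGL maps directly onto a US school grade, which is how the "high school senior to college-level" finding in the Results is interpreted.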

RESULTS

The mean reliability score was 4.68 ± 0.73 (median: 5, IQR 4-5), and the mean usefulness score was 4.84 ± 0.84 (median: 5, IQR 4-5). The mean GQS score was 4.28 ± 0.58 (median: 4, IQR 4-5). Inter-rater agreement, assessed with the intraclass correlation coefficient, was excellent: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. While general informational questions received high scores, responses to treatment-specific and personalized inquiries lacked depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high-school-senior to college-level reading ability.

CONCLUSION

ChatGPT provides reliable, useful, and moderate quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. Caution is essential when using Artificial Intelligence (AI) in patient education and medical decision-making.

