Evaluating ChatGPT-4.0's accuracy and potential in idiopathic scoliosis conservative treatment: a preliminary study on clarity, validity, and expert perceptions.

Authors

Negrini Francesco, Malfitano Calogero, Ferriero Giorgio, Morone Giovanni, Negrini Alberto, Zaina Fabio, Ferrario Irene, Kiekens Carlotte, Negrini Stefano, Vitale Jacopo

Affiliations

Department of Biotechnology and Life Sciences, University of Insubria, Varese, Italy.

Institute of Tradate, Istituti Clinici Scientifici Maugeri IRCCS, Tradate, Italy.

Publication information

Eur Spine J. 2025 Jul 21. doi: 10.1007/s00586-025-09166-4.

DOI: 10.1007/s00586-025-09166-4
PMID: 40689984

Abstract

PURPOSE

This study aimed to evaluate the scientific accuracy, content validity, and clarity of ChatGPT-4.0's responses on conservative management of idiopathic scoliosis. The research explored whether the model could effectively support patient education in an area where non-surgical treatment information is crucial.

METHODS

Fourteen frequently asked questions (FAQs) regarding conservative scoliosis treatment were identified using a systematic, multi-step approach that combined web-based inquiry and expert input. Each question was submitted individually to ChatGPT-4.0 on December 6, 2024, using a standardized patient prompt ("I'm a scoliosis patient. Limit your answer to 150 words"). The generated responses were evaluated by a panel of 37 experts from a specialized spinal deformity center via an online survey using a 6-point Likert scale. Content validity was assessed using the Content Validity Ratio (CVR) and Content Validity Index (CVI), and inter-rater reliability was calculated with Fleiss' kappa. Experts also provided categorical feedback on reasons for any rating discrepancies.
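
As a rough illustration of the content validity measures named above, the sketch below computes Lawshe's Content Validity Ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts endorsing a response and N is the panel size. The abstract does not say how the 6-point Likert ratings were dichotomized or how the CVI was aggregated, so the function names, the endorsement counts, and the choice to take CVI as the mean CVR are assumptions for illustration only, not the authors' procedure.

```python
from statistics import mean


def content_validity_ratio(n_endorsing: int, n_experts: int) -> float:
    """Lawshe's CVR: ranges from -1 (no expert endorses) to +1 (all endorse)."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_endorsing <= n_experts:
        raise ValueError("n_endorsing must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_endorsing - half) / half


def content_validity_index(cvrs: list[float]) -> float:
    """Assumed aggregation: CVI taken as the mean CVR across items."""
    return mean(cvrs)


if __name__ == "__main__":
    panel_size = 37  # panel size reported in the abstract
    # Hypothetical endorsement counts for three responses (not study data).
    endorsements = [30, 26, 8]
    cvrs = [content_validity_ratio(n, panel_size) for n in endorsements]
    threshold = 0.38  # CVR threshold reported in the abstract
    for n, cvr in zip(endorsements, cvrs):
        verdict = "meets" if cvr >= threshold else "below"
        print(f"{n}/{panel_size} endorsing: CVR = {cvr:.2f} ({verdict} threshold)")
    print(f"CVI (mean CVR) = {content_validity_index(cvrs):.2f}")
```

A negative CVR means that fewer than half of the experts endorsed the response.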

RESULTS

Eleven out of 14 responses met the CVR threshold (≥ 0.38), yielding an overall CVI of 0.68. Three responses (addressing "What is scoliosis?", "Can exercises or physical therapy cure scoliosis?", and "What is the best sport for scoliosis?") showed lower validity (CVR scores: 0.37, 0.37, and -0.58, respectively), primarily due to factual inaccuracies and insufficient detail. Clarity received the highest ratings (median = 6), while comprehensiveness, professionalism, and response length each had a median score of 5. Inter-rater reliability was slight (Fleiss' kappa = 0.10).
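
To make the reliability figure concrete, here is a minimal sketch of Fleiss' kappa for a fixed number of raters per item, together with the commonly used Landis and Koch descriptive bands, under which 0.10 falls in the "slight" range. The function names and the toy ratings are hypothetical; the study's raw ratings are not given in the abstract.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa; ratings[i] holds one category label per rater for item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})
    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items
    # Chance agreement from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_expected = sum(s ** 2 for s in shares)
    if p_expected == 1.0:
        return float("nan")  # only one category used: kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa: float) -> str:
    """Descriptive band for a kappa value (Landis and Koch convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy data: 4 responses, each scored by 3 raters on a 6-point scale.
    toy = [[6, 6, 5], [5, 4, 6], [6, 6, 6], [3, 5, 4]]
    k = fleiss_kappa(toy)
    print(f"kappa = {k:.2f} ({landis_koch_label(k)})")
```

Kappa corrects raw agreement for the agreement expected by chance, so a panel can give similar median scores and still produce a low kappa when ratings cluster in one or two categories.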

CONCLUSION

ChatGPT-4.0 generally provides clear and accessible information on conservative idiopathic scoliosis management, supporting its potential as a patient education tool. Nonetheless, variability in response accuracy and expert evaluation underscores the need for further refinement and expert supervision before wider clinical application.
