Negrini Francesco, Malfitano Calogero, Ferriero Giorgio, Morone Giovanni, Negrini Alberto, Zaina Fabio, Ferrario Irene, Kiekens Carlotte, Negrini Stefano, Vitale Jacopo
Department of Biotechnology and Life Sciences, University of Insubria, Varese, Italy.
Institute of Tradate, Istituti Clinici Scientifici Maugeri IRCCS, Tradate, Italy.
Eur Spine J. 2025 Jul 21. doi: 10.1007/s00586-025-09166-4.
This study aimed to evaluate the scientific accuracy, content validity, and clarity of ChatGPT-4.0's responses on conservative management of idiopathic scoliosis. The research explored whether the model could effectively support patient education in an area where non-surgical treatment information is crucial.
Fourteen frequently asked questions (FAQs) regarding conservative scoliosis treatment were identified using a systematic, multi-step approach that combined web-based inquiry and expert input. Each question was submitted individually to ChatGPT-4.0 on December 6, 2024, using a standardized patient prompt ("I'm a scoliosis patient. Limit your answer to 150 words"). The generated responses were evaluated by a panel of 37 experts from a specialized spinal deformity center via an online survey using a 6-point Likert scale. Content validity was assessed using the Content Validity Ratio (CVR) and Content Validity Index (CVI), and inter-rater reliability was calculated with Fleiss' kappa. Experts also provided categorical feedback on reasons for any rating discrepancies.
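The abstract does not state how the CVR was computed from the 6-point ratings. The sketch below assumes Lawshe's standard formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts who endorse a response and N is the panel size. The endorsement cutoff (`cutoff=5`) and both function names are hypothetical, chosen only for illustration, and the sample ratings are not study data.

```python
def content_validity_ratio(n_agree: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_agree <= n_experts:
        raise ValueError("n_agree must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_agree - half) / half


def ratings_to_cvr(ratings: list[int], cutoff: int = 5) -> float:
    """Count ratings at or above `cutoff` as endorsements, then apply CVR.

    The cutoff is an assumption: the abstract does not say which points on
    the 6-point Likert scale counted as agreement.
    """
    n_agree = sum(1 for r in ratings if r >= cutoff)
    return content_validity_ratio(n_agree, len(ratings))


if __name__ == "__main__":
    # Illustrative ratings from a hypothetical 8-expert panel.
    example = [6, 5, 5, 4, 6, 3, 5, 6]
    print(round(ratings_to_cvr(example), 2))
```

Under this formula a CVR of 0 means exactly half the panel endorsed the response, and negative values mean fewer than half did, which is how a response such as the one on the best sport for scoliosis can score below zero.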
Eleven of the 14 responses met the CVR threshold (≥ 0.38), yielding an overall CVI of 0.68. Three responses, addressing "What is scoliosis?", "Can exercises or physical therapy cure scoliosis?", and "What is the best sport for scoliosis?", showed lower validity (CVR: 0.37, 0.37, and -0.58, respectively), primarily due to factual inaccuracies and insufficient detail. Clarity received the highest ratings (median = 6), while comprehensiveness, professionalism, and response length each had a median score of 5. Inter-rater reliability was slight (Fleiss' kappa = 0.10).
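For readers unfamiliar with the reliability statistic, the following is a minimal sketch of the standard Fleiss' kappa calculation: observed agreement averaged over items, corrected for the agreement expected by chance. The function name and input layout are assumptions, and the abstract does not describe the software or data layout the authors used.

```python
from collections import Counter


def fleiss_kappa(ratings_per_item: list[list[int]], categories: list[int]) -> float:
    """Fleiss' kappa for items each rated by the same number of raters.

    ratings_per_item[i] holds the category chosen by every rater for item i.
    """
    n_items = len(ratings_per_item)
    n_raters = len(ratings_per_item[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(len(r) != n_raters for r in ratings_per_item):
        raise ValueError("every item needs the same number of ratings")

    counts = [Counter(r) for r in ratings_per_item]

    # Observed agreement per item: share of rater pairs that agree.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of ratings in each category.
    total = n_items * n_raters
    p_e = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. On the widely used Landis and Koch benchmarks, values between 0.00 and 0.20 are labelled "slight", which matches the wording used for the reported 0.10.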
ChatGPT-4.0 generally provides clear and accessible information on conservative idiopathic scoliosis management, supporting its potential as a patient education tool. Nonetheless, variability in response accuracy and expert evaluation underscores the need for further refinement and expert supervision before wider clinical application.