Bhatia Divya, Kim Michael S, Romoff Melissa, Timm Asha, Mills Emily, Wu Hao-Hua, Hashmi Sohaib, Park Don, Lee Yu-Po
Department of Research, Palos Verdes High School, Palos Verdes Estates, USA.
Department of Orthopedic Surgery, University of California, Irvine, School of Medicine, Orange, USA.
Cureus. 2025 Jul 26;17(7):e88808. doi: 10.7759/cureus.88808. eCollection 2025 Jul.
Patients increasingly turn to large language models (LLMs) and social media platforms for medical advice. The accuracy of these sources, particularly compared to peer-reviewed clinical practice guidelines, remains poorly characterized.
This cross-sectional study evaluated the perceived accuracy of spine-related medical advice generated by ChatGPT (OpenAI, San Francisco, CA, USA; powered by GPT-4), TikTok (Los Angeles, CA, USA), and the North American Spine Society (NASS) clinical practice guidelines. Medical advice for four spine pathologies was collected from each source. Sixteen orthopedic surgeons rated the accuracy of excerpted recommendations on a 10-point Likert scale. Descriptive statistics summarized mean ratings and standard deviations.
For lumbar stenosis, mean (±SD) accuracy scores were 7.75 ± 2.11 for ChatGPT, 7.00 ± 1.80 for NASS, and 2.50 ± 1.54 for TikTok. For lumbar spondylolisthesis, scores were 7.56 ± 1.50 for ChatGPT, 5.94 ± 2.63 for NASS, and 5.31 ± 2.49 for TikTok. For lumbar disc herniation with radiculopathy, scores were 7.25 ± 2.13 for ChatGPT, 7.06 ± 1.55 for NASS, and 6.44 ± 2.03 for TikTok. For cervical radiculopathy, scores were 7.13 ± 1.38 for ChatGPT, 4.00 ± 2.44 for NASS, and 6.50 ± 2.12 for TikTok.
ChatGPT-generated outputs received the highest ratings for perceived accuracy. NASS guidelines, while evidence-based and peer-reviewed, remain inaccessible to most patients. Professional societies may consider adapting guideline content for dissemination via widely used digital platforms to improve public education and reduce misinformation.