

Assessing the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in managing lumbar disc herniation.

Author Information

Wang Suning, Wang Ying, Jiang Linlin, Chang Yong, Zhang Shiji, Zhao Kun, Chen Lu, Gao Chunzheng

Affiliations

Department of Orthopedics, The Second Hospital of Shandong University, Qilu Hospital of Shandong University, Shandong University, Jinan, 250000, China.

Shandong University, No. 44, Wenhuaxi Road, Jinan, 250012, China.

Publication Information

Eur J Med Res. 2025 Jan 22;30(1):45. doi: 10.1186/s40001-025-02296-x.

Abstract

PURPOSE

This study evaluated and compared the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in diagnosing and treating lumbar disc herniation (LDH) with radiculopathy.

METHODS

Twenty-one questions (across 5 categories) from the NASS Clinical Guidelines were input into ChatGPT 4o and ChatGPT 4o mini. Five orthopedic surgeons assessed the responses using a 5-point Likert scale for accuracy and completeness, and a 7-point scale for reliability. Flesch Reading Ease scores were calculated to assess readability. Additionally, ChatGPT 4o analyzed lumbar images from 53 patients, and its diagnostic agreement with orthopedic surgeons was measured using Kappa values.
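The readability metric used here, Flesch Reading Ease, is computed from sentence length and syllable density. As a rough illustration of how such a score is derived (the study itself does not describe its tooling, and the syllable counter below is a crude vowel-group heuristic, not the dictionary-based counting that dedicated readability tools use):

```python
import re

def count_syllables(word: str) -> int:
    """Approximate English syllable count: vowel groups, minus a silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores are easier to read; below ~30 is 'very difficult'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Long sentences packed with polysyllabic clinical terms push the score down, which is why model-generated medical text tends to land in the "very difficult to read" band reported in the Results.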

RESULTS

Both models demonstrated strong clinical support capabilities with no significant differences in accuracy or reliability. However, ChatGPT 4o provided more comprehensive and consistent responses. The Flesch Reading Ease scores for both models indicated that their generated content was "very difficult to read," potentially limiting patient accessibility. In evaluating lumbar disc herniation images, ChatGPT 4o achieved an overall accuracy of 0.81, with LDH recognition precision, recall, and F1 scores exceeding 0.80. The AUC was 0.80, and the Kappa value was 0.61, indicating moderate agreement between the model's predictions and actual diagnoses, though with room for improvement.
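The Kappa value of 0.61 quoted above is Cohen's kappa, which discounts the agreement two raters would reach by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's marginal label frequencies. A minimal sketch (the labels are hypothetical, standing in for the model's and surgeons' image-level diagnoses):

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

By the common Landis-Koch convention, 0.41-0.60 is "moderate" and 0.61-0.80 "substantial" agreement, so 0.61 sits right at the boundary, consistent with the paper's "moderate agreement, with room for improvement" reading.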

CONCLUSION

While both models are effective, ChatGPT 4o offers more comprehensive clinical responses, making it more suitable for high-integrity medical tasks. However, the difficulty in reading AI-generated content and occasional use of misleading terms, such as "tumor," indicate a need for further improvements to reduce patient anxiety.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c9a/11753088/c51c85f064bb/40001_2025_2296_Fig1_HTML.jpg
