

Assessing the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in managing lumbar disc herniation.

Author Information

Wang Suning, Wang Ying, Jiang Linlin, Chang Yong, Zhang Shiji, Zhao Kun, Chen Lu, Gao Chunzheng

Affiliations

Department of Orthopedics, The Second Hospital of Shandong University, Qilu Hospital of Shandong University, Shandong University, Jinan, 250000, China.

Shandong University, No. 44, Wenhuaxi Road, Jinan, 250012, China.

Publication Information

Eur J Med Res. 2025 Jan 22;30(1):45. doi: 10.1186/s40001-025-02296-x.

Abstract

PURPOSE

This study evaluated and compared the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in diagnosing and treating lumbar disc herniation (LDH) with radiculopathy.

METHODS

Twenty-one questions (across 5 categories) from the NASS Clinical Guidelines were input into ChatGPT 4o and ChatGPT 4o mini. Five orthopedic surgeons assessed the responses using a 5-point Likert scale for accuracy and completeness, and a 7-point scale for reliability. Flesch Reading Ease scores were calculated to assess readability. Additionally, ChatGPT 4o analyzed lumbar images from 53 patients, and its diagnostic agreement with orthopedic surgeons was measured using Kappa values.
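The readability metric used here, Flesch Reading Ease, is computed from sentence length and syllable density. As a rough illustration of how such a score is derived (the study itself does not describe its tooling, and the syllable counter below is a crude vowel-group heuristic, not the dictionary-based counting that dedicated readability tools use):

```python
import re

def count_syllables(word: str) -> int:
    """Approximate English syllable count: vowel groups, minus a silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores are easier to read; below ~30 is 'very difficult'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Long sentences packed with polysyllabic clinical terms push the score down, which is why model-generated medical text tends to land in the "very difficult to read" band reported in the Results.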

RESULTS

Both models demonstrated strong clinical support capabilities with no significant differences in accuracy or reliability. However, ChatGPT 4o provided more comprehensive and consistent responses. The Flesch Reading Ease scores for both models indicated that their generated content was "very difficult to read," potentially limiting patient accessibility. In evaluating lumbar disc herniation images, ChatGPT 4o achieved an overall accuracy of 0.81, with LDH recognition precision, recall, and F1 scores exceeding 0.80. The AUC was 0.80, and the Kappa value was 0.61, indicating moderate agreement between the model's predictions and actual diagnoses, though with room for improvement.
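The Kappa value of 0.61 quoted above is Cohen's kappa, which discounts the agreement two raters would reach by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's marginal label frequencies. A minimal sketch (the labels are hypothetical, standing in for the model's and surgeons' image-level diagnoses):

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

By the common Landis-Koch convention, 0.41-0.60 is "moderate" and 0.61-0.80 "substantial" agreement, so 0.61 sits right at the boundary, consistent with the paper's "moderate agreement, with room for improvement" reading.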

CONCLUSION

While both models are effective, ChatGPT 4o offers more comprehensive clinical responses, making it more suitable for high-integrity medical tasks. However, the difficulty in reading AI-generated content and occasional use of misleading terms, such as "tumor," indicate a need for further improvements to reduce patient anxiety.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c9a/11753088/c51c85f064bb/40001_2025_2296_Fig1_HTML.jpg
