Arzu Ufuk, Gencer Batuhan
Department of Orthopedics and Traumatology, Marmara University Pendik Training and Research Hospital, 34890 Istanbul, Turkey.
Diagnostics (Basel). 2025 Jul 21;15(14):1834. doi: 10.3390/diagnostics15141834.
The increased accessibility of information has led more patients to attempt self-diagnosis and to opt for self-medication, either as a primary treatment or as a supplement to medical care. Our objective was to evaluate the reliability, comprehensibility, and readability of the responses provided by ChatGPT 4.0 when queried about the most prevalent orthopaedic problems, and thereby to determine whether the disseminated information is misleading and whether it requires auditing. ChatGPT 4.0 was presented with 26 open-ended questions. The responses were evaluated by two observers using a Likert scale in the categories of diagnosis, recommendation, and referral. Response scores were analyzed in subgroups according to area of interest (AoI) and anatomical region. The readability and comprehensibility of the chatbot's responses were assessed using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL). The majority of the responses were rated as either 'adequate' or 'excellent'. However, in the diagnosis category, scores differed significantly across AoIs (p = 0.007), a difference attributable to trauma-related questions. No significant difference was identified in any other category. The mean FKGL score was 7.8 ± 1.267, and the mean FRES was 52.68 ± 8.6. The average estimated reading level required to understand the text corresponds to "high school". ChatGPT 4.0 facilitates the self-diagnosis and self-treatment tendencies of patients with musculoskeletal disorders. However, it is imperative that patients understand the limitations of chatbot-generated advice, particularly for trauma-related conditions.
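For reference, the two readability metrics used in the study are simple closed-form formulas over word, sentence, and syllable counts. Below is a minimal sketch (not the authors' code) of how FRES and FKGL are computed; the syllable counter is a rough vowel-group heuristic, whereas published readability tools count syllables more carefully.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, with a silent-'e' correction."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch/Flesch-Kincaid formulas."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl

fres, fkgl = readability("Rest the joint and apply ice. See a doctor if pain persists.")
print(f"FRES = {fres:.2f}, FKGL = {fkgl:.2f}")
```

Higher FRES indicates easier text (a score near 52, as reported here, falls in the "fairly difficult" band), while FKGL estimates the US school grade needed to understand the text.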