Arzu Ufuk, Gencer Batuhan
Department of Orthopedics and Traumatology, Marmara University Pendik Training and Research Hospital, 34890 Istanbul, Turkey.
Diagnostics (Basel). 2025 Jul 21;15(14):1834. doi: 10.3390/diagnostics15141834.
The increased accessibility of information has led more patients to attempt self-diagnosis and to opt for self-medication, either as a primary treatment or as a supplement to medical care. Our objective was to evaluate the reliability, comprehensibility, and readability of the responses provided by ChatGPT 4.0 when queried about the most prevalent orthopaedic problems, and thereby to determine whether the disseminated information is misleading and whether it requires auditing. ChatGPT 4.0 was presented with 26 open-ended questions. The responses were evaluated by two observers using a Likert scale in the categories of diagnosis, recommendation, and referral. Response scores were analyzed in subgroups according to area of interest (AoI) and anatomical region. The readability and comprehensibility of the chatbot's responses were assessed using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL). The majority of the responses were rated as either 'adequate' or 'excellent'. However, in the diagnosis category, scores differed significantly across AoIs (p = 0.007), a difference attributable to trauma-related questions. No significant difference was identified in any other category. The mean FKGL score was 7.8 ± 1.267, and the mean FRES was 52.68 ± 8.6. The average estimated reading level required to understand the text corresponds to "high school". ChatGPT 4.0 facilitates the self-diagnosis and self-treatment tendencies of patients with musculoskeletal disorders. However, it is imperative that patients understand the limitations of chatbot-generated advice, particularly for trauma-related conditions.
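For reference, the two readability metrics used in the study are simple closed-form formulas over word, sentence, and syllable counts. Below is a minimal sketch (not the authors' code) of how FRES and FKGL are computed; the syllable counter is a rough vowel-group heuristic, whereas published readability tools count syllables more carefully.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, with a silent-'e' correction."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch/Flesch-Kincaid formulas."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl

fres, fkgl = readability("Rest the joint and apply ice. See a doctor if pain persists.")
print(f"FRES = {fres:.2f}, FKGL = {fkgl:.2f}")
```

Higher FRES indicates easier text (a score near 52, as reported here, falls in the "fairly difficult" band), while FKGL estimates the US school grade needed to understand the text.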