Fahy Stephen, Niemann Marcel, Böhm Peter, Winkler Tobias, Oehme Stephan
Center for Musculoskeletal Surgery, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 10117 Berlin, Germany.
Deutsche Rheuma-Liga e.V., 53111 Bonn, Germany.
J Pers Med. 2024 May 8;14(5):495. doi: 10.3390/jpm14050495.
This study aimed to evaluate the quality and readability of information generated by ChatGPT versions 3.5 and 4 concerning platelet-rich plasma (PRP) therapy in the management of knee osteoarthritis (OA), exploring whether large language models (LLMs) could play a significant role in patient education. A total of 23 common patient queries regarding the role of PRP therapy in knee OA management were presented to ChatGPT versions 3.5 and 4. The quality of the responses was assessed using the DISCERN criteria, and readability was evaluated using six established assessment tools. Both versions produced information of moderate quality. The quality of information provided by ChatGPT version 4 was significantly better than that of version 3.5, with mean DISCERN scores of 48.74 and 44.59, respectively. Both models scored highly on response relevance and consistently emphasized the importance of shared decision-making. However, both versions produced content significantly above the recommended 8th-grade reading level for patient education materials (PEMs), with mean reading grade levels (RGLs) of 17.18 for ChatGPT version 3.5 and 16.36 for version 4, indicating a potential barrier to their utility in patient education. While both versions demonstrated the capability to generate information of moderate quality regarding the role of PRP therapy for knee OA, the readability of the content remains a significant barrier to widespread use, exceeding the recommended reading levels for PEMs. Although ChatGPT version 4 showed improvements in quality and source citation, future iterations must focus on producing more accessible content to serve as a viable resource in patient education. Collaboration between healthcare providers, patient organizations, and AI developers is crucial to ensuring the generation of high-quality, peer-reviewed, and easily understandable information that supports informed healthcare decisions.
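The reading grade levels reported above come from standard readability formulas. As an illustration only (the abstract does not name the six specific tools used), a minimal sketch of one widely used RGL metric, the Flesch-Kincaid Grade Level, with a deliberately naive vowel-group syllable counter:

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllable count as the number of vowel groups (naive heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A score of roughly 17, as reported for both models, corresponds to graduate-level reading, far above the 8th-grade target for PEMs; production readability tools use more careful sentence and syllable segmentation than this sketch.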