Fahy Stephen, Oehme Stephan, Milinkovic Danko Dan, Bartek Benjamin
Centrum für Muskuloskeletale Chirurgie, Charité Universitätsmedizin Berlin, Berlin, Germany.
Front Digit Health. 2025 Jan 3;6:1480381. doi: 10.3389/fdgth.2024.1480381. eCollection 2024.
Knee osteoarthritis (OA) significantly impairs the quality of life of those affected, and many patients eventually require surgical intervention. While Total Knee Arthroplasty (TKA) is common, it may not be suitable for younger patients with unicompartmental OA, who might benefit more from High Tibial Osteotomy (HTO). Effective patient education is crucial for informed decision-making, yet most online health information has been found to be too complex for the average patient to understand. AI tools like ChatGPT may offer a solution, but their outputs often exceed the public's literacy level. This study assessed whether a customized ChatGPT could be used to improve readability and source accuracy in patient education on knee OA and tibial osteotomy.
Commonly asked questions about HTO were gathered using Google's "People Also Asked" feature and formatted to an 8th-grade reading level. Two ChatGPT-4 models were compared: a native version and a fine-tuned model ("The Knee Guide") optimized for readability and source citation through Instruction-Based Fine-Tuning (IBFT) and Reinforcement Learning from Human Feedback (RLHF). The responses were evaluated for quality using the DISCERN criteria and readability using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL).
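For readers unfamiliar with the two readability metrics, the following is a minimal sketch of how FRES and FKGL are computed from the standard published formulas. It is not the tool used in the study, which the abstract does not name. The syllable counter is a rough vowel-group heuristic and the function names `count_syllables` and `readability` are hypothetical, so scores will differ somewhat from dedicated readability software.

```python
import re
from typing import Tuple


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text: str) -> Tuple[float, float]:
    """Return (FRES, FKGL) for an English text using the standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Flesch Reading Ease: higher means easier to read.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: approximate US school grade.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


if __name__ == "__main__":
    fres, fkgl = readability("The cat sat on the mat.")
    print(f"FRES={fres:.2f}, FKGL={fkgl:.2f}")
```

Both formulas depend only on average sentence length and average syllables per word, which is why shorter sentences and simpler vocabulary raise FRES and lower FKGL.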
The native ChatGPT-4 model achieved a mean DISCERN score of 38.41 (range 25-46), indicating poor quality, while "The Knee Guide" achieved 45.9 (range 33-66), indicating moderate quality. Cronbach's alpha was 0.86, indicating good interrater reliability. "The Knee Guide" achieved better readability, with a mean FKGL of 8.2 (range 5-10.7, ±1.42) and a mean FRES of 60 (range 47-76, ±7.83), compared to the native model's FKGL of 13.9 (range 11-16, ±1.39) and FRES of 32 (range 14-47, ±8.3). These differences were statistically significant (p < 0.001).
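As an illustration of the reliability statistic reported above, here is a minimal sketch of Cronbach's alpha, treating each rater as an "item" and each response as a row. The abstract does not state the number of raters or give per-rater scores, so the data below are purely hypothetical and the function name `cronbach_alpha` is an assumption for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table of scores.

    scores[i][j] is the score that rater j gave to response i.
    alpha = k / (k - 1) * (1 - sum(per-rater variances) / variance(row totals))
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two raters and two responses")
    per_rater_var = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(per_rater_var) / total_var)


if __name__ == "__main__":
    # Hypothetical DISCERN totals from two raters for three responses.
    hypothetical = [[38, 40], [45, 44], [52, 55]]
    print(round(cronbach_alpha(hypothetical), 3))
```

Values closer to 1 indicate that the raters' scores move together across responses; perfectly identical ratings give an alpha of exactly 1.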
Fine-tuning ChatGPT significantly improved the readability and quality of HTO-related information. "The Knee Guide" demonstrated the potential of customized AI tools in enhancing patient education by making complex medical information more accessible and understandable.