Richlitzki Cedric, Mansoorian Sina, Käsmann Lukas, Stoleriu Mircea Gabriel, Kovacs Julia, Sienel Wulf, Kauffmann-Guerrero Diego, Duell Thomas, Schmidt-Hegemann Nina Sophie, Belka Claus, Corradini Stefanie, Eze Chukwuka
Department of Radiation Oncology, University Hospital LMU, Marchioninistrasse 15, Munich, 81377, Germany, 49 89440073770.
Asklepios Lung Clinic Munich - Gauting, Division of Thoracic Surgery, LMU University Hospital, Munich, Germany.
JMIR Cancer. 2025 Aug 13;11:e69783. doi: 10.2196/69783.
BACKGROUND: Large language models (LLMs) such as ChatGPT (OpenAI) are increasingly discussed as potential tools for patient education in health care. In radiation oncology, where patients are often confronted with complex medical terminology and treatment plans, LLMs may support patient understanding and promote more active participation in care. However, the readability, accuracy, completeness, and overall acceptance of LLM-generated medical content remain underexplored.
OBJECTIVE: This study aims to evaluate the potential of ChatGPT-4 as a supplementary tool for patient education in the context of lung cancer radiotherapy by assessing the readability, content quality, and perceived usefulness of artificial intelligence-generated responses from both clinician and patient perspectives.
METHODS: A total of 8 frequently asked questions about radiotherapy for lung cancer were developed based on the clinical experience of a team of clinicians specialized in lung cancer treatment at a university hospital. The questions were submitted individually to ChatGPT-4o (version as of July 2024) using the prompt: "I am a lung cancer patient looking for answers to the following questions." Responses were evaluated using three approaches: (1) a readability analysis applying the Modified Flesch Reading Ease (FRE) formula for German and the 4th Vienna Formula (WSTF); (2) a multicenter expert evaluation by 6 multidisciplinary clinicians (radiation oncologists, medical oncologists, and thoracic surgeons) specialized in lung cancer treatment, using a 5-point Likert scale to assess relevance, correctness, and completeness; and (3) a patient evaluation during the first follow-up appointment after radiotherapy, assessing comprehensibility, accuracy, relevance, trustworthiness, and willingness to use ChatGPT for future medical questions.
RESULTS: Readability analysis classified most responses as "very difficult to read" (university level) or "difficult to read" (upper secondary school), likely due to the use of medical language and long sentence structures. Clinician assessments yielded high scores for relevance (mean 4.5, SD 0.52) and correctness (mean 4.3, SD 0.65), but completeness received slightly lower ratings (mean 3.9, SD 0.59). A total of 30 patients rated the responses positively for clarity (mean 4.4, SD 0.61) and relevance (mean 4.3, SD 0.64), but lower for trustworthiness (mean 3.8, SD 0.68) and usability (mean 3.7, SD 0.73). No harmful misinformation was identified in the responses.
CONCLUSIONS: ChatGPT-4 shows promise as a supplementary tool for patient education in radiation oncology. While patients and clinicians appreciated the clarity and relevance of the information, limitations in completeness, trust, and readability highlight the need for clinician oversight and further optimization of LLM-generated content. Future developments should focus on improving accessibility, integrating real-time readability adaptation, and establishing standardized evaluation frameworks to ensure safe and effective clinical use.
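Editorial illustration (not part of the published abstract): the two readability indices named in METHODS can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes the commonly cited coefficients for the German Flesch Reading Ease adaptation (Amstad), 180 − ASL − 58.5 × ASW, and for the 4th Vienna Formula, 0.2656 × SL + 0.2744 × MS − 1.693, where ASL/SL is the mean sentence length in words, ASW the mean syllables per word, and MS the percentage of words with three or more syllables. The helper names (`count_syllables`, `text_stats`, `flesch_amstad`, `wstf4`) are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary-based hyphenator.

```python
import re

# Assumption: each run of vowels counts as one syllable. This is only an
# approximation for German; production tools use hyphenation dictionaries.
_VOWEL_GROUP = re.compile(r"[aeiouyäöü]+", re.IGNORECASE)
_WORD = re.compile(r"[A-Za-zÄÖÜäöüß]+")
_SENTENCE_END = re.compile(r"[.!?]+")


def count_syllables(word: str) -> int:
    """Approximate syllable count as the number of vowel groups (at least 1)."""
    return max(1, len(_VOWEL_GROUP.findall(word)))


def text_stats(text: str) -> tuple[float, float, float]:
    """Return (words per sentence, syllables per word, % words with >= 3 syllables)."""
    sentences = [s for s in _SENTENCE_END.split(text) if _WORD.search(s)]
    words = _WORD.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence with words")
    syllables = [count_syllables(w) for w in words]
    asl = len(words) / len(sentences)
    asw = sum(syllables) / len(words)
    ms = 100.0 * sum(1 for n in syllables if n >= 3) / len(words)
    return asl, asw, ms


def flesch_amstad(asl: float, asw: float) -> float:
    """German Flesch Reading Ease (Amstad adaptation); higher means easier."""
    return 180.0 - asl - 58.5 * asw


def wstf4(asl: float, ms: float) -> float:
    """4th Vienna Formula (Wiener Sachtextformel); higher means harder."""
    return 0.2656 * asl + 0.2744 * ms - 1.693


if __name__ == "__main__":
    sample = "Die Strahlentherapie ist eine Behandlung. Sie dauert mehrere Wochen."
    asl, asw, ms = text_stats(sample)
    print(f"FRE (German): {flesch_amstad(asl, asw):.1f}")
    print(f"WSTF-4:       {wstf4(asl, ms):.1f}")
```

The two scales run in opposite directions: lower FRE values and higher WSTF values both indicate harder text, so long sentences and polysyllabic medical terms push responses toward the "difficult" categories reported in RESULTS.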