Trapp Christian, Schmidt-Hegemann Nina, Keilholz Michael, Brose Sarah Frederike, Marschner Sebastian N, Schönecker Stephan, Maier Sebastian H, Dehelean Diana-Coralia, Rottler Maya, Konnerth Dinah, Belka Claus, Corradini Stefanie, Rogowski Paul
Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany.
Bavarian Cancer Research Center (BZKF), Munich, Germany.
Strahlenther Onkol. 2025 Mar;201(3):333-342. doi: 10.1007/s00066-024-02342-3. Epub 2025 Jan 10.
This study aims to evaluate the capabilities and limitations of large language models (LLMs) in providing patient education to men undergoing radiotherapy for localized prostate cancer, incorporating assessments from both clinicians and patients.
Six questions about definitive radiotherapy for prostate cancer were designed based on common patient inquiries. These questions were presented to different LLMs [ChatGPT‑4, ChatGPT-4o (both OpenAI Inc., San Francisco, CA, USA), Gemini (Google LLC, Mountain View, CA, USA), Copilot (Microsoft Corp., Redmond, WA, USA), and Claude (Anthropic PBC, San Francisco, CA, USA)] via the respective web interfaces. Responses were evaluated for readability using the Flesch Reading Ease Index. Five radiation oncologists assessed the responses for relevance, correctness, and completeness using a five-point Likert scale. Additionally, 35 prostate cancer patients evaluated the responses from ChatGPT‑4 for comprehensibility, accuracy, relevance, trustworthiness, and overall informativeness.
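The readability assessment above relies on the Flesch Reading Ease Index, which scores text from the average sentence length and the average syllables per word (higher scores mean easier text; scores below roughly 50 are considered difficult). A minimal sketch of the standard formula is shown below; the vowel-group syllable counter is a rough heuristic assumption, not the dictionary-based syllabification a validated readability tool would use.

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Compute the Flesch Reading Ease score for English text.

    Formula: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).
    Syllables are approximated by counting vowel groups per word
    (a heuristic; real readability tools use more careful syllabification).
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    def syllables(word: str) -> int:
        # Count runs of vowels as syllables; every word gets at least one.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_words = max(1, len(words))
    n_sentences = max(1, len(sentences))
    total_syllables = sum(syllables(w) for w in words)
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (total_syllables / n_words))
```

On this scale, short plain sentences score high, while the long, polysyllabic sentences typical of LLM medical answers score low, which is consistent with the "relatively difficult to understand" finding reported in the results.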
The Flesch Reading Ease Index indicated that the responses from all LLMs were relatively difficult to understand. All LLMs provided answers that clinicians rated as generally relevant and correct. The answers from ChatGPT‑4, ChatGPT-4o, and Claude were also rated as complete. However, there were significant differences in performance among the LLMs regarding relevance and completeness, and some answers lacked detail or contained inaccuracies. Patients perceived the information as easy to understand and relevant, with most expressing confidence in the information and a willingness to use ChatGPT‑4 for future medical questions. ChatGPT-4's responses helped patients feel better informed, even though they had already received standardized patient information.
Overall, LLMs show promise as a tool for patient education in prostate cancer radiotherapy. While improvements are needed in terms of accuracy and readability, positive feedback from clinicians and patients suggests that LLMs can enhance patient understanding and engagement. Further research is essential to fully realize the potential of artificial intelligence in patient education.