Boie Sebastian Daniel, Glastetter Esther, Lux Michael Patrick, Balzer Felix, von Kalle Christof, Lenz Christian, Müller Ulrike
Pfizer Pharma GmbH, Friedrichstr. 110, Berlin, 10117, Germany, 49 15152377580.
Department for Gynecology and Obstetrics, St. Louise Women's Hospital, Paderborn, St. Josefs Women's Hospital, Salzkotten, St. Vincenz Clinics, Salzkotten + Paderborn, Germany.
JMIR Cancer. 2025 Aug 13;11:e68426. doi: 10.2196/68426.
BACKGROUND: Patients with breast cancer frequently experience significant uncertainty, prompting them to seek detailed, personalized, and reliable medical information to enhance adherence to prescribed treatments, medications, and recommended lifestyle adjustments. Although high-quality information exists within oncology guidelines and patient-oriented resources, the provision of tailored responses to individual patient queries remains challenging, especially for non-English-speaking populations.
OBJECTIVE: This study aims to evaluate the potential of an artificial intelligence-driven chatbot, specifically leveraging ChatGPT (GPT-4; OpenAI) combined with retrieval-augmented generation, to deliver personalized answers to complex breast cancer-related patient questions in German.
METHODS: We collaborated with one of Germany's largest breast cancer Patient Representation Groups to collect authentic patient inquiries, receiving a total of 118 questions. After initial screening, we selected 104 medical questions, organized into 7 distinct categories: aftercare, bone health, ductal carcinoma in situ, diagnostics, nutrition and supplements, complementary medicine, and therapy. A customized version of GPT-4 was configured with specific system prompts emphasizing empathetic, evidence-based responses and integrated with a comprehensive database comprising guidelines, recommendations, and patient information materials published by recognized German medical societies. To assess chatbot responses, we used 4 evaluation criteria: comprehensibility (clarity from a patient perspective), correctness (accuracy per current medical guidelines), completeness (inclusion of all relevant aspects), and potential harm (risk of undue patient harm or misinformation). Ratings were conducted using a 5-point Likert scale by a breast cancer expert (correctness, completeness, and potential harm) and patient representatives (comprehensibility).
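The retrieval-augmented setup described in METHODS can be illustrated with a minimal sketch. It is not the study's implementation: the passages are placeholders, the system prompt wording is invented for illustration, and a simple bag-of-words cosine similarity stands in for whatever retrieval method the authors used. All names (`retrieve`, `build_prompt`, `PASSAGES`, `SYSTEM_PROMPT`) are hypothetical, and no language model is called.

```python
import math
import re
from collections import Counter

# Placeholder passages standing in for guideline and patient-information
# documents. The texts are NOT real guideline content (assumption).
PASSAGES = [
    {"source": "placeholder-aftercare",
     "text": "Placeholder text about aftercare follow-up schedules."},
    {"source": "placeholder-bone",
     "text": "Placeholder text about bone health and bone density."},
    {"source": "placeholder-nutrition",
     "text": "Placeholder text about nutrition and supplements."},
]

# Hypothetical system prompt; the study's actual wording is not given.
SYSTEM_PROMPT = ("Answer empathetically and base every statement "
                 "on the provided context.")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p["text"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(system_prompt, question, retrieved):
    """Combine system prompt, retrieved context, and the patient question."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in retrieved)
    return (f"{system_prompt}\n\nContext:\n{context}\n\n"
            f"Patient question: {question}")


if __name__ == "__main__":
    question = "What about bone density?"
    print(build_prompt(SYSTEM_PROMPT, question,
                       retrieve(question, PASSAGES, k=1)))
```

In a real system the assembled prompt would be sent to the language model, and the retrieval step would typically use embeddings over a much larger, regularly updated document store.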
RESULTS: The chatbot provided high-quality responses across multiple dimensions. Of the 499 responses evaluated for comprehensibility, 427 (85.6%) were rated as comprehensible. Among the 104 responses assessed for the remaining dimensions, 91 (87.5%) were rated as correct, 72 (69.2%) as complete, and 93 (89.4%) as nonharmful. Reasons for incomplete answers included omission of reimbursement details, updates from recent therapeutic guidelines, or nuanced recommendations regarding endocrine therapy and aftercare schedules. In addition, 6 (5.8%) of the answers were rated as potentially harmful due to outdated or contextually inappropriate recommendations. The chatbot also performed well in the nutrition and bone health categories despite occasionally incomplete document retrieval.
CONCLUSIONS: Our findings demonstrate that an artificial intelligence-powered chatbot with GPT-4 and retrieval augmentation can effectively provide personalized, linguistically accessible, and largely accurate information to German-speaking patients with breast cancer. This approach holds considerable promise for improving patient-centered communication, empowering patients to make informed decisions. Nonetheless, observed limitations regarding response completeness and potential harm underscore the critical need for ongoing human oversight. Future research and development should prioritize regularly updated databases, advanced retrieval methods to handle complex document structures, multimodal capabilities, and clearly articulated disclaimers emphasizing the necessity of professional medical consultation. Our evaluation, along with the provided set of realistic patient questions, establishes a benchmark for future development and validation of German-language oncology chatbots.
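The proportions reported in RESULTS can be recomputed from the stated counts. The short sketch below does so, using only the numbers given in the abstract; the names `REPORTED` and `percent` are illustrative. The final step assumes that "nonharmful" and "potentially harmful" are mutually exclusive ratings on the same 104 responses, which the abstract implies but does not state outright.

```python
# Counts and percentages exactly as reported in the abstract:
# label -> (count, total, reported percentage)
REPORTED = {
    "comprehensible": (427, 499, 85.6),
    "correct": (91, 104, 87.5),
    "complete": (72, 104, 69.2),
    "nonharmful": (93, 104, 89.4),
    "potentially harmful": (6, 104, 5.8),
}


def percent(count, total):
    """Percentage of count in total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, (count, total, reported) in REPORTED.items():
    computed = percent(count, total)
    status = "matches" if computed == reported else "differs from"
    print(f"{label}: {count}/{total} = {computed}% ({status} reported {reported}%)")

# Assumption: the two harm ratings do not overlap, so the remainder is the
# number of responses rated as neither nonharmful nor potentially harmful.
unclassified = 104 - 93 - 6
print(f"responses in neither harm category: {unclassified}")
```

Under that assumption, 5 of the 104 responses fall into neither harm category, which would be consistent with a middle rating on the 5-point Likert scale.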