Reicher Lee, Lutsker Guy, Michaan Nadav, Grisaru Dan, Laskov Ido
Department of Gynecologic Oncology, Lis Hospital for Women, Tel Aviv Medical Center, Tel Aviv, Israel.
Sackler School of Medicine, Department of Gynecology, Tel Aviv University, Tel Aviv, Israel.
Int J Gynaecol Obstet. 2025 Feb;168(2):419-427. doi: 10.1002/ijgo.15869. Epub 2024 Aug 20.
Gynecologic cancer requires personalized care to improve outcomes. Large language models (LLMs) have the potential to answer medical queries with reliable information in clear, plain English that both healthcare providers and patients can understand. We aimed to evaluate two freely available LLMs (ChatGPT and Google's Bard) in answering questions regarding the management of gynecologic cancer. The LLMs' performance was evaluated by developing a set of questions that addressed common gynecologic oncologic findings from a patient's perspective, along with more complex questions designed to elicit recommendations from a clinician's perspective. Each question was presented to the LLM interface, and the responses generated by the artificial intelligence (AI) model were recorded. The responses were assessed for adherence to the National Comprehensive Cancer Network and European Society of Gynecological Oncology guidelines, in order to determine the accuracy and appropriateness of the information provided by the LLMs. The models provided largely appropriate responses to questions about common cervical cancer screening tests and to BRCA-related questions. Answers to complex and controversial gynecologic oncology cases were less useful when assessed against the common guidelines. ChatGPT and Bard lacked knowledge of regional guideline variations; however, they provided practical and multifaceted advice to patients and caregivers regarding the next steps of management and follow-up. We conclude that LLMs may have a role as an adjunct informational tool to improve outcomes.