Department of Urology, University of Kansas Medical Center, Kansas City, KS, USA.
Department of Urology, University of Florida College of Medicine, Gainesville, FL, USA.
World J Urol. 2024 Oct 29;42(1):600. doi: 10.1007/s00345-024-05326-1.
To evaluate and compare the performance of ChatGPT™ (OpenAI) and Bing AI™ (Microsoft) in responding to kidney stone treatment-related questions in accordance with the American Urological Association (AUA) guidelines, and to assess factors such as appropriateness of responses, emphasis on consulting healthcare providers, provision of references, and adherence to guidelines by each chatbot.
We developed 20 kidney stone evaluation and treatment-related questions based on the AUA Surgical Management of Stones guideline. Questions were asked to ChatGPT and Bing AI chatbots. We compared their responses utilizing the brief DISCERN tool as well as response appropriateness.
ChatGPT significantly outperformed Bing AI on questions 1-3, which evaluate the clarity, achievement, and relevance of responses (12.77 ± 1.71 vs. 10.17 ± 3.27; p < 0.01). In contrast, Bing AI always incorporated references, whereas ChatGPT never did. Consequently, the results for questions 4-6, which evaluated the quality of sources, consistently favored Bing AI over ChatGPT (10.8 vs. 4.28; p < 0.01). Notably, neither chatbot offered guidance contrary to the guidelines for pre-operative testing. However, recommendations against the guidelines were notable in specific scenarios: 30.5% for the treatment of adults with ureteral stones, 52.5% for adults with renal stones, and 20.5% across all patient treatment.
ChatGPT significantly outperformed Bing AI in providing responses with a clear aim, achieving that aim, and delivering relevant, appropriate answers based on the AUA surgical stone management guidelines. However, Bing AI provides references, which allows assessment of information quality. Additional studies are needed to further evaluate these chatbots and their potential use by clinicians and patients for urologic healthcare-related questions.