Barbosa-Silva Jordana, Driusso Patricia, Ferreira Elizabeth A, de Abreu Raphael M
Women's Health Research Laboratory, Physical Therapy Department, Federal University of São Carlos, São Carlos, Brazil.
Department of Obstetrics and Gynecology, FMUSP School of Medicine, University of São Paulo, São Paulo, Brazil.
Neurourol Urodyn. 2025 Jan;44(1):153-164. doi: 10.1002/nau.25603. Epub 2024 Oct 10.
Artificial intelligence models are increasingly gaining popularity among patients and healthcare professionals. While it is impossible to restrict patients' access to different sources of information on the Internet, healthcare professionals need to be aware of the quality of content available across different platforms.
To investigate the accuracy and completeness of Chat Generative Pretrained Transformer (ChatGPT) in addressing frequently asked questions related to the management and treatment of female urinary incontinence (UI), compared to recommendations from guidelines.
This is a cross-sectional study. Two researchers developed 14 frequently asked questions related to UI. These questions were then entered into the ChatGPT platform on September 16, 2023. The accuracy (scores from 1 to 5) and completeness (scores from 1 to 3) of ChatGPT's answers were assessed individually by two experienced researchers in the women's health field, following the recommendations proposed by the guidelines for UI.
Most of the answers were classified as "more correct than incorrect" (n = 6), followed by "more incorrect than correct" (n = 3), "approximately equal correct and incorrect" (n = 2), "nearly all correct" (n = 2), and "correct" (n = 1). Regarding appropriateness, most of the answers were classified as adequate, as they provided the minimum information expected to be classified as correct.
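As a quick sanity check on the reported distribution, the accuracy counts can be tallied to confirm they cover all 14 questions. This is a hypothetical sketch for illustration, not the authors' analysis code; the category labels and counts are taken directly from the Results above.

```python
# Hypothetical tally of the accuracy ratings reported in the Results
# (labels and counts are from the abstract; this is not the study's code).
from collections import Counter

accuracy_ratings = (
    ["more correct than incorrect"] * 6
    + ["more incorrect than correct"] * 3
    + ["approximately equal correct and incorrect"] * 2
    + ["nearly all correct"] * 2
    + ["correct"] * 1
)

counts = Counter(accuracy_ratings)

# The five categories should account for every one of the 14 questions.
assert sum(counts.values()) == 14
print(counts.most_common())
```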
These results showed an inconsistency when evaluating the accuracy of answers generated by ChatGPT compared with scientific guidelines. Almost none of the answers contained the complete content expected or reported in previous guidelines, which raises a concern for healthcare professionals and the scientific community about using artificial intelligence in patient counseling.