From the Division of Urogynecology and Reconstructive Pelvic Surgery, Department of Obstetrics and Gynecology.
Department of Urology, University of Iowa Hospitals and Clinics, Iowa City, IA.
Urogynecology (Phila). 2024 Mar 1;30(3):245-250. doi: 10.1097/SPV.0000000000001459.
Large language models are artificial intelligence applications that can comprehend and produce human-like text and language. ChatGPT is one such model. Recent advances have increased interest in the utility of large language models in medicine. Urogynecology counseling is complex and time-consuming. Therefore, we evaluated ChatGPT as a potential adjunct for patient counseling.
Our primary objective was to compare the accuracy and completeness of ChatGPT responses to information in standard patient counseling leaflets regarding common urogynecological procedures.
Seven urogynecologists compared the accuracy and completeness of ChatGPT responses to standard patient leaflets using 5-point Likert scales, with a score of 3 being "equally accurate" and "equally complete" and a score of 5 being "much more accurate" and "much more complete," respectively. This was repeated 3 months later to evaluate the consistency of ChatGPT. Additional analysis of understandability and actionability was completed by 2 authors using the Patient Education Materials Assessment Tool. Analysis was primarily descriptive. First and second ChatGPT queries were compared with the Wilcoxon signed rank test.
The median (interquartile range) accuracy was 3 (2-3) and completeness 3 (2-4) for the first ChatGPT query and 3 (3-3) and 4 (3-4), respectively, for the second query. Accuracy and completeness were significantly higher in the second query (P < 0.01). Understandability and actionability of ChatGPT responses were lower than the standard leaflets.
ChatGPT is similarly accurate and complete when compared with standard patient information leaflets for common urogynecological procedures. Large language models may be a helpful adjunct to direct patient-provider counseling. Further research to determine the efficacy and patient satisfaction of ChatGPT for patient counseling is needed.