Performance of ChatGPT and DeepSeek in the Management of Postprostatectomy Urinary Incontinence.

Author Information

Pinto Vicktor Bruno Pereira, Ataídes Romullo José Costa, do Nascimento Lucas Antônio Pereira, Gaspar Cristiane de Barros, Alves Priscila Ferreira, Pereira Magnum Adriel Santos, José de Macedo Manoel, Nahas Willian Carlos, de Bessa José, Gomes Cristiano Mendes

Affiliations

Divisão de Urologia, Faculdade de Medicina da Universidade de São Paulo - FMUSP, São Paulo, Brasil.

Centro Universitário UniDombosco - UNDB, São Luis, MA, Brasil.

Publication Information

Int Braz J Urol. 2025 Nov-Dec;51(6). doi: 10.1590/S1677-5538.IBJU.2025.0325.

Abstract

PURPOSE

Artificial intelligence (AI) continues to evolve as a tool in clinical decision support. Large language models (LLMs), such as ChatGPT and DeepSeek, are increasingly used in medicine to provide fast, accessible information. This study aimed to compare the performance of ChatGPT and DeepSeek in generating recommendations for the management of postprostatectomy urinary incontinence (PPUI), based on the AUA/SUFU guideline.

MATERIALS AND METHODS

A total of 20 questions (10 conceptual and 10 case-based) were developed by three urologists with expertise in PPUI, following the AUA/SUFU guideline. Each question was submitted in English using zero-shot prompting to ChatGPT-4o and DeepSeek R1. Responses were limited to 200 words and graded independently as correct (1 point), partially correct (0.5), or incorrect (0). Total and domain-specific scores were compared.

RESULTS

ChatGPT achieved 19 out of 20 points (95.0%), while DeepSeek scored 14.5 (72.5%; p = 0.031). In conceptual questions, scores were 9.0 (ChatGPT) and 8.0 (DeepSeek; p = 0.50). In case-based scenarios, ChatGPT scored 10.0 versus 6.5 for DeepSeek (p = 0.08). ChatGPT outperformed DeepSeek across all guideline domains. DeepSeek made critical errors in the treatment domain, such as recommending a male sling for radiated patients.
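
As a worked check of the scoring arithmetic (1 point possible per question, 20 questions in total), the short Python sketch below reproduces the reported total, percentage, and domain scores. The per-question grades in it are placeholders chosen only so the sums match the published aggregates; the actual question-by-question grading is not reported in the abstract.

# Minimal sketch of the scoring arithmetic described in the methods.
# The per-question grades are PLACEHOLDERS selected so that the domain and
# total sums match the published aggregates; they are not the study's data.
GRADES = {
    "ChatGPT": {
        "conceptual": [1, 1, 1, 0.5, 1, 1, 1, 0.5, 1, 1],        # sums to 9.0
        "case-based": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],            # sums to 10.0
    },
    "DeepSeek": {
        "conceptual": [1, 0.5, 1, 1, 0.5, 1, 0.5, 1, 1, 0.5],    # sums to 8.0
        "case-based": [0.5, 1, 0, 1, 0.5, 1, 0.5, 0.5, 1, 0.5],  # sums to 6.5
    },
}

for model, domains in GRADES.items():
    all_grades = [g for grades in domains.values() for g in grades]
    total = sum(all_grades)
    percentage = 100.0 * total / len(all_grades)   # each question is worth 1 point
    by_domain = {domain: sum(grades) for domain, grades in domains.items()}
    print(f"{model}: {total}/{len(all_grades)} points ({percentage:.1f}%), by domain: {by_domain}")

Note that the between-model comparison reported here (p = 0.031 overall) cannot be reproduced from this sketch, since it would require the authors' actual per-question grades and their chosen statistical test, neither of which is given in the abstract.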

CONCLUSION

ChatGPT demonstrated superior performance in providing guideline-based recommendations for PPUI. However, both models should be used under expert supervision, and future research is needed to optimize their safe integration into clinical workflows.

